“I am not afraid of reprisals. I have no children, no wife, no car, no debt. It might sound a bit pompous, but I would rather die on my feet than live on my knees.”
Charb – Interview 2012
Wow – what a headline … okay, I admit it is phrased quite sensationally, given that it anticipates just one possible interpretation of there being more births around summer and autumn than in spring … but I guess I just get more proactive at marketing with every post I publish!
Agglomerative hierarchical clustering is a simple, intuitive and well-understood method for clustering data points. I used it with good results in a project to estimate the true geographical position of objects from several imprecise measurements of each. In this tutorial I would like to describe the basics of the method, how to implement it in R with hclust, and some ideas on how to decide where to cut the tree. It was also a great opportunity to build another Shiny/D3.js app (GitHub for the code, shinyapps.io for the app), something I had wanted to do for a while. At the end I write a bit about what I learned in that regard.
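To make the basic idea concrete, here is a minimal, naive sketch of agglomerative clustering, written in Python rather than R. The language and every name in it (`hclust_cut`, `complete_linkage`, `euclidean`) are my own illustration, not the code from the post. It starts with every point as its own cluster, repeatedly merges the two closest clusters under complete linkage (the default `method` of R's `hclust`), and stops once the next merge would happen above height `h`. Because complete-linkage merge heights never decrease, stopping there amounts to cutting the tree at that height. Recomputing linkages from scratch makes it roughly cubic in the number of points, so it is only meant for small data sets.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def complete_linkage(a: List[int], b: List[int], points: Sequence[Point],
                     dist: Callable[[Point, Point], float]) -> float:
    # Complete linkage: the distance between two clusters is the largest
    # pairwise distance between their members.
    return max(dist(points[i], points[j]) for i in a for j in b)


def hclust_cut(points: Sequence[Point], h: float,
               dist: Callable[[Point, Point], float] = euclidean):
    """Merge clusters bottom-up until the next merge would exceed height h.

    Returns (labels, merges): labels[i] is the cluster number of point i,
    merges lists (members_a, members_b, height) in merge order.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = complete_linkage(clusters[x], clusters[y], points, dist)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > h:
            # Merge heights only grow from here on, so this is the cut.
            break
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    # Number clusters 1, 2, ... by their smallest member index.
    labels = [0] * len(points)
    for label, members in enumerate(sorted(clusters, key=min), start=1):
        for i in members:
            labels[i] = label
    return labels, merges


points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0), (10.5, 10.0)]
labels, merges = hclust_cut(points, h=2.0)
print(labels)
```

With `h = 100` all five points end up in one cluster, and with `h = 0.1` every point stays on its own, which shows how strongly the choice of cut height drives the result. In R, `cutree(hclust(dist(x), method = "complete"), h = 2)` should give a comparable grouping for the same points.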
Broadly, there are two reasons why you might want to access MongoDB from R:
- MongoDB is already in use for whatever reason and you want to analyze the data stored in it
- You decide to store your data in MongoDB instead of using native R data structures like data.table or data.frame
In-memory data storage like data.table is very fast, especially for numerical data, provided the data actually fits into your RAM. Even then, MongoDB comes with a bag of goodies that make it a tempting choice for a number of use cases:
- Flexible, schema-less data structures
- Spatial and text indexing
- Spatial queries
- Data persistence
- Easy access from other languages and systems
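To give a feel for the spatial queries mentioned above, here is a small sketch in Python that only builds a MongoDB `$near` filter document and does not talk to a database. It assumes a hypothetical collection whose `location` field holds GeoJSON points and carries a `2dsphere` index. GeoJSON orders coordinates as [longitude, latitude], and `$maxDistance` is interpreted in metres for GeoJSON points. With a driver such as pymongo, the resulting dict would be passed to a `find()` call.

```python
from typing import Any, Dict


def near_query(field: str, lng: float, lat: float,
               max_meters: float) -> Dict[str, Any]:
    """Build a MongoDB filter matching documents near a point.

    Assumes `field` (hypothetical name) stores GeoJSON points and has a
    2dsphere index; results come back sorted by distance.
    """
    if not (-180.0 <= lng <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("longitude/latitude out of range")
    return {
        field: {
            "$near": {
                # GeoJSON coordinate order is [longitude, latitude].
                "$geometry": {"type": "Point", "coordinates": [lng, lat]},
                "$maxDistance": max_meters,  # metres for GeoJSON points
            }
        }
    }


# A document shape such a query would match (hypothetical example data).
example_doc = {"name": "plant-17",
               "location": {"type": "Point", "coordinates": [13.405, 52.52]}}

print(near_query("location", 13.405, 52.52, 500))
```

Building the filter as plain data like this keeps the query logic testable without a running MongoDB instance.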
If you would like to learn more about MongoDB, I have good news for you: MongoDB Inc. offers a number of very well-made online courses catering to various programming languages. You can find an overview here.
In this tutorial I am going to describe a straightforward way to use Twitter's REST API v1.1. For that purpose I wrote a little package (RTwitterAPI), so that requesting data needs only the API URL, the API parameters and a vector containing the OAuth parameters.
Before you can get started, you have to log in to your Twitter account on dev.twitter.com, create an application and generate an “Access Token” for it. So let's jump right in and fetch the IDs of 10 followers of @hrw (Human Rights Watch). The code is on GitHub as a package named RTwitterAPI, which can be installed using devtools::install_github().
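What a package like this has to do under the hood is sign every request with the OAuth parameters. Below is a minimal sketch, in Python rather than R and not taken from RTwitterAPI, of the OAuth 1.0a HMAC-SHA1 signature described in RFC 5849: percent-encode and sort all request and `oauth_*` parameters, build the signature base string, and sign it with the consumer secret and token secret. The function names are hypothetical and all credential values are placeholders. Twitter's API and its access rules have changed considerably since v1.1, so treat this as an illustration of the signing scheme only.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(s: str) -> str:
    # OAuth 1.0a percent-encoding: only the unreserved characters
    # A-Z a-z 0-9 - . _ ~ stay literal (note that "/" is encoded too).
    return quote(s, safe="-._~")


def base_string(method: str, url: str, params: dict) -> str:
    """Signature base string: METHOD&encoded-URL&encoded-sorted-params."""
    encoded = sorted((pct(k), pct(v)) for k, v in params.items())
    param_str = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), pct(url), pct(param_str)])


def sign(method: str, url: str, params: dict,
         consumer_secret: str, token_secret: str) -> str:
    """Base64-encoded HMAC-SHA1 over the base string."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    raw = hmac.new(key.encode(), base_string(method, url, params).encode(),
                   hashlib.sha1).digest()
    return base64.b64encode(raw).decode()


url = "https://api.twitter.com/1.1/followers/ids.json"
params = {
    "screen_name": "hrw",
    "count": "10",
    # Placeholder OAuth values (hypothetical): use your application's own.
    "oauth_consumer_key": "CONSUMER_KEY",
    "oauth_token": "ACCESS_TOKEN",
    "oauth_nonce": "abc123",
    "oauth_timestamp": "1400000000",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0",
}
print(sign("GET", url, params, "CONSUMER_SECRET", "TOKEN_SECRET"))
```

The resulting value is sent as `oauth_signature` in the `Authorization` header, together with the other `oauth_*` parameters. The nonce and timestamp have to be fresh for every real request.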
… or Inferring Identity from Observations
A conservation organisation starts a project to geographically catalogue the remaining representatives of an endangered plant species. For that purpose, hikers are encouraged to report the location of the plant whenever they encounter it. Because the hikers use GPS technology ranging from cheap smartphones to high-end GPS devices, and because of weather and other environmental circumstances, the measurements differ in accuracy. The goal of the conservation organisation is to build a map locating all found plants, each with an ID assigned to it. Every time a new location measurement is entered into the system, a clustering is applied to identify related measurements, i.e. those belonging to the same plant.
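The scenario can be sketched in a few lines of Python, under some explicit assumptions of my own. The sketch uses single linkage, which may not be what the original project used. It measures great-circle (haversine) distances in metres and treats measurements as the same plant if they are chained together by gaps of at most `h_m` metres. Cutting a single-linkage tree at height h gives exactly the connected components of the graph that links every pair at distance at most h, so a union-find replaces explicit tree building. The estimated plant position is simply the mean of its measurements. All names and the example coordinates are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def single_linkage_cut(points: Sequence[LatLon], h_m: float) -> List[int]:
    """Plant IDs from single-linkage clustering cut at h_m metres."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Join every pair of measurements that are at most h_m metres apart.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine(points[i], points[j]) <= h_m:
                parent[find(i)] = find(j)

    # Number the groups 1, 2, ... in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids) + 1) for i in range(len(points))]


def estimate_positions(points: Sequence[LatLon],
                       labels: Sequence[int]) -> Dict[int, LatLon]:
    """Average each plant's measurements (fine for clusters a few metres wide)."""
    groups: Dict[int, List[LatLon]] = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    return {label: (sum(p[0] for p in ps) / len(ps),
                    sum(p[1] for p in ps) / len(ps))
            for label, ps in groups.items()}


measurements = [(52.52000, 13.40500), (52.52005, 13.40500),
                (52.52000, 13.40507), (52.53000, 13.40500)]
labels = single_linkage_cut(measurements, h_m=20.0)
print(labels)
print(estimate_positions(measurements, labels))
```

Single linkage is easy to reason about here but can chain neighbouring plants together when measurements are dense. Complete linkage or a weighted mean based on each device's accuracy would be natural alternatives.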
If you are interested in learning about MongoDB, or are generally curious about non-relational approaches to data storage, I recommend checking out the online courses offered by MongoDB Inc. I promise you won't be disappointed. MongoDB Inc.'s educational department, MongoDB University, offers five courses for developers and DevOps:
- MongoDB for Python Devs (next session Sep 9 2014)
- MongoDB for Java Devs (next session Aug 5 2014)
- MongoDB for node.js Devs (next session Aug 12 2014)
- MongoDB for DBAs (next session Jul 15 2014)
- MongoDB Advanced Deployment and Operations (next session Jul 15 2014)
In this little tutorial I am going to describe a handy tool for transforming an XML document into the more easily processable CSV format. There are many ways of getting this job done, but most are more tedious than necessary (like writing a custom-made regex parser – yuck!). Using XMLStarlet and XPath expressions, this is going to be a cinch. Let's go through a number of typical XML data configurations and turn them into a flat CSV structure.
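XMLStarlet itself is a command-line tool. As a rough analogue of the same idea, here is a hedged sketch in Python that uses only the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath, together with the `csv` module. It selects one element per CSV row and pulls each column from either an attribute (`@id`) or a child element (`title`). The sample XML, the column spec format and the function name `xml_to_csv` are all my own hypothetical choices, not XMLStarlet's interface.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document: one <book> per CSV row.
XML = """<library>
  <book id="1"><title>Hello, World</title><author>Smith</author></book>
  <book id="2"><title>Second</title></book>
</library>"""


def xml_to_csv(xml_text: str, row_path: str, columns) -> str:
    """Flatten XML into CSV.

    row_path selects one element per row (ElementTree XPath subset);
    columns is a list of (header, expr), where expr is either "@attr"
    for an attribute or a child path for an element's text.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header for header, _ in columns])
    for row in root.iterfind(row_path):
        values = []
        for _, expr in columns:
            if expr.startswith("@"):
                # Missing attributes become empty fields.
                values.append(row.get(expr[1:], ""))
            else:
                # Missing child elements become empty fields too.
                values.append(row.findtext(expr, default=""))
        writer.writerow(values)
    return out.getvalue()


print(xml_to_csv(XML, ".//book",
                 [("id", "@id"), ("title", "title"), ("author", "author")]))
```

The `csv` module takes care of quoting, so a title containing a comma ends up correctly quoted instead of breaking the column layout.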