Humor is a powerful, alternative Method for processing Data and reporting Results.

-

“Je n’ai pas peur des représailles. Je n’ai pas de gosses, pas de femme, pas de voiture, pas de crédit. Ça fait sûrement un peu pompeux, mais je préfère mourir debout que vivre à genoux.”

(“I am not afraid of reprisals, I have no children, no wife, no car, no debt. It might sound a bit pompous, but I’d prefer to die on my feet rather than living on my knees.”)

Charb – Interview 2012

Hierarchical Clustering with R (feat. D3.js and Shiny)

hclust-shinyAgglomerative hierarchical clustering is a simple, intuitive and well-understood method for clustering data points. I used it with good results in a project to estimate the true geographical position of objects based on measured estimates. With this tutorial I would like to describe the basics of this method, how to implement it in R with hclust and some ideas on how to decide where to cut the tree. This was also a great opportunity for composing anohter Shiny/D3.js app (GitHub for the code, shinyapps.io for the app) – something I wanted to do for a while now. At the end of the text I am writing a bit about what I learned in that regard.

Continue reading

MongoDB – State of the R

mongodbNaturally there are two reasons for why you need to access MongoDB from R:

  1. MongoDB is already used for whatever reason and you want to analyze the data stored therein
  2. You decide you want store your data in MongoDB instead of using native R technology like data.table or data.frame

In-memory data storage like data.table is very fast especially for numerical data, provided the data actually fits into your RAM – but even then MongoDB comes along with a bag of goodies making it a tempting choice for a number of use cases:

  • Flexible schema-less data structures
  • spatial and textual indexing
  • spatial queries
  • persistence of data
  • easily accessible from other languages and systems

In case you would like to learn more about MongoDB then I have good news for you – MongoDB Inc. provides a number of very well made online courses catering to various languages. An overview you may find here.

Continue reading

Twitter’s REST API v1.1 with R (for Linux and Windows)

twitterIn this tutorial I am going to describe a straightforward way of how to make use of Twitter’s REST API v1.1. For that purpose I composed a little package (RTwitterAPI), so that requesting data just needs the API URL, the API parameters and a vector containing the OAuth parameters.

Before you can get started you have to login to your Twitter account on dev.twitter.comcreate an application and generate an “Access Token” for it. So let’s jump right in and fetch IDs of 10 followers of @hrw (Human Rights Watch). The necessary code is located on GitHub as a package named RTwitterAPI which may be installed using devtools::install_github().

Continue reading

Reasonable Inheritance of Cluster Identities in Repetitive Clustering

… or Inferring Identity from Observations

cluster-identityLet’s assume the following application:

A conservation organisation starts a project to geographically catalogue the remaining representatives of an endangered plant species. For that purpose hikers are encouraged to communicate the location of the plant if they encounter it. Due to those hikers using GPS technology ranging from cheap smartphones to highend GPS devices and weather as well as environmental circumstances the measurements are of varying accuracy. The goal of the conservation organisation is to build up a map locating all found plants with an ID assigned to them. Now every time a new location measurement is entered into the system a clustering is applied to identify related measurements – i.e. belonging to the same plant.

Continue reading

Interactive Heatmaps with Google Maps API v3

indiaThanks to the Google Maps API it is pretty easy to code up a small JavaScript to turn a bunch of points into an interactively explorable and lovely looking heatmap. You’re welcome to give it a try on heatmap.joyofdata.de where you can load a CSV to display its contained points. The CSV is supposed to be semicolon delimited and contain at least two columns “lat” and “lon” for the geographical location and an optional third numerical column “weight”. The order does not matter. And of course the parsing is done with Papa Parse – what else!

Continue reading

Free and Certified MongoDB Online Courses

MongoDB_University_LogoIn case you are interested in learning about MongoDB or generally curious about non-relational approaches to storage of data then my recommendation for you is to check out the online courses offered by MongoDB Incorporation. I promise you won’t be disappointed. MongoDB Inc’s educational department – MongoDB University – offers five courses for developers and dev ops:

Continue reading

Transforming an XML Document into a CSV using XMLStarlet

In this little tutorial I am going to describe a handy tool for transforming an XML document into a more easily processable CSV format. There are many ways of getting this job done – but most are more tedious than necessary (like writing a custom made RegEx parser – yuck!). Using XMLStarlet and XPath expressions this is going to be cinch. Let’s evaluate a number of typical XML data configurations and turn them into a flat CSV structure.

Continue reading