Hierarchical Clustering with R (feat. D3.js and Shiny)

Agglomerative hierarchical clustering is a simple, intuitive and well-understood method for clustering data points. I used it with good results in a project to estimate the true geographical position of objects based on measured estimates. In this tutorial I describe the basics of the method, how to implement it in R with hclust, and some ideas on how to decide where to cut the tree. This was also a great opportunity to build another Shiny/D3.js app (GitHub for the code, shinyapps.io for the app), something I have wanted to do for a while now. At the end of the text I write a bit about what I learned in that regard.
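To give a flavor of the hclust workflow mentioned above, here is a minimal sketch in base R. The toy data, the complete-linkage method and the cut height of 4 are assumptions chosen for illustration, not values taken from the project or the app.

```r
# Toy data (made up for illustration): two well-separated groups of 2-D points
set.seed(42)
pts <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)

d    <- dist(pts)                       # pairwise Euclidean distances
tree <- hclust(d, method = "complete")  # agglomerative clustering, complete linkage

# Two ways to cut the tree:
by_k <- cutree(tree, k = 2)  # ask for a fixed number of clusters
by_h <- cutree(tree, h = 4)  # or cut at a fixed height (assumed threshold)

table(by_k)  # cluster sizes
```

Calling plot(tree) draws the dendrogram, which is usually the first thing to look at when deciding where to cut.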


Pivoting Data in R Excel-style

(This article is referring to an initial proof-of-concept version of r-big-pivot)

I have to admit that I very much enjoy pivoting through data using Excel. Its pivoting tool is great for getting a quick insight into a data set’s structure and for discovering interesting anomalies (the sudden rise of deaths due to viral hepatitis serves as a nice example). Unfortunately Excel itself is handicapped by several restrictions:

  • a maximum of 1,048,576 rows per worksheet (a real constraint because the data needs to be in long format)
  • Excel is slow and often unstable when you go beyond 100’000 rows
  • cumbersome integration into available data infrastructures
  • restricted charting and analysis capabilities
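For comparison, the same kind of pivot on long-format data can be sketched in base R with xtabs. The data frame and its column names (year, cause, count) are made up for illustration, and this is not how r-big-pivot itself is implemented.

```r
# Toy long-format data (made up): one row per observation
long <- data.frame(
  year  = c(2010, 2010, 2011, 2011, 2011),
  cause = c("A", "B", "A", "B", "B"),
  count = c(3, 5, 4, 1, 6)
)

# Pivot: rows = year, columns = cause, cells = sum of count
pivot <- xtabs(count ~ year + cause, data = long)
pivot
```

Swapping the terms on the right-hand side of the formula changes which variable goes on rows and which on columns, much like dragging fields around in Excel's pivot tool.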
