Why (a) PCA?

3D model of ellipsoid and its three principal components

A principal component analysis (PCA) is a way to reduce a data set of numeric vectors to a lower dimensionality, which makes it possible to visualize the data in three or fewer dimensions. Have a look at this use case. I’ll try to explain the motivation using a simple example.

Think of a very flat cuboid (e.g. 1 x 1 x 0.05) in three-dimensional space. What you see of this cuboid from a specific angle, ignoring perspective cues, is basically a two-dimensional projection of it. From one angle it looks like a rhombus; looking straight at an edge, all you see is a line-like shape. So the view offering the most information about this cuboid is the one you get when looking perpendicular to its main surface.
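The cuboid example can be tried out in a few lines of base R. This is only a minimal sketch under assumptions that are not in the post: the points are sampled uniformly from the cuboid, the tilt angle and seed are arbitrary, and `prcomp` from the stats package stands in for whatever PCA routine the full article uses.

```r
# Sketch: PCA on points sampled from a very flat cuboid (1 x 1 x 0.05).
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 2000     # assumed sample size
flat <- cbind(runif(n, 0, 1),
              runif(n, 0, 1),
              runif(n, 0, 0.05))

# Tilt the cuboid around the x axis so its main surface is no longer
# aligned with the coordinate axes (the angle is an arbitrary choice).
angle <- pi / 5
rot_x <- matrix(c(1, 0,          0,
                  0, cos(angle), -sin(angle),
                  0, sin(angle),  cos(angle)),
                nrow = 3, byrow = TRUE)
tilted <- flat %*% rot_x
colnames(tilted) <- c("x", "y", "z")

# Principal components of the tilted point cloud.
pca <- prcomp(tilted, center = TRUE, scale. = FALSE)

# Share of the total variance captured by each principal component.
var_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_share, 4))

# The first two components are the 2D view "perpendicular to the main surface".
projected <- pca$x[, 1:2]
```

The third component should carry almost none of the variance, which is the numeric counterpart of the thin edge of the cuboid: dropping it loses very little information.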

R code for igraph animation

For my personal code archive, and for everybody who finds it interesting, I am publishing the R code I used to create the frames for the animations showing the carpools formed through the booking system up to a certain date. The graphs are created using igraph and plotted into frames, which are later stitched into an MPEG clip using ffmpeg.

Guidance for the code

The graph and its growth are contained in one CSV file with three columns: a driver’s ID, a passenger’s ID and the date the passenger took a ride with the driver, according to our booking system. This list defines the possible edges.
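To make the three-column layout concrete, here is a rough sketch of how such a file could be turned into one edge list per animation frame. The column names (`driver`, `passenger`, `date`), the sample rows and the helper `edges_until` are all hypothetical stand-ins rather than the original code. The igraph call is guarded so the sketch also runs when the package is not installed.

```r
# Hypothetical stand-in for the CSV file; column names and rows are assumptions.
rides <- read.csv(text = "driver,passenger,date
1,2,2012-01-03
1,3,2012-01-10
4,2,2012-02-01
3,5,2012-02-15", stringsAsFactors = FALSE)
rides$date <- as.Date(rides$date)

# Hypothetical helper: all edges that exist up to and including a given date.
edges_until <- function(rides, until) {
  rides[rides$date <= as.Date(until), c("driver", "passenger")]
}

# One cumulative edge list per distinct date, i.e. one per animation frame.
frame_dates <- sort(unique(rides$date))
edges_per_frame <- lapply(seq_along(frame_dates), function(i) {
  edges_until(rides, frame_dates[i])
})

# Build a graph for the last frame if igraph is available.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges_per_frame[[length(edges_per_frame)]],
                                     directed = TRUE)
  print(igraph::ecount(g))
}
```

Each element of `edges_per_frame` could then be plotted to its own image file, and the numbered images handed to ffmpeg.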

ggplot2 basics in action

ggplot2 is a very flexible plotting package for R, ably designed by Hadley Wickham following a concept called the “grammar of graphics”, and pretty awesome all around. So let’s jump right in with some simple examples that should help you get going.

Make R(ODBC) talk to MySQL on Windows 7 64bit

When you are dealing with many large data sets, it is much more efficient to organize them in database tables instead of CSVs or other files. Just yesterday I set up R to fetch data from a MySQL DBMS, loading a table of stock quotes with more than 300’000 rows into a data frame within seconds. That is pretty cool, and if necessary you can join huge tables in no time, benefiting from the indexing infrastructure of the DBMS of your choice.

Converting a 4-dimensional wide table into long format

In my last article about converting a wide table into a long table using reshape’s melt function (I recommend reading it first), I promised to soon cover the 4-dimensional case; here you go. I originally faced this problem when checking out the official statistics on causes of death in Germany. The problem is that you cannot apply the pivoting tools of spreadsheet programs like Excel or Calc to a (wide) cross table. Other tools, like reshape’s cast function, also expect a long-format data table.
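As a rough illustration of what "long format" means for four dimensions, the following sketch uses only base R, not the reshape-based approach from the article. The dimension names and the counts are made up for the example.

```r
# Hypothetical 4-dimensional cross table: cause x sex x age group x year.
# All labels and counts are invented for illustration.
dims <- list(cause = c("A", "B"),
             sex   = c("f", "m"),
             age   = c("0-39", "40+"),
             year  = c("2009", "2010"))
counts <- array(seq_len(16), dim = c(2, 2, 2, 2), dimnames = dims)

# Long format: one row per combination of the four dimensions,
# plus one column holding the cell value.
long <- as.data.frame(as.table(counts),
                      responseName = "count",
                      stringsAsFactors = FALSE)
print(head(long))
```

In this shape every row is a single observation, which is the layout that pivoting and casting tools expect as input.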

Why do we want to transform a table from wide to long?

In the article published yesterday I explained how to fetch statistics from GENESIS, using the statistics on causes of death as an example. After downloading all the data and gluing the tables together, you are finally left with one huge monster table.