The top row holds the maps and the bottom row the corresponding magnified areas. On the right-hand side you will find instructions for using the tool and further explanatory links. Aiming the red and green marker lines at a field and then switching to a high zoom level lets you quickly locate the plot you are looking for.
(By the way, if you want to know how I created those plots in R, then check this article out. In that article I describe the different steps, from fetching the data from MySQL to saving the plots as a PNG.)
Originally I planned to cover over 100 stocks (DAX, MDAX and Dow Jones). But this led to a PNG of 18’000 x 18’000 pixels and 20 MB holding the scatter plots. I knew this would be tough for Chrome to handle, but what I didn’t expect was that he (or she – or it?) would refuse to show the PNG at all. It seems that Chrome has a built-in limit on the maximum image size. Anyway, even for smaller scatter plot sets the panning and zooming got so slow and jerky that it wasn’t fun to use. So I reduced the number of covered stocks to 30 German blue chips listed in DAX and MDAX (traded on XETRA). The focus of the whole project also gradually moved from examining the correlations to the technical question of how to build a tool that makes this exploration possible at all. Luckily I found a jQuery-based zoomer (Featured Image Zoomer v2.1) that needed only a few adjustments to become such a tool. OpenLayers and Google Maps are also promising frameworks – basically the big plot matrix is nothing other than a map.
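To get a feeling for why a browser might struggle with such an image, here is a rough back-of-the-envelope calculation in Python. It is only a sketch under assumptions that are not from my measurements: that a decoded image needs 4 bytes per pixel (RGBA), that the plot matrix is a full n x n grid of panels, and that "over 100 stocks" is rounded down to exactly 100. Whether memory is really the reason Chrome refused the PNG is not something I verified.

```python
def decoded_size_bytes(width_px, height_px, bytes_per_pixel=4):
    """Approximate memory for the uncompressed bitmap (assumes RGBA, 4 bytes/pixel)."""
    return width_px * height_px * bytes_per_pixel


def panel_count(n_stocks):
    """Number of panels in a full n x n scatter plot matrix (assumed layout)."""
    return n_stocks * n_stocks


side_px = 18_000                      # edge length of the PNG mentioned above
raw_bytes = decoded_size_bytes(side_px, side_px)

print(f"pixels:          {side_px * side_px:,}")
print(f"decoded size:    {raw_bytes / 1e9:.3f} GB")
print(f"panels, 100 stocks: {panel_count(100):,}")
print(f"panels,  30 stocks: {panel_count(30):,}")
```

So even though the file is only 20 MB on disk, the decoded bitmap would be in the gigabyte range under these assumptions, and cutting the set from 100 to 30 stocks reduces the number of panels by more than a factor of ten.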
A down-to-earth zoology of correlations:
The colors used in the scatterplots for the different years:
Linde (x) – Gea Group (y) / pearson > 0.75
If you look at the different colors, each of them seems to be arranged quite well along its own slanted straight line. This would be an example of a correlation that changes roughly on a yearly basis but is quite high within each of those time frames.
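The effect of "high within each year, different from year to year" can be reproduced with made-up numbers. The following Python sketch uses purely synthetic data (not the Linde and Gea Group quotes); the helper names `pearson` and `pearson_by_group`, the offsets and the noise level are all illustrative assumptions. Each "year" lies on its own straight line, so the per-year coefficients are high while the pooled coefficient over all years is much lower.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def pearson_by_group(xs, ys, groups):
    """Pearson coefficient computed separately for each group label (e.g. year)."""
    buckets = {}
    for x, y, g in zip(xs, ys, groups):
        bx, by = buckets.setdefault(g, ([], []))
        bx.append(x)
        by.append(y)
    return {g: pearson(bx, by) for g, (bx, by) in buckets.items()}


# Synthetic quotes: within each year y follows x closely,
# but every year sits on a line with a different offset.
rng = random.Random(1)
offsets = {2005: 0.0, 2006: 30.0, 2007: -30.0}   # hypothetical yearly shifts
xs, ys, years = [], [], []
for year, offset in offsets.items():
    for _ in range(250):                          # roughly one trading year
        x = rng.uniform(10.0, 20.0)
        xs.append(x)
        ys.append(x + offset + rng.gauss(0.0, 0.5))
        years.append(year)

per_year = pearson_by_group(xs, ys, years)
pooled = pearson(xs, ys)

for year, r in sorted(per_year.items()):
    print(year, round(r, 3))
print("all years pooled", round(pooled, 3))
```

A single coefficient for the whole period therefore hides the yearly structure, which is exactly what the coloring in the scatterplot brings back.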
Celesio (x) – Merck (y) / pearson almost 0.5
The correlation coefficient is low, but when we look at the coloring it seems that the quotes developed in parallel from about 2003 to 2007. Before 2003 no shape is observable, and after 2007 apparently quite chaotic times (orange color) dawned.
e.on (x) – RWE (y) / pearson close to 1
Quite an up and down, but the quotes move jointly. This is not a coincidence, as these companies are the biggest energy providers in Germany. 2008 (orange) apparently was pretty turbulent here as well, judging from the isolated spots, which indicate large steps from one quote to the next.
Commerzbank (x) – MAN (y) / pearson about -0.2
I find scatterplots that display a curved shape, sometimes even something close to a circle, especially interesting. Within a given time span, the minimum and maximum quotes and the quotes in between produce an area of similarly colored dots whose diameter along each axis is the spread of the respective stock. So no correlation, or only a small one, within such a time span leads to a more or less circular area of dots. But if the spread and the average quote of one stock change from time span to time span in (causal or random) coordination with the spread and average of the other stock, then you will see those curves. In other words, for two stocks A and B the spreads in a time span T have a complementary relation, maybe something like spread(A,T) = 1-spread(B,T).
The conclusion would be that there are indeed correlations, but they are delayed. That is at least one possible interpretation.
Further explorations of this topic
This whole examination can be taken much further. I would like to overcome the size restriction so that larger plot matrices can be investigated in the same fashion. The types of visualization can also be varied and optimized in many ways. I am not entirely happy with the plots; for example, the mirrored arrangement of the density plot is a bit annoying.
As mentioned in the previous section it would be interesting to have a look at correlations for quotes shifted in time – to unveil delayed relations. I guess sooner or later I will take this further.
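To make the idea of time-shifted correlations a bit more concrete, here is a minimal Python sketch. It is not something I have run on the stock data: the series are synthetic, the helper `lagged_pearson` is a hypothetical name, and I use independent random "changes" instead of raw quotes purely to keep the toy example clean. The second series is built as a copy of the first one delayed by five steps plus noise, and scanning over a range of shifts recovers that delay as the lag with the highest coefficient.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def lagged_pearson(xs, ys, lag):
    """Correlate xs[t] with ys[t + lag]; a positive lag means ys trails xs."""
    if lag >= 0:
        a, b = xs[:len(xs) - lag], ys[lag:]
    else:
        a, b = xs[-lag:], ys[:len(ys) + lag]
    return pearson(a, b)


# Synthetic example: ys repeats xs with a delay of 5 steps, plus some noise.
rng = random.Random(42)
n, true_delay = 500, 5
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [
    xs[t - true_delay] + rng.gauss(0.0, 0.3) if t >= true_delay else rng.gauss(0.0, 1.0)
    for t in range(n)
]

# Scan a window of shifts and keep the one with the highest coefficient.
by_lag = {lag: lagged_pearson(xs, ys, lag) for lag in range(-10, 11)}
best_lag = max(by_lag, key=by_lag.get)

print("unshifted correlation:", round(by_lag[0], 3))
print("best lag:", best_lag, "with correlation", round(by_lag[best_lag], 3))
```

On real quotes one would have to decide first whether to shift the prices themselves or their day-to-day changes, and how large a window of shifts is still meaningful.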