Early in December 2013 a law firm began sending out somewhere between 10,000 and 40,000 cease-and-desist letters on behalf of the rights holder of a bunch of porn flicks, accusing the recipients of streaming those films on RedTube. So far, so good. Now, a lot of people didn’t like receiving bills ranging from 250 to more than a thousand euros for streaming erotica just before Christmas, especially when they were pretty sure they hadn’t done so. Given the magnitude of the case, many of these people turned sour and started to dig a bit deeper. What came to light is a shady network of companies with links where there should be none, and a bunch of business partners who turned out to have more in common than was visible at first glance.
(Attention: The calculations and analysis are not biased by my political views – but the interpretation of the results might be and their verbal formulation certainly is … ;)
About a week ago I came across an article titled “How divided is the Senate?” by Vik Paruchuri, in which he uses a method called principal component analysis (PCA) to visualize how closely the senators of the 113th Congress of the USA vote with one another. I immediately fell in love with the idea behind this article as well as the method applied – which was a great opportunity to revise some statistics and algebra basics. And because (the pretense of) transparency is a major foundation of a modern democracy, full word-for-word protocols of every meeting of the Bundestag are published as PDFs and text files on its website. So I downloaded all the protocols for the 17th Bundestag, extracted the roll-call votes and loaded them into a data frame. That was quite a drag because, judging from typos (Sevim Dadelen, Sevim Dagelen, Sevim Dagdelen, …), different name versions (Erwin Josef Rüddel, Erwin Rüddel) and line breaks within longer names like Dr. Karl-Theodor Freiherr von und zu Guttenberg (his title is gone, so the name has become a tad handier by now), those text files were manually sanitized PDF conversions of live transcripts. I’ll spare you the details – but getting the data right took quite some effort.
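To give an idea of what PCA does with voting data, here is a minimal sketch in Python with NumPy. It is not the code used for the Bundestag analysis: the vote encoding (1 = yes, -1 = no, 0 = abstained or absent), the function name `pca_project` and the tiny vote matrix are all assumptions made up for illustration.

```python
import numpy as np


def pca_project(votes, n_components=2):
    """Project each row (one member of parliament) onto the first principal components.

    `votes` is a 2-D array: one row per member, one column per roll-call vote.
    The encoding 1 = yes, -1 = no, 0 = abstained/absent is an assumption.
    """
    X = np.asarray(votes, dtype=float)
    # Center every column so the components describe deviations from the mean vote.
    X = X - X.mean(axis=0)
    # The rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T


# Hypothetical toy data: four members, four votes.
votes = np.array([
    [ 1,  1, -1,  1],
    [ 1,  1, -1,  0],
    [-1, -1,  1, -1],
    [-1, -1,  1,  1],
])

coords = pca_project(votes)
print(coords)  # one (x, y) point per member, ready for a scatterplot
```

Members who vote alike end up close together on the first component, which is what makes the resulting scatterplot readable. The sign of each component is arbitrary, so only the relative positions carry meaning.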
My girlfriend and I just arrived back from an awesome and very sunny two-week trip to Israel. We spent most of the time in Haifa, where we stayed with our friend Shai, but of course we also jaunted (first time ever I use this verb) to Eilat, Tel Aviv and Jerusalem. In Jerusalem the major highlight is the Old City – a, or the, center of the Jewish, Christian and Muslim religions. It’s not that large but packed with historical places – so after entering the area we checked out a map hanging next to the gate, and of course the first thing I did was to pinpoint the place where we were (labelled “you are here”). Anni pointed out to me that I was obviously not the first person to do that, because the color had already been rubbed off. This phenomenon struck me as quite interesting, so I wanted to share it here. Actually I still have no good idea what to call it, or maybe there is a name for it already? You’re welcome to help me out.
The usual administrative units are too heterogeneous for regional statistics. To make regions comparable, territorial units of similar population size are required. For the European Union and further states associated with it in one way or another, the NUTS (Nomenclature des unités territoriales statistiques) classification was developed in 1980 and is updated triennially.
There are four NUTS levels: 0, 1, 2 and 3. Every region is assigned a code consisting of two to five characters. The first two characters denote the state (the usual ISO 3166 two-letter code – Greece being an exception, as it is referred to as EL instead of GR). The characters following it in the case of NUTS 1, 2 and 3 form a hierarchical system. So, for example, DE21H (Munich) belongs to DE21 (Oberbayern), which belongs to DE2 (Bayern / Bavaria), which belongs to DE (Germany).
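Because the hierarchy is encoded in the code itself, the level and all parent regions can be read off by simple string slicing. The following Python sketch illustrates that; the function names (`nuts_level`, `nuts_ancestors`, `iso_country`) are made up for this example, and the exception table contains only the Greek case mentioned above.

```python
# NUTS country prefixes that differ from the ISO 3166 two-letter code.
# Only the exception named in the text is listed here.
NUTS_TO_ISO = {"EL": "GR"}


def nuts_level(code):
    """Return the NUTS level (0 to 3) implied by the length of the code."""
    code = code.strip().upper()
    if not 2 <= len(code) <= 5:
        raise ValueError(f"not a valid NUTS code: {code!r}")
    return len(code) - 2


def nuts_ancestors(code):
    """Return the parent regions of a code, from nearest parent up to the state."""
    code = code.strip().upper()
    nuts_level(code)  # validates the length
    return [code[:n] for n in range(len(code) - 1, 1, -1)]


def iso_country(code):
    """Return the ISO 3166 two-letter country code for a NUTS code."""
    prefix = code.strip().upper()[:2]
    return NUTS_TO_ISO.get(prefix, prefix)


print(nuts_level("DE21H"))      # 3
print(nuts_ancestors("DE21H"))  # ['DE21', 'DE2', 'DE']
print(iso_country("EL30"))      # GR
```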
As you might know, I am working as the Data Analyst for carpooling.com in Munich. carpooling.com is the company that runs the leading web platform for organizing carpools (in the world, actually). Many people don’t know what “carpooling” means, so let me explain it to you:
Tanja lives in Stuttgart and wants to visit her family in Hamburg next weekend. This is quite a long distance and hence pretty expensive – and it is also kind of dull to sit in a car alone for several hours. With three free seats left in her car, she thinks to herself … “Why not offer those seats to other people and share the expenses with them?!”. So she advertises her planned lift on www.mitfahrgelegenheit.de – the biggest German website for carpooling. Peter, who also wants to travel to Hamburg next weekend, finds her ad and gives her a call to seal the deal. Okay, so far so good, but some passengers aren’t as reliable as Peter and might just forget about the ride, and Tanja would then be left with a free seat and no money. So carpooling.com came up with the idea of a “booking system” to make carpooling agreements more binding. Next weekend Tanja, Peter and further passengers meet and drive to Hamburg together.
I was curious how the gender ratios of young women and men are distributed geographically in Europe. Eurostat offers absolute numbers for all NUTS 2 regions in Europe. The most recent available figures refer to January 2012 – in a few cases, like Turkey, I fell back to January 2011 due to missing values.
The figures are drawn from table “demo_r_d2jan” on Eurostat.
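As a rough sketch of the calculation (not the original analysis code), the ratio per region could be computed as below in Python, including the fallback to the previous year when a value is missing. The definition “women per 100 men”, the data layout, the region names and all numbers are assumptions made up for this example; they are not Eurostat figures.

```python
def gender_ratio(women, men):
    """Women per 100 men (this definition is an assumption, not taken from the post)."""
    return 100.0 * women / men


def ratio_with_fallback(counts, region, years=(2012, 2011)):
    """Return (year, ratio) for the first year that has both counts, else None."""
    for year in years:
        entry = counts.get((region, year))
        if entry and entry.get("F") is not None and entry.get("M") is not None:
            return year, gender_ratio(entry["F"], entry["M"])
    return None


# Hypothetical counts keyed by (region, year); "R1" and "R2" are placeholder regions.
counts = {
    ("R1", 2012): {"F": 950, "M": 1000},
    ("R2", 2012): {"F": 470, "M": None},  # value missing for 2012
    ("R2", 2011): {"F": 480, "M": 500},
}

print(ratio_with_fallback(counts, "R1"))  # (2012, 95.0)
print(ratio_with_fallback(counts, "R2"))  # (2011, 96.0)
```

Returning the year together with the ratio keeps track of which regions had to fall back to the older figures.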
The scatterplot shows how frequently words occur in two sets of texts. Click on a circle and the words belonging to it appear on the left-hand side. The app is built on d3.js (my second small project using it) and I am planning to write an introductory article on it soon. Apart from a few issues it is fun to work with d3.
Originally I had the idea for this little project (I still can’t find a name or description for it) while dealing with correlations between stock quotes. The tool I came up with shows the scatterplot for two stock quote charts and the respective Pearson correlation coefficient. I wanted to see whether one can tell from the scatterplot and the coefficient how two stocks relate to each other. I didn’t take this investigation much further than the visualization and some pondering about the patterns shown in the scatterplots.
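For reference, the Pearson correlation coefficient of two series is their covariance divided by the product of their standard deviations. Here is a minimal Python sketch of that formula; the function name `pearson` and the two short price series are made up for illustration and are not real stock quotes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations (same constant factor cancels).
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical closing prices of two stocks on the same five days.
stock_a = [10.0, 10.5, 11.0, 10.8, 11.5]
stock_b = [20.0, 20.9, 22.1, 21.5, 23.0]

print(round(pearson(stock_a, stock_b), 3))
```

A value near 1 means the points in the scatterplot lie close to a rising line, a value near -1 close to a falling line, and a value near 0 means there is no linear relationship.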