(Attention: The calculations and analysis are not biased by my political views – but the interpretation of the results might be and their verbal formulation certainly is … ;)
About a week ago I came across an article titled “How divided is the Senate?” by Vik Paruchuri, in which he uses a method called principal component analysis (PCA) to visualize how closely the senators of the 113th Congress of the USA vote with one another. I immediately fell in love with the idea behind the article as well as the method applied – which was a great opportunity to revise some statistics and algebra basics. And because (pretending) transparency is a major foundation of a modern democracy, full word-by-word protocols of every meeting of the Bundestag are published as PDFs and text files on its website. So I downloaded all the protocols for the 17th Bundestag, extracted the votings and loaded the votes into a data frame. That was quite a drag: judging from the typos (Sevim Dadelen, Sevim Dagelen, Sevim Dagdelen, …), the different name versions (Erwin Josef Rüddel, Erwin Rüddel) and the line breaks within longer names like Dr. Karl-Theodor Freiherr von und zu Guttenberg (his title is gone, so the name has become a tad handier by now), those text files were manually sanitized PDF conversions of live transcripts. I’ll spare you the details – but getting the data right took quite some effort.
The technical details
But in the end I won this battle, and my data set held more than 145,000 records of all the votes cast by 651 delegates in 254 votings. I registered a “Yes” as 1, a “No” as -1 and “I don’t care – where is my money?” as 0. For the PCA plot I also assumed 0 if a delegate didn’t take part in a voting. For this visualization I only considered delegates who took part in at least 230 votings – otherwise the outliers would mostly be individuals who rarely voted, some of them because they died, resigned or joined as replacements. For obvious reasons this last group of delegates is not considered in the boxplot on participations either. The boxplot on average deviation from the party’s opinion only considers delegates with at least 100 votes – to have something to statistically chew on. In that case, by the way, absences are simply left out of the calculation, so no 0 is assumed as it is for the PCA.
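The encoding and the participation filter described above can be sketched roughly as follows. This is a minimal Python illustration, not the original R code; the function name, the record layout and the sample data are all hypothetical.

```python
# Hypothetical sketch: names, record layout and sample data are illustrative only.
VOTE_VALUES = {"yes": 1, "no": -1, "abstain": 0}

def build_vote_matrix(records, n_votings, min_participation):
    """Turn (delegate, voting_index, vote) records into one row per delegate.

    Votings a delegate missed stay at 0, as assumed for the PCA plot.
    Delegates with fewer than min_participation recorded votes are dropped.
    """
    rows = {}
    counts = {}
    for delegate, voting, vote in records:
        row = rows.setdefault(delegate, [0] * n_votings)
        row[voting] = VOTE_VALUES[vote]
        counts[delegate] = counts.get(delegate, 0) + 1
    return {d: row for d, row in rows.items() if counts[d] >= min_participation}

records = [
    ("A", 0, "yes"), ("A", 1, "no"), ("A", 2, "abstain"),
    ("B", 0, "no"), ("B", 2, "yes"),
    ("C", 1, "yes"),
]
# With the real data this would be n_votings=254 and min_participation=230.
matrix = build_vote_matrix(records, n_votings=3, min_participation=2)
```

Note that in this encoding an absence and an explicit abstention both end up as 0, so the PCA cannot tell them apart.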
What is PCA actually?
When you want to visualize the distribution of records with two or three attributes – say age, height and weight measured for a set of individuals – you can simply plot each record as a point in two or three dimensions. No problem. But in this use case we are facing not two or three but 254 dimensions! Now the great thing is that all those points are usually not distributed evenly or hyper-spherically but form a shape that can be represented “well enough” with fewer dimensions. PCA is a mathematical trick to identify a sub-space of fewer dimensions that neatly wraps this hyper-cloud of points. If I have made you curious, I highly recommend this paper by Lindsay I. Smith. Out of curiosity I programmed the calculations in R myself, using this paper as my source. I am still fascinated that one can do something like this at all.
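To make the idea a bit more concrete, here is a small Python sketch of PCA on a toy data set. It is not the R code used for this post: to stay free of dependencies it finds the principal directions of the covariance matrix with power iteration and deflation, whereas in practice a library eigen-decomposition routine would be the usual choice. All function names are made up for this illustration.

```python
import math

def center(data):
    """Subtract the mean of every column (dimension) from the data."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]

def covariance(centered):
    """Sample covariance matrix of already centered data."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]

def dominant_eigenvector(matrix, iterations=200):
    """Power iteration; assumes the start vector is not orthogonal to the result."""
    dims = len(matrix)
    v = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    return v

def pca(data, k):
    """Return the first k principal directions and the data projected onto them."""
    centered = center(data)
    cov = covariance(centered)
    dims = len(cov)
    components = []
    for _ in range(k):
        v = dominant_eigenvector(cov)
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(dims) for j in range(dims))
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(dims)]
               for i in range(dims)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return components, scores

# Toy data: most of the spread lies along the first axis.
points = [[2, 0], [-2, 0], [0, 1], [0, -1]]
components, scores = pca(points, k=2)
```

For the votes, each row would be one delegate with 254 values, and the first two or three columns of `scores` would give the coordinates for a 2D or 3D plot.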
Votes deviating from the party opinion
(The party’s opinion on a voting is simply the rounded average of all vote values given by members of that party. And the “average difference of a delegate” is the average of the absolute differences between an individual’s votes and the party’s opinion on the respective votings.)
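The two definitions above could be computed along these lines. Again this is a hedged Python sketch rather than the original R code; the function names and the sample party are hypothetical, and the handling of exact ties when rounding (Python rounds ±0.5 to 0) is an assumption, since the definition does not specify it.

```python
def party_opinion(values):
    """Rounded average of the vote values cast by a party's members in one voting.

    Assumption: Python's round() sends an exact tie of +0.5 or -0.5 to 0.
    """
    return round(sum(values) / len(values))

def average_difference(delegate_votes, opinions):
    """Mean absolute difference between a delegate's votes and the party opinion.

    delegate_votes maps voting index -> vote value and contains only the
    votings the delegate took part in, so absences are left out.
    """
    diffs = [abs(value - opinions[voting])
             for voting, value in delegate_votes.items()]
    return sum(diffs) / len(diffs)

# Hypothetical party with three delegates; C missed voting 2.
party_votes = {
    "A": {0: 1, 1: 1, 2: -1},
    "B": {0: 1, 1: -1, 2: -1},
    "C": {0: 1, 1: 1},
}
votings = {v for votes in party_votes.values() for v in votes}
opinions = {
    v: party_opinion([votes[v] for votes in party_votes.values() if v in votes])
    for v in votings
}
differences = {d: average_difference(votes, opinions)
               for d, votes in party_votes.items()}
```

In this toy party the opinion on voting 1 rounds to 0 because the members are split, so everyone who voted there picks up a difference of 1 for that voting.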
CDU/CSU and FDP are indistinguishable in the PCA plot. That’s because they form a coalition. They are also the strongest flock of sheep – all voting the same – as you can see in the box plot on the right-hand side (most delegates are very close to a score of 0). This can be explained by the fact that this coalition forms the majority and hence is likely to win any voting – provided everybody acts in concert. So a delegate is comparatively rarely willing to risk f****** up a voting by expressing what simple people refer to as their “opinion”.
The same reasoning applies to Grüne and SPD – just inverted. Because they form a minority, a lost voting won’t usually be blamed on a dissenting party member, so they can express their opinion with less risk.
The comparatively high median of deviation for delegates from Die Linke, with a small span between the 1st and 3rd quartile, might indicate that this is the party of troublemakers who tend to dissent regularly and on a high and quite homogeneous level.
Participation in votings
The reasoning formulated above, which connects voting discipline with being in the minority or the majority, can also be observed in the boxplot on participations by delegate and party. As you can see, CDU/CSU and FDP show up most regularly compared to the other parties. Die Linke, on the other hand, which holds the fewest seats and also shows up distant from the other parties in the PCA plot, takes it the easiest. They have little chance of deciding the outcome of a voting. Nonetheless, as a data analyst especially interested in detailed governmental data, I have to praise them for asking very interesting questions in the Bundestag (this map on secretive transports of radioactive waste across Germany became public thanks only to Die Linke!).
The PCA plot
The Euclidean distances depicted in the PCA plot (check out the 3D model if you haven’t yet) match the perceived political differences between the parties surprisingly well. FDP is of course staying as close as possible to its big brother, but apart from this triviality it makes total sense that the two closest parties are SPD and Die Grünen, that the most distant are Die Linke and CDU/CSU, and that SPD and Die Grünen are closer to Die Linke than to CDU/CSU. Finding this pattern in the plot was the best I could hope for.
And the winner of this term’s award for open-minded thinking is …
… Hans-Christian Ströbele (Bündnis 90 / Die Grünen), who so far shows the strongest inclination to make use of his neurological faculties (called “thinking”) while taking part in 249 of 254 votings – that’s a top score in both regards and shows that he also takes his responsibility as a politician working for German society very seriously! And as an official member of this society I say:
Thank you very much, Mr. Ströbele!
Here are the top ten deviators
Peter Gauweiler is an extreme case regarding both the average difference from the party opinion and the number of participations. I find it odd that he has participated in only a third of all votings so far. The Wikipedia article mentions that this is already a known phenomenon.
Considering the reasoning for fewer troublemakers in the parties forming the majority, Manfred Kolbe sticks out as a comparatively strong counterexample. Because he apparently also takes his responsibility seriously, I grant him the runner-up slot and say to him as well: Thank you, Herr Kolbe!