Visualization of voting behaviour in the 17th German Bundestag


Click to get to the interactive 3D scatter plot with labels (PCA plot)

(Attention: The calculations and analysis are not biased by my political views – but the interpretation of the results might be and their verbal formulation certainly is … ;)

About a week ago I came across an article titled “How divided is the Senate?” by Vik Paruchuri, in which he uses a method called principal component analysis (PCA) to visualize how closely the senators of the 113th Congress of the USA vote with one another. I immediately fell in love with the idea behind this article as well as the method applied – which was a great opportunity to revise some statistics and algebra basics. And because (pretending) transparency is a major foundation of a modern democracy, fully detailed word-by-word protocols of every meeting of the Bundestag are published as PDFs and text files on its website. So I downloaded all those protocols for the 17th Bundestag, extracted the votings and loaded the votes into a data frame. That was quite a drag, because judging from typos (Sevim Dadelen, Sevim Dagelen, Sevim Dagdelen, …), different name versions (Erwin Josef Rüddel, Erwin Rüddel) and line breaks within the longer names like Dr. Karl-Theodor Freiherr von und zu Guttenberg (his title is gone, so the name has become a tad handier by now), those text files were manually sanitized PDF conversions of live transcripts. I’ll spare you the details – but finally getting the data right took quite some effort.

The technical details


But in the end I won this battle, and my data set contained more than 145’000 records of all the votes cast by 651 delegates in 254 votings. I registered a “Yes” as 1, a “No” as -1 and “I don’t care – where is my money?” (an abstention) as 0. For the PCA plot I also assumed 0 if a delegate didn’t take part in a voting. For this visualization I only considered delegates who took part in at least 230 votings – otherwise the outliers would mostly be individuals who rarely joined votings, some of them because they died, resigned or stepped in. For obvious reasons, this last group of delegates is not considered in the boxplot on participation either. The boxplot on average deviation from the party’s opinion only considers delegates with at least 100 votes, to have something to statistically chew on. In that case, by the way, votings a delegate missed are simply left out of the calculation, so there is no assumption of 0 as there is for the PCA.
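The encoding and the participation filter described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the original analysis was done in R, and the toy records, the `ENCODING` table and the function names are hypothetical and not taken from the actual data set.

```python
import numpy as np

# Hypothetical toy data: one (delegate, voting index, vote) record per cast vote.
records = [
    ("A", 0, "yes"), ("A", 1, "no"), ("A", 2, "abstain"),
    ("B", 0, "no"), ("B", 2, "yes"),
    ("C", 1, "yes"),
]
# Encoding as described in the text: yes = 1, no = -1, abstention = 0.
ENCODING = {"yes": 1, "no": -1, "abstain": 0}


def build_vote_matrix(records, n_votings):
    """Return (delegates, votes, took_part) with one row per delegate."""
    delegates = sorted({name for name, _, _ in records})
    row = {name: i for i, name in enumerate(delegates)}
    # Absent delegates keep the initial 0, the assumption used for the PCA plot.
    votes = np.zeros((len(delegates), n_votings), dtype=int)
    # Separate mask, so absences can later be told apart from abstentions.
    took_part = np.zeros((len(delegates), n_votings), dtype=bool)
    for name, voting, vote in records:
        votes[row[name], voting] = ENCODING[vote]
        took_part[row[name], voting] = True
    return delegates, votes, took_part


def filter_by_participation(delegates, votes, took_part, min_votes):
    """Keep only delegates who took part in at least `min_votes` votings."""
    keep = took_part.sum(axis=1) >= min_votes
    kept_names = [d for d, k in zip(delegates, keep) if k]
    return kept_names, votes[keep], took_part[keep]


delegates, votes, took_part = build_vote_matrix(records, n_votings=3)
kept, kept_votes, kept_mask = filter_by_participation(
    delegates, votes, took_part, min_votes=2
)
print(kept)
```

With the real data, `min_votes` would be 230 for the PCA plot and 100 for the deviation boxplot. Keeping the `took_part` mask next to the vote matrix is one way to treat absences as 0 for the PCA while skipping them in the deviation calculation.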

What is PCA actually?

When you want to visualize the distribution of records with two or three attributes – say age, height and weight measured for a set of individuals – you can simply plot each vector of observations as a point in two or three dimensions. No problem. But in this use case we are facing not two or three but 254 dimensions! Now the great thing is that all those points are usually not distributed evenly or hyper-spherically, but form a shape that can be represented “good enough” with fewer dimensions. PCA is a mathematical trick to identify a sub-space of fewer dimensions that neatly wraps this hyper-cloud of points. If I made you curious, I highly recommend this paper by Lindsay I. Smith. Using this source I programmed the calculations in R myself, out of curiosity. I am still fascinated that one can do something like this at all.
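To make the idea a bit more concrete, here is a minimal sketch of the textbook covariance-based PCA recipe (center the data, compute the covariance matrix, take its eigenvectors, project onto the strongest ones). It is written in Python with NumPy rather than R and is not the code used for this post; the function name `pca_project` and the randomly generated toy matrix are assumptions made purely for illustration.

```python
import numpy as np


def pca_project(data, n_components=3):
    """Project the rows of `data` onto the top principal components."""
    centered = data - data.mean(axis=0)        # subtract each column's mean
    cov = np.cov(centered, rowvar=False)       # covariance between the votings
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest eigenvalues
    components = eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()  # share of variance per component
    return centered @ components, explained


# Hypothetical toy data: 40 delegates, 10 votings, two blocs that mostly
# vote against each other (values are +1 or -1 only, to keep it simple).
rng = np.random.default_rng(0)
bloc_a = np.where(rng.random((20, 10)) < 0.9, 1, -1)
bloc_b = np.where(rng.random((20, 10)) < 0.9, -1, 1)
toy_votes = np.vstack([bloc_a, bloc_b])

coords, explained = pca_project(toy_votes, n_components=3)
print(coords.shape)
```

Each row of `coords` is one delegate reduced to three numbers, which is what a 3D scatter plot needs. In this toy example the first component should mostly capture the split between the two blocs.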

Votes deviating from the party opinion

(The party’s opinion on a voting is simply the rounded average of all vote values given by members of that party. And the “average difference of a delegate” is the average of the absolute differences between an individual’s vote and the party’s opinion, taken over all votings the delegate took part in.)
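The two definitions above can be written down as a short Python sketch. The function names and the five-member toy party are hypothetical. One detail is an assumption on my side: `np.rint` rounds exact ties such as 0.5 to the nearest even number, and the text does not say how ties were treated in the original calculation.

```python
import numpy as np


def party_opinion(votes, took_part):
    """Rounded mean vote per voting, over the members who actually voted."""
    counts = took_part.sum(axis=0)
    sums = np.where(took_part, votes, 0).sum(axis=0)
    # NaN marks votings in which no member of the party took part.
    means = np.divide(sums, counts,
                      out=np.full(votes.shape[1], np.nan), where=counts > 0)
    return np.rint(means)  # assumption: ties are rounded to the even neighbour


def average_difference(votes, took_part, opinion):
    """Mean absolute difference to the party opinion, skipping absences."""
    valid = took_part & ~np.isnan(opinion)
    diffs = np.where(valid, np.abs(votes - np.nan_to_num(opinion)), 0.0)
    n = valid.sum(axis=1)
    return np.divide(diffs.sum(axis=1), n,
                     out=np.full(votes.shape[0], np.nan), where=n > 0)


# Hypothetical party with five members and three votings.
votes = np.array([
    [1, -1, 1],
    [1, -1, 1],
    [1, -1, 1],
    [1, -1, 0],    # abstained in the third voting
    [-1, -1, 0],   # was absent in the third voting (see mask below)
])
took_part = np.ones_like(votes, dtype=bool)
took_part[4, 2] = False

opinion = party_opinion(votes, took_part)
avg_diff = average_difference(votes, took_part, opinion)
print(opinion, avg_diff)
```

Since votes lie between -1 and 1, the average difference of a delegate can range from 0 (always with the party) to 2 (always the exact opposite of the party).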


Boxplot on delegates deviating from the party opinion

CDU/CSU and FDP are indistinguishable in the PCA plot. That’s because they form a coalition. They also show the strongest flock-of-sheep behaviour – all voting the same – as you can see in the boxplot on the right-hand side (most delegates are very close to a score of 0). This can be explained by the fact that this coalition forms the majority and hence is likely to win any voting, provided everybody acts in concert. So comparatively rarely is a delegate willing to risk f****** up a voting by expressing what simple people refer to as their “opinion”.

The same reasoning applies to Grüne and SPD, just reversed. Because they form a minority, a lost voting won’t usually be blamed on a dissenting party member, so they can express their opinion with less risk.

The comparatively high median for deviating delegates from Die Linke, combined with a small span between the 1st and 3rd quartile, might indicate that this is the party of troublemakers who tend to oppose at a high and quite homogeneous level on a regular basis.

Participation in votings


Boxplot on participation in votings by delegates

The reasoning formulated above, which connects voting discipline with forming a minority or a majority, can also be observed in the boxplot on participation by delegate and party: as you can see, CDU/CSU and FDP show up most regularly compared to the other parties. Die Linke, on the other hand, which holds one of the smallest numbers of seats and also shows up distant from the other parties in the PCA plot, takes it the easiest. They have little chance of deciding the outcome of a voting. Nonetheless, as a data analyst especially interested in detailed governmental data, I have to praise them for asking very interesting questions in the Bundestag (this map on secretive transports of radioactive waste across Germany became public thanks only to Die Linke!).

The PCA plot

The Euclidean distances depicted in the PCA plot (check out the 3D model if you haven’t yet) match the perceived political differences between the parties surprisingly well. FDP is of course staying as close as possible to its bigger brother, but apart from this triviality it makes total sense that the two closest parties are SPD and Die Grünen, that the most distant are Die Linke and CDU/CSU, and that SPD and Die Grünen are closer to Die Linke than to CDU/CSU. Finding this pattern in the plot was the best I could hope for.
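The post reads these distances off the plot by eye. One possible way to put numbers on them, sketched here as an assumption and not as something the original analysis did, is to compare the party centroids in the PCA space. The party labels `X`, `Y`, `Z` and the coordinates are made up for illustration.

```python
import numpy as np
from itertools import combinations


def party_centroid_distances(coords, parties):
    """Euclidean distance between the mean PCA positions of each pair of parties."""
    parties = np.asarray(parties)
    centroids = {
        p: coords[parties == p].mean(axis=0)
        for p in sorted(set(parties.tolist()))
    }
    return {
        (a, b): float(np.linalg.norm(centroids[a] - centroids[b]))
        for a, b in combinations(centroids, 2)
    }


# Hypothetical PCA coordinates for six delegates from three made-up parties.
coords = np.array([
    [0.0, 0.0, 0.0], [2.0, 0.0, 0.0],   # party X, centroid (1, 0, 0)
    [1.0, 3.0, 0.0], [1.0, 5.0, 0.0],   # party Y, centroid (1, 4, 0)
    [4.0, 4.0, 0.0], [4.0, 4.0, 8.0],   # party Z, centroid (4, 4, 4)
])
parties = ["X", "X", "Y", "Y", "Z", "Z"]

distances = party_centroid_distances(coords, parties)
closest = min(distances, key=distances.get)
print(closest, distances[closest])
```

A centroid hides how spread out a party is, so this is a rough summary at best and no replacement for looking at the actual 3D plot.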

And the winner of this term’s award for open-minded thinking is …

Hans-Christian Ströbele (Bündnis 90 / Die Grünen), who so far shows the strongest inclination to make use of his neurological facilities (called “thinking”) while taking part in 249 of 254 votings – that’s a top score in both regards and shows that he also takes his responsibility as a politician working for German society very seriously! And as an official member of this society I say:

Thank you very much, Mr. Ströbele!

Here are the top ten deviators

Delegate                 Party  Avg. diff.  Participations
Peter Gauweiler          CSU    0.62         84
Hans-Christian Ströbele  Grüne  0.27        249
Manfred Kolbe            CDU    0.25        237
Waltraud Wolff           SPD    0.24        199
Klaus Barthel            SPD    0.23        249
Marco Bülow              SPD    0.23        163
Frank Schäffler          FDP    0.21        225
Monika Lazar             Grüne  0.18        253
Petra Hinz               SPD    0.18        254
Josef Göppel             CSU    0.17        233

Peter Gauweiler is an extreme case regarding the average difference from the party opinion as well as the number of participations. I find it odd that he has participated in only a third of all votings so far. The Wikipedia article mentions that this is already a known phenomenon.

Considering the reasoning for fewer troublemakers in the parties forming the majority, Manfred Kolbe sticks out as a comparatively strong counterexample. Because he apparently also takes his responsibility seriously, I grant him the runner-up slot and also say to him: Thank you, Herr Kolbe!

9 thoughts on “Visualization of voting behaviour in the 17th German Bundestag”

  1. @Staffan:

    Sorry for the late reply – I just returned from a vacation.

    Thank you very much for your suggestions – the algorithms as well as the web-site. I will give it a closer look.

    I’d be more than happy to share the data set with you. I will mail you the data set in the course of this week!

  2. Very interesting, but have you thought of using any other method? Like W-NOMINATE, or IDEAL? They both handle missing data much better than PCA (especially IDEAL). They are both available as R packages (IDEAL as a part of the MCMC-pack). I also suggest you take a look at the website as well, maybe it could give some inspiration. And, the most important question of all, could you share the dataset with us?

  3. Hey Raffael:

    This is a very interesting analysis.  Having more than 2 parties around definitely makes for a more useful plot, and the addition of the third dimension captures interesting tendencies.

    I’m curious if there is a way to visualize coalition behavior between parties that only occasionally vote the same way (so not a tight-knit group like the FDP/CDU/CSU block).  Does this happen at all in German politics?

    It would also be interesting to plot politicians who frequently change parties vs their distance from the party center over several different Bundestag sessions.  Do people who “jump ship” try to blend in with their new party, or do they mostly stick to their own views/ideals?


  4. @jcborras:
    This question is really tough to answer properly. If you care about the rights of the simple man, the working class, you’ll traditionally vote for SPD – if you are more concerned about the environment and human rights, then you’ll support Die Grünen.

    The real difference is that SPD is, like CDU, a classical governmental party, while Die Grünen, like Die Linke, are used to being either in opposition or a smaller part of the government. So they are traditionally louder and more aggressive. FDP is an opportunistic party – they will try to get into a coalition, and if that is not working out they will just be there and occasionally say something. That’s my perception.

  5. I rephrase the question: as we can see, there is a clear divide between SPD and Die Grünen in the 3-dimensional projection. From your subjective opinion as a local observer of German politics, do you know of any particular issues on which both groups show clearly separated voting choices?

  6. @jcborras:
    Technically there might be cases where you need a fourth, fifth, … dimension to separate two groups. In this case it emphasizes that they are close regarding the votings. For me subjectively they are also politically very close because I so far always voted for Grüne or SPD – the only practical difference is that SPD is traditionally stronger than Grüne.

  7. Yet another scenario of political herding behaviour. Interestingly enough it takes 3 dimensions to linearly separate Die Grünen from SPD… would you comment on the issue from your subjective opinion (data analysis aside)?

  8. how did you manage the names? you mentioned you manually adjusted it but how did you even know a name changed.

    e.g. through marriage, a name might change to something completely different, not just a ‘typo’ or different spelling
