Okay, enough of that – here’s the deal: looking at the monthly variation of the seasonal component of live births reveals a strong trend – more births in the first half of the year towards the 60s relative to 2010, and the opposite relation in the second half. Now assuming a standard pregnancy duration of 9 months we can calculate the maximum-likelihood month of intercourse – and that’s where it gets juicy! BTW – fewer children born in spring means less anxious generations ahead – but more on that later.
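The back-shift from birth month to conception month can be sketched like this (a toy helper of my own, not part of the original analysis):

```r
# shift a birth month back by 9 months, wrapping around the year;
# the helper name and the fixed 9-month assumption are mine
conception_month <- function(birth_month) ((birth_month - 9 - 1) %% 12) + 1

conception_month(6)  # June births trace back to September of the previous year
```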

For Germany this trend is very strong – June to August is becoming less popular for procreation while October to January is gaining favour with potential parents.

Estonia shows the same trend pattern, though the relative magnitudes between the months differ.

The seasonal trend is comparatively weak – likely due to the mild climate.

The trend of the seasonal pattern reverses at around 1980. According to Google Analytics I had 40 visits from Icelanders in 2014 – my question to you: “What happened?”.

Source of the data is Eurostat and specifically the dataset “demo_fmonth”. The decomposition into season, trend and remainder is done using stats::stl() with a seasonal window of 36 months and a trend window of 48 months – I figured this is a good compromise for capturing the seasonal pattern without diluting it over too long a time span. The code is of course published on GitHub and you can give the code a try right now thanks to devtools::source_url().
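As a minimal illustration of the decomposition step – using R’s built-in monthly co2 series as stand-in data, not the Eurostat births:

```r
# stl() decomposes a monthly time series into seasonal, trend and remainder;
# the window settings mirror the ones described above (36 / 48 months)
fit <- stl(co2, s.window = 36, t.window = 48)
head(fit$time.series)  # columns: seasonal, trend, remainder
```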

# https://github.com/joyofdata/joyofdata-articles/
# blob/master/people-had-more-sex-in-summer/SESSION_REMOTE.R

library(rsdmx)
library(forecast)
library(devtools)

base_url <- paste0(
  "https://raw.githubusercontent.com/joyofdata/",
  "joyofdata-articles/master/people-had-more-sex-in-summer"
)

source_url(sprintf("%s/extract_ts_for_country.R", base_url))
source_url(sprintf("%s/load_sdmx_from_eurostat.R", base_url))
source_url(sprintf("%s/monthplot_for_country.R", base_url))

# load data from Eurostat
d <- load_sdmx_from_eurostat(dataset = "demo_fmonth")

# available time series data for countries
#> table(d$data$geo)[table(d$data$geo) > 600]

# plot away - aye, aye, Sir
monthplot_for_country(d$data, "DE", sw=36, tw=48, pstl=TRUE, yred=1990)

And a special biiiig “Thank you very much!” goes to Emmanuel Blondel and Matthieu Stigler for creating rsdmx – which finally makes it a cinch to load SDMX! So far I either had to resort to ill-structured TSVs on Eurostat or use their “SDMX Converter“.

Now why do, for example, Germans “make fewer babies” in summer and instead choose to reproduce in late autumn / winter? Let me attempt an explanation:

- In earlier days household sizes were larger and hence romantic feelings more likely overcame (future) Mommy and (future) Daddy on a warm sunny day walking through the cornfields!
- Generation Facebook tends to indulge in a hedonistic and narcissistic lifestyle during summer – too busy taking selfies … but then autumn strikes – it gets cold and dark and flowers wilt – we start thinking about getting older and finally dying. A natural reaction is to bring a little copy of yourself into the world!
- Back in the sixties houses were not as well thermally insulated and equipped with heating systems. Nowadays flats are so warm and so cozy – causing warm and cozy feelings – add candle light and you’re halfway there.
- Of course procreation became less causally related to intercourse once contraceptives became more available. Which implies that conscious decision making gained a more important role.

Any ideas?

Now hang on to your hat, friend, cause this is not just some harmless time shift of good vibrations … this has a serious long-term impact on society! Scientists published a study indicating that when you are born correlates with five affective temperaments of your future personality [1] [2]. Children born in spring seem to be more anxious than those born in summer/autumn but then again less irritable. More irritable and less anxious is just the right mix of qualities for a career hooligan – that’s bad news!

[1] Season of birth affects your mood later in life (telegraph.co.uk)

[2] Association between affective temperaments and season of birth in a general student population (Journal of Affective Disorders)

Are you still hungry for more data on life and death? Don’t worry, here you go:

(original article published on www.joyofdata.de)

Lo and behold James Joyce’s “A Portrait of the Artist as a Young Man” and its peculiar distribution of compression ratios …

library(stringr)
library(ggplot2)

compress <- function(str) {
  length(memCompress(str, type="bzip2")) / nchar(str)
}

calc_ratios <- function(title, N, sep=" ") {
  f <- paste("/media/Volume/temp/struc/", title, ".txt", sep="")
  t <- scan(f, what=character())
  y <- unlist(sapply(t, function(l) str_split(l, "[^a-zA-Z]")))
  names(y) <- NULL
  y <- y[nchar(y) > 1]
  ratios <- rep(NA, N)
  for(i in 1:N) {
    ratios[i] <- compress(paste(y[sample(length(y))], collapse=sep))
  }
  results <- list(
    "perms" = ratios,
    "original" = compress(paste(y, collapse=sep))
  )
  return(results)
}

res <- calc_ratios("portraitartistyoungman", N=10000)

ggplot(data = data.frame(x = res[["perms"]])) +
  geom_histogram(aes(x = x), binwidth=.00001, stat="bin") +
  geom_vline(xintercept=res[["original"]], color="red") +
  labs(title="compression ratio of not permuted text in red", x="", y="") +
  theme(plot.title = element_text(vjust=1, size=12))

hist(res[["perms"]], breaks=100,
  main = "A Portrait of the Artist as a Young Man",
  xlab="distribution of compression ratios for word permutations with median in red",
  ylab="")
abline(v=median(res[["perms"]]), col = "red")

So obviously the compression ratio is indeed affected by word order and even comes with a very nice looking distribution. But the best part is that the compression ratio of the original word order is effectively statistically mega-super-über-hyper-significant! The same is true for all the books’ texts I checked and, I guess, it is true for every natural language text in the world. Intuitively I thought that maybe there is a distribution which is typical for text but from what I see, they are pretty different from content to content. It even seems to me that there is a resemblance of shape per author (“authorship attribution”). Though Ulysses – the literary troublemaker of the last century – is of course not playing along nicely. Nonetheless – Twain features a plateau, the other two Joyces a double hill and Dickens a bell shape (see six-pack below).

The difference between a normal text and a permutation of it is of course that permuting kills its structure. This makes me wonder if it would be possible to measure a text’s degree of structure by relating its original compression ratio to its distribution of compression ratios. F.x. as the DOS values in the table below: the original compression ratio divided by the first decile of the permuted distribution.

A *low* DOS would indicate a *high* degree of structure because of the relatively large gap to the lower boundary of the distribution of compression ratios for randomly permuted word orders. Looking at the DOS column of the table below this interpretation seems reasonable when we compare the degree f.x. for Dickens vs. Joyce – assuming that Dickens narrates rather straightforwardly as opposed to Joyce, who is notorious for his demanding literary style.
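A toy sketch of this degree-of-structure idea, with a small synthetic “text” instead of a novel (function and variable names are mine):

```r
# compression ratio as used throughout this article
compress <- function(str) length(memCompress(str, type = "bzip2")) / nchar(str)

# a small artificial word vector standing in for a real book
words <- rep(c("the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"), 150)

original <- compress(paste(words, collapse = " "))
perms <- replicate(200, compress(paste(sample(words), collapse = " ")))

# DOS: original ratio relative to the 1st decile of the permuted distribution
dos <- original / quantile(perms, 0.1, names = FALSE)
```

The highly repetitive toy text compresses far better in its original order than permuted, so its DOS is well below the book values in the table.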

wc = word count
original = compression ratio of original text (word order)
dec_nth / median = nth decile / median of c.r. dist. for permuted word orderings

title | wc | original | dec_1st | median | dec_9th | DOS
--- | --- | --- | --- | --- | --- | ---
Dubliners | 65609 | 0.28288 | 0.31397 | 0.31446 | 0.31511 | 0.901
Ulysses | 253927 | 0.32091 | 0.34607 | 0.34632 | 0.34663 | 0.927
Great Expectations | 176096 | 0.26178 | 0.29518 | 0.29534 | 0.29556 | 0.887
David Copperfield | 337291 | 0.25696 | 0.29184 | 0.29200 | 0.29217 | 0.880
Huckleberry Finn | 105441 | 0.26252 | 0.29990 | 0.30017 | 0.30056 | 0.875
Life on Mississippi | 138867 | 0.27769 | 0.30453 | 0.30474 | 0.30534 | 0.912

All texts are taken from Project Gutenberg – relieved of sections not belonging to the book itself – reduced to letters a to z and A to Z by replacing anything else with a space – the result is split at white spaces and all “words” of length 1 are discarded. This vector of words or string atoms is then permuted using sample().

The compression applied is of type bzip2 and performed using memCompress() with calculation of compression ratio being implemented as follows:

compress <- function(str) { length(memCompress(str, type="bzip2")) / nchar(str) }

(original article published on www.joyofdata.de)


The tf-formula is the ratio of a term’s occurrences in a document to the number of occurrences of the most frequent word within the same document. Why this makes sense is pretty self-explanatory. But obviously we would end up with stop words yielding high scores – and even if those have been discarded beforehand, a lot of words naturally show up often in a long text but aren’t relevant to the specific document.

And this is exactly where the idf-factor comes into play, as it represents the inverse of the share of documents in which the regarded term can be found. The lower the number of containing documents relative to the size of the corpus, the higher the factor. The reason why this ratio is not used directly but instead its logarithm is that otherwise the effective scoring penalty for showing up in two documents would be too extreme. As you can see in the plot – the idf for a term found in just one document is twice the idf for a term found in two. This would heavily bias the ranking in favor of super-rare words even if the tf-factor indicates a high relevance. It is very unlikely that a word is of high relevance in one document but never used anywhere else.
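The two factors as code, in a minimal form (the function names are mine, the formulas are the ones described above):

```r
# tf: term frequency relative to the most frequent word in the document
tf  <- function(n_term, n_max) n_term / n_max

# idf: log of corpus size over the number of documents containing the term
idf <- function(n_docs_with_term, n_corpus) log(n_corpus / n_docs_with_term)

# a term occurring 20 times (most frequent word: 4000 times),
# found in 2 of 20 documents
tf(20, 4000) * idf(2, 20)
```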

# R code for above chart
library(ggplot2)

df <- data.frame(
  x = c(1:20, 1:20),
  y = c(1/1:20, log(20/1:20)/log(20)),
  idf = c(rep("1/n", 20), rep("log", 20))
)

ggplot(data = df, aes(x, y)) +
  geom_line(aes(color = idf)) +
  geom_point(aes(color = idf)) +
  scale_x_continuous(breaks = 1:20) +
  labs(title = "comparison of relative impact of idf-formula (scaled to 1) if term occurs in more or less documents") +
  xlab("number of documents a term is contained in") +
  ylab("") +
  theme(axis.title.x = element_text(color="grey20"))

Both factors react positively to higher relevance – one using local information, the other taking a more global perspective. So we can simply take the product and we have the “traditional” tf-idf-formula. But this formula is not set in stone – depending on whether you want to put emphasis on the tf- or rather on the idf-part, it might make sense to get a feeling for the ranking behaviour in your use case and apply adjustments until you are d’accord with the result.

For example in the case of the Bundestag protocols I came to the conclusion that the final rankings are more useful if I put some more weight on the idf-part – which effectively penalizes a word’s ranking if the word shows up in more than one document. This seems to make sense right now as my corpus contains only 18 documents. So what I did is add the idf-part divided by the size of the corpus – see tfidf’ in the top image. That has the effect that the penalization of non-exclusivity decreases over time as the corpus grows in size (more protocols being published).

# calculates tf-idf for different parameters and using
# different tf-idf-versions
tab_tfidf <- function(ncorpus=20) {
  # I assume a maximum word frequency of 4000
  max_ft <- 4000

  # tf-idf without log
  tfidf0 <- function(ft, max_ft, ndocs, ncorpus) (ft/max_ft) * (ncorpus/ndocs)
  # traditional tf-idf
  tfidf1 <- function(ft, max_ft, ndocs, ncorpus) (ft/max_ft) * log(ncorpus/ndocs)
  # tf-idf with added idf/N
  tfidf2 <- function(ft, max_ft, ndocs, ncorpus) (1/ncorpus + ft/max_ft) * log(ncorpus/ndocs)

  # ft = frequency of term / ndocs = how often it showed up in other documents
  df <- expand.grid(ft=c(5,10,20,30), ndocs=c(1,2,3,5,10))

  res0 <- apply(df, 1, function(r) tfidf0(r["ft"], max_ft, r["ndocs"], ncorpus))
  ranks0 <- order(order(-res0))
  res1 <- apply(df, 1, function(r) tfidf1(r["ft"], max_ft, r["ndocs"], ncorpus))
  ranks1 <- order(order(-res1))
  res2 <- apply(df, 1, function(r) tfidf2(r["ft"], max_ft, r["ndocs"], ncorpus))
  ranks2 <- order(order(-res2))

  result <- cbind(df, res0, res1, res2, ranks0, ranks1, ranks2)
  result <- result[order(result$ft),]

  return(list("ncorpus" = ncorpus, "max_ft" = max_ft, result))
}

# tf-idf for combinations of term frequency in {10,20,30} and
# occurrences in {1,2,3} relative to (20, 2)
get_change_matrix <- function(res, colname) {
  m <- matrix(res[res$ft %in% c(10,20,30) & res$ndocs %in% 1:3, colname], ncol=3)
  # number of documents where the word is assumed to be present
  rownames(m) <- as.character(1:3)
  # number of occurrences within the hypothetical document
  colnames(m) <- as.character(1:3*10)
  # (A-B)/B
  m <- round((m - m[2,2]) / m[2,2], 2)
  return(m)
}

On that note – let’s see how the two versions of tf-idf compare – assuming a corpus containing 20 documents and the most frequent word to show up 4000 times.

> res <- tab_tfidf()

# tfidf (traditional)
> get_change_matrix(res[[3]], "res1")
      10    20    30
1  -0.35  0.30  0.95
2  -0.50  0.00  0.50
3  -0.59 -0.18  0.24

# tfidf' (my flavor)
> get_change_matrix(res[[3]], "res2")
      10    20    30
1   0.24  0.30  0.36
2  -0.05  0.00  0.05
3  -0.21 -0.18 -0.14

Let’s for example compare a word A occurring 20 times in document d and in 2 documents of the corpus to another word B occurring 10 times in document d and in only 1 document (d, of course) of the corpus. In case of the traditional formula the tf-idf-statistic of B would be 35% lower than the tf-idf-statistic of A. In case of the altered tf-idf-formula on the other hand we would see an increase of B’s tf-idf-statistic by 24% compared to A’s! So it makes sense to think a bit about how exactly one would like the score to behave / rank words.
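These two percentages can be checked directly from the formulas (restated here standalone, with the same corpus of 20 documents and a maximum word frequency of 4000):

```r
# traditional tf-idf and the tfidf' variant with the added idf/N term
tfidf1 <- function(ft, max_ft, ndocs, N) (ft/max_ft) * log(N/ndocs)
tfidf2 <- function(ft, max_ft, ndocs, N) (1/N + ft/max_ft) * log(N/ndocs)

# word A: 20 occurrences, in 2 documents; word B: 10 occurrences, in 1 document
round(tfidf1(10, 4000, 1, 20) / tfidf1(20, 4000, 2, 20) - 1, 2)  # -0.35
round(tfidf2(10, 4000, 1, 20) / tfidf2(20, 4000, 2, 20) - 1, 2)  # 0.24
```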

I didn’t want to spend too much time on this sub-project, so I kept it at my rule-of-thumb optimization. But when you start thinking about the tf-idf idea a lot of possible tweaks come to mind. To give an example, true to the motto “once won’t do any harm” we could restrict the denominator of the idf to taking only documents into account where the term shows up at least twice.
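A sketch of that tweak – the helper and the per-document counts are entirely made up for illustration:

```r
# count a document toward the idf denominator only if the term
# occurs in it at least twice ("once won't do any harm")
idf_min2 <- function(counts_per_doc, n_corpus) {
  n <- max(1, sum(counts_per_doc >= 2))  # avoid division by zero for unseen terms
  log(n_corpus / n)
}

# term shows up 5x in one document and once each in two others (corpus of 20):
idf_min2(c(5, 1, 1, 0), 20)  # counts as present in only one document
```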

Obviously the tf-idf approach to quantifying a keyword’s relevance has its limits. For example it is easily conceivable that a word shows up in a lot of documents of the corpus and yet plays a central role in each of them. Or a subject is covered in several documents **because it is** very important – but tf-idf would penalize terms typical for this subject exactly for that reason. This is why tf-idf is most certainly not the answer to everything. I came across another idea described in a paper from 2009 where the density of a term is used to infer its relevance. The basic idea is that a very relevant word will show relatively strong local densities compared to a common word with a more uniform density. Below you see a density approximation for three stop words (“und”, “die” and “der” – “and” plus the feminine and masculine forms of “the”) and the densities for the three terms that scored highest with respect to tf-idf in protocol #11.

# vector of words from protocol #11 - length: 127375 words
# wv <- c("Inhaltsverzeichnis", "Plenarprotokoll", "Deutscher", ...)

plot(density(which(wv=="und"), bw="SJ", n=1000, adjust=.5, from=0, to=length(wv)),
  main="black: 'und', blue: 'die', green: 'der'", xlab="index of word")
lines(density(which(wv=="die"), bw="SJ", n=1000, adjust=.5, from=0, to=length(wv)), col="blue")
lines(density(which(wv=="der"), bw="SJ", n=1000, adjust=.5, from=0, to=length(wv)), col="green")

plot(density(which(wv=="Tierhaltung"), bw="SJ", n=1000, adjust=.5, from=0, to=length(wv)),
  main="black: 'Tierhaltung', blue: 'Verbraucherpolitik', green: 'Rechtspolitik'",
  xlab="index of word")
lines(density(which(wv=="Verbraucherpolitik"), bw="SJ", n=1000, adjust=.5, from=0, to=length(wv)), col="blue")
lines(density(which(wv=="Rechtspolitik"), bw="SJ", n=1000, adjust=.5, from=0, to=length(wv)), col="green")

If it is possible to take the distribution into account then we can even determine relevance for isolated single documents.
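One way to turn that density intuition into a score – a simple variance-based stand-in of my own, not the paper’s actual method:

```r
# score a term by how non-uniform its positional density is:
# clustered occurrences -> spiky density -> high coefficient of variation
density_score <- function(wv, term) {
  pos <- which(wv == term)
  d <- density(pos, n = 512, from = 1, to = length(wv))$y
  sd(d) / mean(d)
}

wv <- rep("filler", 1000)
wv[seq(1, 1000, by = 50)] <- "common"   # spread uniformly through the "text"
wv[101:120] <- "topic"                  # clustered in one passage

density_score(wv, "topic") > density_score(wv, "common")
```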

(original article published on www.joyofdata.de)


Let’s assume you would like to buy a product and you can choose from two sellers A and B. The only relevant difference is the ratings – which a customer can give to a seller after a purchase choosing between ‘good’ and ‘bad’. Seller A received 10 ratings of which 9 are good and seller B received so far 500 ratings of which 400 (80%) happen to be ‘good’.

The naive approach would be to just assume that the ratio of good ratings is by itself the best indicator of a seller’s quality. Though I guess most people would automatically take the number of involved ratings into account – but let’s ignore the psychological aspect and focus on a statistical approach.

When we choose a seller in the above situation based on an average rating we implicitly assume that the rating represents an inherent feature of the seller. But when we think about it a bit, it becomes clear that the observed average rating is subject to chance and that even a seller with a true but invisible rating level of 50% can, thanks to luck, show 80% for a while. In the long run the “law of large numbers” will bring it back to where it belongs but even with a 50/50 chance of receiving a good rating (due to the true but hidden quality level) it is possible to end up with 9 of 10 positive ratings.

But of course we don’t know the “true” rating of a seller. If we think of the rating process as a Bernoulli experiment (1 = good, 0 = bad), however, we can simulate a large number of independent Bernoulli experiments for a finite sequence of probabilities and see how likely it is to end up with the observed number of positive ratings for a given total number of ratings.

# simulates 10^5 Bernoulli experiments of length 500 for success probabilities
# from 0 to 1 in steps of size 0.01
# m will have 101 columns - one for every probability - and 10^5 rows keeping
# the resulting success totals
m <- sapply(0:100/100, function(prob) rbinom(10^5, 400+100, prob))

# turns m into a boolean matrix with TRUE where an element is 400, FALSE otherwise.
# then we sum all booleans (TRUE as 1 and FALSE as 0) per column and we know how
# often a given probability (assumed rating) led to exactly 400 good ones.
v <- colSums(m == 400)

# v stores the number of successful replications for every probability. But
# to feed it to hist() or geom_histogram() we need a vector keeping a probability
# as often as it led to a success.
df_sim <- data.frame(p = rep(0:100/100, v))

Let’s make a complete R program of this idea and with ggplot2 we can see the resulting histograms for seller A (the flat shaped histogram) and seller B (the spiked shaped histogram). And now we finally get to the beta distribution whose density curve is plotted on top in red and orange colors.

So the central observation is that the beta distribution – f.x. with parameters α=400+1 and β=100+1 – describes, as a density over the possible true ratings of seller B, how plausible each true rating is given that it led to 400 positive and 100 negative ratings.

Now that we know that, we can calculate 95% confidence intervals for sellers A and B for their true unknown rating level.

# 95%-CI for seller A (9 of 10 are good)
> qbeta(c(0.025,0.975), 9+1, 1+1)
[1] 0.5872201 0.9771688

# 95%-CI for seller B (400 of 500 are good)
> qbeta(c(0.025,0.975), 400+1, 100+1)
[1] 0.7626715 0.8326835

This tells us that the interval [58%, 98%] captures the true quality of seller A in terms of ratings with a chance of 95% and the interval [76%, 84%] captures the true quality of seller B (in terms of ratings) with a chance of 95%.

So one line of reasoning you could justify with those confidence intervals: if a bad experience – like a broken product due to bad packaging or late shipping – is especially undesirable, you would choose seller B because his lower bound is about 20 percentage points higher than that of seller A. In a different scenario you might be more interested in tapping the full potential – then you might go for seller A because with a 95% chance the true quality level may be as high as 98%.

For example the helpfulness ratings of product ratings (“34 people of 53 found this product rating to be helpful”) might be weighted using the lower bound of the 95% confidence interval or its span.
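For the helpfulness example that could look like this (the function name is mine; the Beta(pos+1, neg+1) posterior is the one derived above):

```r
# lower end of the 95% interval for "pos of total found this helpful"
helpfulness_lower_bound <- function(pos, total) {
  qbeta(0.025, pos + 1, total - pos + 1)
}

helpfulness_lower_bound(34, 53)  # noticeably below the raw share 34/53
```

Ranking by this lower bound rewards items whose positive share is backed by many votes.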

library(ggplot2)

# 90% positive of 10 ratings
o1 <- 9
o0 <- 1
M <- 100
N <- 100000
m <- sapply(0:M/M, function(prob) rbinom(N, o1+o0, prob))
v <- colSums(m == o1)
df_sim1 <- data.frame(p = rep(0:M/M, v))
df_beta1 <- data.frame(p = 0:M/M, y = dbeta(0:M/M, o1+1, o0+1))

# 80% positive of 500 ratings
o1 <- 400
o0 <- 100
M <- 100
N <- 100000
m <- sapply(0:M/M, function(prob) rbinom(N, o1+o0, prob))
v <- colSums(m == o1)
df_sim2 <- data.frame(p = rep(0:M/M, v))
df_beta2 <- data.frame(p = 0:M/M, y = dbeta(0:M/M, o1+1, o0+1))

ggplot(data = df_sim1, aes(p)) +
  scale_x_continuous(breaks = 0:10/10) +
  geom_histogram(aes(y = ..density.., fill = ..density..),
    binwidth = 0.01, origin = -.005, colour = I("gray")) +
  geom_line(data = df_beta1, aes(p, y), colour = I("red"), size = 2, alpha = .5) +
  geom_histogram(data = df_sim2, aes(y = ..density.., fill = ..density..),
    binwidth = 0.01, origin = -.005, colour = I("gray")) +
  geom_line(data = df_beta2, aes(p, y), colour = I("orange"), size = 2, alpha = .5)


(**Attention**: The calculations and analysis are not biased by my political views – but the interpretation of the results might be and their verbal formulation certainly is … ;)

About a week ago I came across an article titled “How divided is the Senate?” by Vik Paruchuri where he uses a method called principal component analysis (PCA) to visualize the closeness of votings given by senators of the 113th Congress of the USA. I immediately fell in love with the idea behind this article as well as the method applied – which was a great opportunity to revise some statistics and algebra basics. And because (pretended) transparency is a major foundation of a modern democracy, detailed word-by-word protocols of every meeting of the Bundestag are published as PDFs and text files on their website. So I downloaded all those protocols for the 17th Bundestag, extracted the votings and loaded the votes into a data frame. That was quite a drag because judging from typos (Sevim Dadelen, Sevim Dagelen, Sevim Dagdelen, …), different name versions (Erwin Josef Rüddel, Erwin Rüddel) and line breaks within the longer names like Dr. Karl-Theodor Freiherr von und zu Guttenberg (his title is gone, so the name became a tad handier by now), those text files were manually sanitized PDF conversions of live transcripts. I’ll spare you the details – but getting the data right took quite some effort.

But in the end I won this battle and my data set kept more than 145,000 records of all the votes given by 651 delegates in 254 votings. A “Yes” I registered as 1, a “No” as -1 and “I don’t care – where is my money?” as 0. For the PCA plot I assumed 0 if a delegate didn’t take part in the voting. For this visualization I only considered delegates who took part in at least 230 votings – otherwise the outliers would mostly be individuals who rarely joined votings – some of them because they died, resigned or stepped in late. This last group of delegates is also not considered in the boxplot on participations, for obvious reasons. The boxplot on average deviation from the party’s opinion only considers delegates with at least 100 votes – to have something to statistically chew on. In that case, by the way, absences are simply left out of the calculation – so no assumption of 0 as for the PCA.

When you want to visualize the distribution of records keeping two or three attributes – say age, height and weight measured for a set of individuals – then you can just represent one vector of observations in two or three dimensions. No problem. But in this use case we are facing not two or three but 254 dimensions! Now the great thing is that all those points are usually not evenly or hyper-spherically distributed but form a shape that can be represented “well enough” with fewer dimensions. PCA is a mathematical trick to identify a sub-space of fewer dimensions that neatly wraps this hyper-cloud of points. If I’ve made you curious I highly recommend this paper by Lindsay I. Smith. Using this source I programmed the calculations in R myself out of curiosity. I am still fascinated that one can do something like this at all.
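The PCA step itself can be sketched with prcomp() on a toy votes matrix – random stand-in data, not the actual Bundestag set, and prcomp instead of my hand-rolled calculations:

```r
set.seed(1)
# rows = delegates, columns = votings; 1 = yes, -1 = no, 0 = abstention/absence
votes <- matrix(sample(c(-1, 0, 1), 20 * 254, replace = TRUE), nrow = 20)

pca <- prcomp(votes, center = TRUE)
coords <- pca$x[, 1:2]  # each delegate as a point in the first two components
```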

(The party’s opinion on a voting is simply the rounded average of all vote values given by members of that party. And the “average difference of a delegate” is the average of the absolute differences between an individual’s vote and the party’s opinion.)
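Those two definitions as code (toy vectors; the helper names are mine):

```r
# party opinion: rounded average of the party members' votes on one voting
party_opinion <- function(votes) round(mean(votes))

# average deviation of a delegate from the party line across votings
avg_diff <- function(delegate_votes, party_opinions) {
  mean(abs(delegate_votes - party_opinions))
}

# delegate dissents once (voting "no" against a party "yes") in four votings
avg_diff(c(1, -1, 1, 0), c(1, 1, 1, 0))  # 2/4 = 0.5
```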

CDU/CSU and FDP are indistinguishable in the PCA plot. That’s because they form a coalition. They are also the strongest flock of sheep – all voting the same – as you can see in the box plot on the right hand side (most delegates are very close to a 0 score). This can be explained by the fact that this coalition forms the majority and hence is likely to win any voting – given everybody acts in concert. So comparatively rarely is a delegate willing to risk f****** up a voting – by expressing what simple people refer to as their “opinion”.

The same reasoning applies to Grüne and SPD – just inverted. Because they form a minority, a lost voting won’t usually be blamed on an opposing party member – so they can express it with less risk.

The comparatively high median for deviating delegates from Die Linke, with a small span between 1st and 3rd quartile, might indicate that this is the party of troublemakers who tend to oppose at a high and quite homogeneous level on a regular basis.

The above reasoning connecting voting discipline with forming a minority or a majority can also be observed in the boxplot on participations by delegate and party. As you can see, CDU/CSU and FDP show up most regularly compared to the other parties. Die Linke on the other hand – which holds the fewest seats and in the PCA plot also shows up distant from the other parties – takes it the easiest. They have little chance of deciding the outcome of a voting. Nonetheless, as a data analyst interested especially in detailed governmental data I have to praise them for asking very interesting questions in the Bundestag (this map on secretive transports of radioactive waste across Germany became public only thanks to Die Linke!).

The Euclidean distances depicted in the PCA plot (check out the 3D model if you haven’t yet) match the perceived political differences between the parties surprisingly well. FDP of course stays as close as possible to its bigger brother, but apart from this triviality it makes total sense that the two closest parties are SPD and Die Grünen, the most distant are Die Linke and CDU/CSU, and that SPD and Die Grünen are closer to Die Linke than to CDU/CSU. To find this pattern in the plot was the best I could hope for.

… Hans-Christian Ströbele (Bündnis 90 / Die Grünen), who shows so far the strongest inclination to make use of his neurological faculties (called “thinking”) while taking part in 249 of 254 votings – that’s a top score in both regards and shows that he also takes his responsibility as a politician working for German society very seriously! And as an official member of this society I say:

**Thank you very much, Mr. Ströbele!**

Delegate | Party | Avg. diff. | Participations
--- | --- | --- | ---
Peter Gauweiler | CSU | 0.62 | 84
Hans-Christian Ströbele | Grüne | 0.27 | 249
Manfred Kolbe | CDU | 0.25 | 237
Waltraud Wolff | SPD | 0.24 | 199
Klaus Barthel | SPD | 0.23 | 249
Marco Bülow | SPD | 0.23 | 163
Frank Schäffler | FDP | 0.21 | 225
Monika Lazar | Grüne | 0.18 | 253
Petra Hinz | SPD | 0.18 | 254
Josef Göppel | CSU | 0.17 | 233

Peter Gauweiler is an extreme case regarding both the average difference from the party opinion and the number of participations. I find it odd that he participated in just a third of all votings so far. The Wikipedia article mentions that this is a known phenomenon.

Considering the reasoning for fewer troublemakers in the parties forming the majority, Manfred Kolbe sticks out as a comparatively strong counterexample. Because he apparently also takes his responsibility seriously I grant him the runner-up slot and say to him too: Thank you, Herr Kolbe!
