Relation of Word Order and Compression Ratio and Degree of Structure

About every 34.765th day I find myself compulsively wondering how compression (bzip2 in this case) might be used to measure the information contained in data. This time the question that popped into my head was whether, and if so how, permuting a text’s words affects its compression ratio. The answer: it does, and it does so in a very weird and fascinating way.

Lo and behold: James Joyce’s “A Portrait of the Artist as a Young Man” and its peculiar distribution of compression ratios …
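
For anyone who wants to poke at the idea, here is a minimal sketch of the experiment using Python's standard `bz2` module. It is not the code behind the plots. The function names, the whitespace-based word splitting, and the definition of the ratio as compressed size divided by original size are all assumptions made for illustration.

```python
import bz2
import random


def compression_ratio(text: str) -> float:
    """Compressed size divided by original size (assumed definition).

    Assumes a non-empty text; smaller values mean better compression.
    """
    raw = text.encode("utf-8")
    return len(bz2.compress(raw)) / len(raw)


def shuffled_ratios(text: str, trials: int = 20, seed: int = 0) -> list[float]:
    """Compression ratios of `trials` random permutations of the text's words.

    Words are whatever str.split() yields, so punctuation stays attached
    to its word and the original line breaks are lost (a simplification).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    words = text.split()
    ratios = []
    for _ in range(trials):
        permuted = words[:]  # copy, so the original order is kept
        rng.shuffle(permuted)
        ratios.append(compression_ratio(" ".join(permuted)))
    return ratios


if __name__ == "__main__":
    # Stand-in sample; swap in the full text of a book to see the effect.
    sample = "the quick brown fox jumps over the lazy dog " * 200
    print("original:", compression_ratio(sample))
    print("shuffled:", shuffled_ratios(sample, trials=5))
```

Comparing the single ratio of the original text with the spread of ratios from the shuffled versions gives a rough feeling for how much of the compressibility comes from word order rather than from the vocabulary alone.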

Comparison of word frequency in English literature


The scatterplot shows the frequencies of words occurring in two sets of texts. Click on a circle to see its words on the left-hand side. The app is built on d3.js (my second small project using it), and I am planning to write an introductory article on it soon. Apart from a few issues, d3 is fun to work with.

