Counting the frequency of ordered character pairs

I wrote this PHP script to count the character combinations for a previous post. The script covers more functionality than I actually used for the article. At first I fetched websites, using a service from www.alchemyapi.com that extracts the actual text content from a site. But after figuring out that it was a bit tricky to choose a halfway representative set of websites for a language, I took a shortcut and just evaluated one big text file containing a classic novel for each language.


Frequency of character combinations for three languages

I was curious about the frequency with which ordered character pairs occur in different languages. So I wrote a PHP script that fetches texts online or from disk and parses them. I chose one classic novel as the source for each language. The choices are certainly not representative of the language, but I think they still provide some insight. From each source I used only the text that actually belongs to the novel.
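
To make the idea of an "ordered character pair" concrete, here is a minimal sketch of the counting step. It is written in Python rather than PHP and is not the script from the post. The function names `count_pairs` and `relative_frequencies` are made up for illustration, and lowercasing the text and skipping pairs that contain a non-letter are assumptions of this sketch, not choices documented above.

```python
from collections import Counter


def count_pairs(text, letters_only=True):
    """Count ordered pairs of adjacent characters in text.

    Assumption of this sketch: the text is lowercased first, and with
    letters_only=True any pair containing a non-letter (space, digit,
    punctuation) is skipped.
    """
    text = text.lower()
    pairs = Counter()
    # zip(text, text[1:]) yields every character with its right neighbour,
    # so "ab" and "ba" are counted as different (ordered) pairs.
    for first, second in zip(text, text[1:]):
        if letters_only and not (first.isalpha() and second.isalpha()):
            continue
        pairs[first + second] += 1
    return pairs


def relative_frequencies(pairs):
    """Turn absolute pair counts into shares of all counted pairs."""
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pairs.items()}


counts = count_pairs("the then")
print(counts.most_common())
print(relative_frequencies(counts))
```

For a whole novel, the same function could be applied to the contents of a text file read with an explicit encoding such as UTF-8, so that accented letters and umlauts are treated as single characters.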
