MapReduce with R on Hadoop and Amazon EMR

You all know why MapReduce is fancy, so let’s just jump right in. I like researching data and I like to see results fast. Does that mean I enjoy the process of setting up a Hadoop cluster? No, I doubt there is any correlation, neither causal nor merely statistical. The good news is that quite a lot of cloud computing providers already offer Hadoop clusters on demand! For this article I got my hands on Amazon’s Elastic MapReduce (EMR) service (which builds on its EC2 service) and sets up the Hadoop cluster for you. Okay, almost, at least. As a running example, we are going to count 2-grams in dummy text data using the stringdist package for R.
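To make "counting 2-grams" concrete before any cluster is involved, here is a minimal local sketch. It is not the job that runs on EMR, just an illustration of the counting step, and it assumes the stringdist package is installed. The input string and the variable names are made up for the example.

```r
# Minimal local sketch: count 2-grams (character bigrams) in one string.
# Assumes install.packages("stringdist") has been run beforehand.
library(stringdist)

# Made-up dummy input, for illustration only
dummy_text <- "banana"

# qgrams() returns a matrix: one row per input argument,
# one column per distinct q-gram, cells holding the counts
counts <- qgrams(dummy_text, q = 2)

print(counts)
```

In a MapReduce setting the same call would plausibly sit in the map step, one chunk of text at a time, with the reduce step summing the counts per 2-gram.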
