Quick overview of programming Apache Hadoop with R. Jonathan Seidman's sample code allows a quick comparison of several packages followed by a real example using RHadoop's rmr package. Our example demonstrates using compound (vs. single-field) keys and values and shows the data coming into and out of our mapper and reducer functions.
Presented at the Boston Predictive Analytics Big Data Workshop, March 10, 2012. Sample code and configuration files are available on github.
Clipping is a handy way to collect important slides you want to go back to later.