At StampedeCon 2014, Sukhendu Chakraborty (RichRelevance) presented "Big Data Analytics made easy using Apache Hive to R Connector."
As the leading omni-channel personalization provider, RichRelevance fully harnesses the power of Hadoop to handle petabytes of data coming from both online (clickstream) and offline (e.g. in-store) sources. Given this wealth of customer data at Richrelevance, omnichannel data integration and analytics is critical. One of the major challenges is to consolidate online, mobile, social, and other data sources to create a create a single view of users for making more insightful decisions.
Our use cases require clickstream analytics that leverage Apache Hive & R. Apache Hive is a good tool for performing ELT and basic analytics, but is limited in statistical analysis and data exploration capabilities. R, on the other hand, has become a preferred language for analytics, as it offers a wide variety of statistical and graphical packages. The downside is that R is single threaded memory intensive, making it impossible to work with data at scale.
Through a series of use cases, we will present how our version of the R to Hive connector allows us to bridge the gap between R and Hive and make big data analysis using R on terabytes of data feasible. This framework takes us a step closer to the notion of a “one solution fits all” principle where we are no longer restricted by a single compute mechanism. It is our attempt to bring the two worlds closer, such that the data source is agnostic to the tools which are used to access it.