Michael Cutler (CTO cofounder of TUMRA) provides a high-level introduction to Apache Spark in a presentation given at ‘Big Data Week 2014’ #BDW14 held at University College London.
TUMRA were early adopters of Spark after a brief PoC in Dec ‘12 and took it to production just a few months later. The main motivation to do so was the inflexibility and high-latency of Hadoop Map/Reduce jobs and the knock-on effect for technology that utilises it (Mahout machine learning, Hive data warehousing, Cascading).
With two primary uses case ‘Ecommerce Personalisation’ and ‘Marketing Automation’ TUMRA are currently flowing around 29 million ‘user engagement events’ (JSON) each day through Apache Kafka and Spark Streaming at peak rates of up to 800 events per second.
TUMRA use Apache Spark on Amazon Web Services (EC2) in production for a mix of machine learning model building, graph analytics and near-real-time reporting.
To learn more about how we use Spark and the services we can deliver through our Platform please contact: email@example.com