Michael Cutler (CTO cofounder of TUMRA) provides a high-level introduction to Apache Spark in a presentation given at ‘Big Data Week 2014’ #BDW14 held at University College London.

TUMRA were early adopters of Spark: after a brief proof of concept (PoC) in December 2012, they took it to production just a few months later, in March 2013. The main motivation was the inflexibility and high latency of Hadoop MapReduce jobs, and the knock-on effect on the technologies built on it (Mahout machine learning, Hive data warehousing, Cascading).

With two primary use cases, ‘Ecommerce Personalisation’ and ‘Marketing Automation’, TUMRA currently stream around 29 million ‘user engagement events’ (JSON) each day through Apache Kafka and Spark Streaming, at peak rates of up to 800 events per second.
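
To put those two figures side by side, the short Python sketch below works out the average rate implied by 29 million events per day (roughly 336 events per second) and compares it with the quoted 800 events per second peak. Only the two input figures come from the text; the function names are illustrative, and the even-spread average is a simplifying assumption.

```python
# Figures quoted in the description above.
EVENTS_PER_DAY = 29_000_000
PEAK_EVENTS_PER_SECOND = 800

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(events_per_day: int) -> float:
    """Mean events per second if traffic were spread evenly over a day."""
    return events_per_day / SECONDS_PER_DAY


def peak_to_average(peak_rate: float, events_per_day: int) -> float:
    """How many times higher the peak rate is than the daily average rate."""
    return peak_rate / average_rate(events_per_day)


if __name__ == "__main__":
    avg = average_rate(EVENTS_PER_DAY)
    ratio = peak_to_average(PEAK_EVENTS_PER_SECOND, EVENTS_PER_DAY)
    print(f"average rate: {avg:.0f} events/s")
    print(f"peak is {ratio:.1f}x the average")
```

A peak a little over twice the daily average suggests the pipeline has to be sized for bursts rather than for the mean load.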

TUMRA use Apache Spark on Amazon Web Services (EC2) in production for a mix of machine learning model building, graph analytics and near-real-time reporting.
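
Spark Streaming processes a stream as a sequence of small batches, one per fixed time interval, which is what makes near-real-time reporting possible. The plain-Python sketch below illustrates that micro-batch idea only; it does not use Spark or Kafka, it is not TUMRA's code, and the event shape (timestamp, event type) and all function names are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp in seconds, event type).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> Dict[int, List[Event]]:
    """Group events into fixed-length intervals, keyed by batch index."""
    batches: Dict[int, List[Event]] = {}
    for ts, kind in events:
        batches.setdefault(int(ts // interval), []).append((ts, kind))
    return batches


def counts_per_batch(events: Iterable[Event], interval: float) -> Dict[int, Dict[str, int]]:
    """Count event types within each micro-batch, in batch order."""
    batches = micro_batches(events, interval)
    return {
        index: dict(Counter(kind for _, kind in batch))
        for index, batch in sorted(batches.items())
    }


if __name__ == "__main__":
    sample = [(0.2, "view"), (0.9, "click"), (1.1, "view"), (1.5, "view"), (3.0, "purchase")]
    for index, counts in counts_per_batch(sample, interval=1.0).items():
        print(index, counts)
```

In a real deployment the grouping into intervals is done by the streaming engine itself, and each batch is processed with the same operations used for batch jobs.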

To learn more about how we use Spark and the services we can deliver through our Platform please contact: hello@tumra.com

Transcript

  • 1. WHAT’S NEXT FOR BIG DATA? APACHE SPARK
  • 2. WTH IS SPARK?
  • 3. 3 TUMRA - Big Data Week, May 2014 Spark is … “One platform to rule them all” … and blurs boundary between SQL, machine learning, streams & graphs
  • 4. Spark is … gaining momentum
  • 5. Spark has … more contributors than Hadoop
  • 6. Spark can … (Source: Databricks)
  • 7. Spark Stack [diagram; labels include Hadoop (HDFS); source: Databricks]
  • 8. Why Spark? - Code reuse across batch, streaming and interactive applications - Easy API from Scala, Java & Python - In-memory data sharing. FAAAAAAST!!! Check out http://spark.apache.org
  • 9. CASE STUDY: PERSONALISATION & MARKETING AUTOMATION
  • 10. Our history with Spark - Early adopters; PoC in Dec ‘12 - In production since March ‘13 - Running on Amazon EC2 - Ad-hoc analysis and reporting - Machine learning model building - Integrates with our real-time dashboards
  • 11. Use Case: Personalisation
  • 12. Use Case: Personalisation (cont’d) - Matching visitors to products - 50% of visitors are ‘new’ and have no history to work with - Blend of pre-computation and real-time recommendations
  • 13. Use Case: Marketing Automation - Collect user engagement data across websites and mobile apps - Increase subscription rates - Identify users at risk of churn - Automated personalised marketing
  • 14. Data Volumes & Velocity - 29M events per day - Peak rates ~800 events / second - All events streamed to Kafka - 10B archived events in Amazon S3
  • 15. How we use Spark [diagram; labels: Amazon S3 (HDFS interface); Apache Kafka; Data Collection API (Akka) & Connectors]
  • 16. Spark gives us … - Unified platform for machine learning and graph analytics - Ability to experiment at huge scale - SQL interfaces to existing tools - Code reuse from data scientists to production workloads
  • 17. WANT TO KNOW MORE?
  • 18. http://spark.apache.org
  • 19. Spark Summit 2014
  • 20. Spark London Meetup
  • 21. Commercial Support & Certification
  • 22. THANK YOU @tumra tumra.com slideshare.net/tumra