Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Move to Spark-(Yandu Oppacher, Shopify)

2,018 views

Published on

Presentation at Spark Summit 2015

Published in: Data & Analytics
  • Be the first to comment

The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Move to Spark-(Yandu Oppacher, Shopify)

  1. 1. The Little Warehouse That Couldn’t Or: How We Learned to Stop Worrying and Move to Spark 1 Yandu Oppacher (@yandu) Data Infrastructure
  2. 2. 2 Shopify Stores
  3. 3. ETL Warehouse Reporting August 2013 TilllerRuby Vertica 3
  4. 4. Why we had to move • Data volume • Data/Query complexity • Performance issues 4
  5. 5. Couple of false starts 5 Pig + Luigi Pig + Oozie Platfora
  6. 6. –platfora.com “Without coding or ETL, data warehousing, BI tools, or breaking a sweat.” 6
  7. 7. Enter Spark • Fast • Nice development model • Python 7
  8. 8. 88 The Good Book
  9. 9. GMV A Case Study 9 165,000+ ACTIVE SHOPIFY MERCHANTS $8 BILLION+ CUMULATIVE GMV
  10. 10. Growing pains • Joins • Groupings • General data skew • Getting to know python’s performance quirks 10
  11. 11. Starscream 11 • specialized joins • resolvers • range • cassandra • overby • contracts • incrementalized fact builds
  12. 12. Our current stack 12 Kafka OLTP HDFS Cassandra Spark FrontroomBackroom Redshift Tableau
  13. 13. Thank you 13 Yandu Oppacher (@yandu) Data Infrastructure

×