Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Move to Spark-(Yandu Oppacher, Shopify)

Presentation at Spark Summit 2015

  • Be the first to comment

The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Move to Spark-(Yandu Oppacher, Shopify)

  1. 1. The Little Warehouse That Couldn’t Or: How We Learned to Stop Worrying and Move to Spark 1 Yandu Oppacher (@yandu) Data Infrastructure
  2. 2. 2 Shopify Stores
  3. 3. ETL Warehouse Reporting August 2013 TilllerRuby Vertica 3
  4. 4. Why we had to move • Data volume • Data/Query complexity • Performance issues 4
  5. 5. Couple of false starts 5 Pig + Luigi Pig + Oozie Platfora
  6. 6. –platfora.com “Without coding or ETL, data warehousing, BI tools, or breaking a sweat.” 6
  7. 7. Enter Spark • Fast • Nice development model • Python 7
  8. 8. 88 The Good Book
  9. 9. GMV A Case Study 9 165,000+ ACTIVE SHOPIFY MERCHANTS $8 BILLION+ CUMULATIVE GMV
  10. 10. Growing pains • Joins • Groupings • General data skew • Getting to know python’s performance quirks 10
  11. 11. Starscream 11 • specialized joins • resolvers • range • cassandra • overby • contracts • incrementalized fact builds
  12. 12. Our current stack 12 Kafka OLTP HDFS Cassandra Spark FrontroomBackroom Redshift Tableau
  13. 13. Thank you 13 Yandu Oppacher (@yandu) Data Infrastructure

    Be the first to comment

    Login to see the comments

  • IvanoIndrio

    Jun. 26, 2015
  • NaveenSiddareddy

    Jun. 30, 2015
  • sandeephejmadi

    Aug. 3, 2015

Presentation at Spark Summit 2015

Views

Total views

2,387

On Slideshare

0

From embeds

0

Number of embeds

55

Actions

Downloads

118

Shares

0

Comments

0

Likes

3

×