Meson: Heterogeneous Workflows with Spark at Netflix

Spark Summit 2016

Published in: Data & Analytics

  1. Heterogeneous Workflows with Spark at Netflix • Antony Arokiasamy | Kedar Sadekar | Personalization Infrastructure
  2. Help members find content to watch and enjoy to maximize member satisfaction and retention
  3. Everything is a Recommendation • Recommendations are driven by Machine Learning • Ranking • Rows
  4. Machine Learning Pipeline • User Selection • Feature Generation • Model Validation • Publish Model • Model Training
  5. Machine Learning Pipeline Challenges • Innovation • Heterogeneous Environments • Spark • Native Support • Separate Orchestration and Execution • Multi Tenancy • Machine Learning Constructs • Parameter Sweep – 30k Dockers
  6. Meson Workflow System • General-purpose workflow orchestration and scheduling framework • Delegates execution to resource managers like Mesos • Optimized for Machine Learning pipelines and visualization • Check out the blog • http://bit.ly/mesonws or techblog.netflix.com • Plan to open source soon
  7. Meson Architecture
  8. Standard and Custom Step Types
  9. Parameter Passing • Hive Query • User DataSet • Regional DataSet • Global DataSet • Get Users • Regional Model • Global Model • User DataSet • Wrangle Data
  10. Structured Constructs
  11. Top Down or Bottom Up
  12. Two Way Communication
  13. Spark Step
  14. Artifacts • Step outputs tracked as Artifacts • Visualization • Memoization
  15. Multi Tenancy • Resource Attributes • spark.cores.max • spark.executor.memory • spark.mesos.constraints • Dynamic Resource Allocation
  16. Cluster Management • Red-Black software updates • Scale up/Scale down
  17. Meson/Spark Cluster • 100s of Concurrent Jobs • 700 Nodes • 5000 Cores • 25 TB Memory • Apps: Meson Workflow System, Spark and Dockers • Few smaller clusters
  18. Antony Arokiasamy: @aasamy, /aasamy, aarokiasamy@netflix.com • Kedar Sadekar: @kedar_sadekar, /kedar-sadekar, ksadekar@netflix.com
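
The Multi Tenancy slide lists three Spark properties (spark.cores.max, spark.executor.memory, spark.mesos.constraints) as per-tenant resource attributes. The slides do not show how Meson applies them, so the following is only a minimal sketch of one possible approach: turning a tenant's attributes into `--conf` flags for `spark-submit`. The helper name `spark_submit_args`, the values, the `tenant:team-a` constraint, and the application file `train_model.py` are all hypothetical and are not taken from the talk.

```python
from typing import Dict, List


def spark_submit_args(resources: Dict[str, str], app: str) -> List[str]:
    """Build a spark-submit argument list from per-tenant Spark properties.

    Hypothetical helper for illustration; not part of Meson or Spark.
    """
    args = ["spark-submit"]
    # Sort keys so the generated command line is deterministic.
    for key in sorted(resources):
        args += ["--conf", f"{key}={resources[key]}"]
    args.append(app)
    return args


# Illustrative values only; real limits depend on the cluster and the tenant.
tenant_resources = {
    "spark.cores.max": "64",            # cap on total cores for the application
    "spark.executor.memory": "8g",      # memory per executor
    "spark.mesos.constraints": "tenant:team-a",  # assumed Mesos agent attribute
}

cmd = spark_submit_args(tenant_resources, "train_model.py")
print(" ".join(cmd))
```

Capping cores and pinning executors to agents with a matching attribute is one way to keep tenants from starving each other on a shared cluster; the slide also mentions Dynamic Resource Allocation, which this sketch does not cover.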
