Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Meson: Building a Machine Learning Orchestration Framework on Mesos

1,078 views

Published on

MesosCon 2016

Published in: Data & Analytics
  • Be the first to comment

Meson: Building a Machine Learning Orchestration Framework on Mesos

  1. 1. Building a Machine Learning Orchestration Framework on Mesos 0 Antony Arokiasamy | Kedar Sadekar | Personalization Infrastructure
  2. 2. 1 Help members find content to watch and enjoy to maximize member satisfaction and retention
  3. 3. Everything is a Recommendation 2 Recommendations are driven by Machine Learning Ranking Rows
  4. 4. Machine Learning Pipeline 3 User Selection Feature Generation Model Validation Publish Model Model Training
  5. 5. Machine Learning Pipeline Challenges 4 • Innovation • Heterogeneous Environments • Spark • Native Support • Separate Orchestration and Execution • Multi Tenancy • ML Constructs • Parameter Sweep – 30k Dockers
  6. 6. Meson Workflow System in 30 seconds 5 • General Purpose Workflow Orchestration and Scheduling framework • Delegates execution to resource managers like Mesos • Optimized for Machine Learning Pipelines and Visualization • Checkout the Blog • bit.ly/mesonws or techblog.netflix.com
  7. 7. Meson Architecture 6
  8. 8. Mesos Usage 7 • Executors • Custom Executor • Executor Caching • Executor Cleanup • Framework Messages • Resource Attributes • Multi Tenancy • Cluster Management
  9. 9. Custom Executors 8 • Reuse Executor Process • e.g. Spark • Executor Id = <unique id> • Two Way Communication
  10. 10. Executor Caching 9
  11. 11. Executor Caching 10 • Executor Id = hash(<something unique for the class of executors>) • E.g. Executor Id = hash(classpath) • Match with Executor Id in Offer offers accept
  12. 12. Executor Cleanup 11 • Expiration • Explicitly keep track of Executors
  13. 13. Framework Messages 12
  14. 14. Multi Tenancy 13 • Resource Attributes • spark.mesos.constraints
  15. 15. Cluster Management 14 • Red-Black software updates • Scale up/Scale down
  16. 16. Mesos Cluster 15 • 100s of Concurrent Jobs • 700 Nodes • 5000 Cores • 25 TB Memory • Apps: Meson Workflow System, Spark and Dockers • Few smaller clusters
  17. 17. What's Next 16 • Fenzo Scheduler - https://github.com/Netflix/Fenzo • Bin Packing, Auto Scaling, Host Attributes/Constraints, Groups, etc • Cook Scheduler - https://github.com/twosigma/Cook • Multi tenant Spark Scheduler • Open Source Meson Workflow System
  18. 18. 17 Antony Arokiasamy Kedar Sadekar @aasamy /aasamy aarokiasamy@netflix.com @kedar_sadekar /kedar-sadekar ksadekar@netflix.com

×