Presto Summit 2018 - 04 - Netflix Containers

Running Presto in a containerized environment (Vinitha Gankidi & Ted Gooch, Netflix)
Presto Summit 2018 (https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/)

Published in: Data & Analytics

  1. Running Presto in a Containerized Environment (Vinitha Gankidi, Ted Gooch)
  2. Contents
     ● Big data ecosystem at Netflix
     ● Presto at Netflix
     ● Motivation
     ● Presto on Titus Architecture
     ● Deployment challenges
     ● Auto scaling
     ● Results
  3. Netflix Big Data Ecosystem
  4. Decouple compute & storage
     ● Production: 2 clusters (~3300 d2.8xls, ~900 d2.8xls)
     ● Adhoc: multiple clusters, largest ~300 r4.4xls
  5. Presto at Netflix
  6. Presto use-cases
     ● Data exploration
     ● Data validation
     ● Backend for our A/B test platform
     ● Reporting
  7. Query Routing
     ● Genie - A federated job orchestration engine
       ○ Cluster registration
       ○ Tag based routing
     ● Cluster Red/Blacks
  8. Presto EC2 deployments
     ● Spinnaker pipeline to launch a cluster
     ● Triggers a Jenkins job to build Debian package
     ● Create an Amazon Machine Image
     ● Tag the image
     ● Deploy the image on ec2 instances
  9. Titus
  10. Titus
     ● Netflix container management platform built on Apache Mesos
     ● Provides cloud-native integration with Amazon AWS
     ● Can run images packaged as Docker containers
     ● Widely used at Netflix - about 3 million containers per week
  11. Motivation
  12. Why containerize?
     ● Highly volatile workloads, need more isolation
     ● Difficult to manage large clusters
     ● Faster deployments
     ● Play in the larger resource pool
     ● Scale clusters based on cluster load
  13. Presto on Titus Architecture
  14. [Diagram: a Coordinator and three Workers, each running in its own Titus Container]
  15. Deployment challenges
     ● Docker image size
     ● What size containers to use?
     ● Tuning Presto configs relative to the container size
     ● Ulimits
     ● Titus Migrations
     ● Chaos Monkey
     ● Metrics/Dashboard
  16. Auto-Scaling
  17. Single Large Cluster vs Multiple Clusters
     ● Single large cluster
       ○ Pros
         ■ Continuous workload arriving
         ■ Lots of resources when usage is low
         ■ Low operational overhead
       ○ Cons
         ■ Resource contention
         ■ Single configuration
  18. Single Large Cluster vs Multiple Clusters
     ● Multiple clusters
       ○ Pros
         ■ Resource isolation
         ■ Support differing latency expectations
       ○ Cons
         ■ Data synchronization
         ■ Operational overhead
         ■ Low cluster utilization
  19. Single Large Cluster vs Multiple Clusters
     ● Multiple clusters
       ○ Pros
         ■ Resource isolation
         ■ Support differing latency expectations
       ○ Cons
         ■ Data synchronization → De-coupled compute and storage
         ■ Operational overhead → Deployment tools and query routing
         ■ Low cluster utilization → Auto-scaling
  20. Auto-Scaling
     ● Increase cluster resources during high workload
     ● Decrease cluster resources during low or idle workload
     ● Responsiveness to scale events is critical
     ● Simple heuristic - active queries
  21. [Diagram: Coordinator, Prestosizer, and Workers, each in a Titus Container; Titus Python gRPC API. Arrow labels: Request more workers, Kill Containers, Call Shutdown-hooks]
  22. Results
  23. Future Work
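
The auto-scaling slides name a "simple heuristic - active queries" but do not spell out the sizing logic. The following is a minimal sketch of how such a heuristic could map an active-query count to a worker target. The names `ScalingPolicy`, `desired_workers`, and `scaling_action`, and every numeric value, are hypothetical illustrations; they are not taken from the talk or from Prestosizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    # All values are illustrative assumptions, not Netflix settings.
    workers_per_query: int = 2   # workers budgeted for each active query
    min_workers: int = 2         # floor kept even when the cluster is idle
    max_workers: int = 100       # ceiling that bounds cluster size


def desired_workers(active_queries: int, policy: ScalingPolicy) -> int:
    """Map the active-query count to a worker target, clamped to the policy bounds."""
    if active_queries < 0:
        raise ValueError("active_queries must be non-negative")
    needed = active_queries * policy.workers_per_query
    return max(policy.min_workers, min(policy.max_workers, needed))


def scaling_action(current_workers: int, active_queries: int,
                   policy: ScalingPolicy) -> tuple[str, int]:
    """Return ("scale_up" | "scale_down" | "hold", number of workers to add or remove)."""
    delta = desired_workers(active_queries, policy) - current_workers
    if delta > 0:
        return ("scale_up", delta)
    if delta < 0:
        return ("scale_down", -delta)
    return ("hold", 0)


if __name__ == "__main__":
    policy = ScalingPolicy()
    print(scaling_action(current_workers=5, active_queries=10, policy=policy))
    print(scaling_action(current_workers=30, active_queries=0, policy=policy))
```

In a real deployment the "scale_down" branch would presumably need to let workers finish in-flight work before their containers are killed, which is consistent with the "Call Shutdown-hooks" label in the slide 21 diagram; that step is omitted here.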
