Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013

A few years ago, Netflix had a fairly classic business intelligence tech stack. Now, things have changed. Netflix is a heavy user of AWS for much of its ongoing operations, and Data Science & Engineering (DSE) is no exception. In this talk, we dive into the Netflix DSE architecture: what it is and why. Key topics include their use of big data technologies (Cassandra, Hadoop, Pig + Python, and Hive); their Amazon S3 central data hub; their multiple persistent Amazon EMR clusters; how they benefit from AWS elasticity; their data science-as-a-service approach; how they made a hybrid AWS/data center setup work well; their open-source Hadoop-related software; and more.


  1. Data Science at Netflix with Amazon EMR. Kurt Brown, Director of Data Platform, Netflix. November 13, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Data Platform
  3. Data Platform Suro S3
  4. Aegisthus
  5. Data Platform Suro Aegisthus S3
  6. Sting
  7. Data Platform Sting Suro Aegisthus S3
  8. Data Platform Sting Suro Aegisthus S3
  9. Data Platform Sting Suro Aegisthus S3
  10. S3
  11. S3
  12. 99.999999999%
  13. S3
  14. S3 High SLA Query
  15. HDFS?
  16. Eventual Consistency
  17. S3mper
  18. “Data as a Service”: Execution Service, Event Service, Metadata Service
  19. High SLA Cluster Job High SLA Query Cluster Job Query S3
  20. High SLA Query Cluster Job Query S3
  21. High SLA Cluster Job High SLA Query Cluster Job Query S3
  22. Bonus Cluster Job Bonus High SLA Cluster Job High SLA Query Cluster Job Query S3
  23. Bonus Cluster Job High SLA Cluster Job High SLA Query Cluster Job Query S3
  24. Tez
  25. Suro
  26. Aegisthus
  27. Questions? kurtbrown@netflix.com, http://jobs.netflix.com
  28. Please give us your feedback on this presentation (BDT306). As a thank you, we will select prize winners daily for completed surveys!
