Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013


A few years ago, Netflix had a fairly classic business intelligence tech stack. Now, things have changed. Netflix is a heavy user of AWS for much of its ongoing operations, and Data Science & Engineering (DSE) is no exception. In this talk, we dive into the Netflix DSE architecture: what and why. Key topics include their use of Big Data technologies (Cassandra, Hadoop, Pig + Python, and Hive); their Amazon S3 central data hub; their multiple persistent Amazon EMR clusters; how they benefit from AWS elasticity; their data science-as-a-service approach; how they made a hybrid AWS/data center setup work well; their open-source Hadoop-related software; and more.


Transcript

  • 1. Data Science at Netflix with Amazon EMR Kurt Brown, Director of Data Platform, Netflix November 13, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Data Platform
  • 3. Data Platform Suro S3
  • 4. Aegisthus
  • 5. Data Platform Suro Aegisthus S3
  • 6. Sting
  • 7. Data Platform Sting Suro Aegisthus S3
  • 8. Data Platform Sting Suro Aegisthus S3
  • 9. Data Platform Sting Suro Aegisthus S3
  • 10. S3
  • 11. S3
  • 12. 99.999999999%
  • 13. S3
  • 14. S3 High SLA Query
  • 15. HDFS ?
  • 16. Eventual Consistency
  • 17. S3mper
  • 18. “Data as a Service” • Execution Service • Event Service • Metadata Service
  • 19. High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 20. High SLA Query Cluster Job Query S3
  • 21. High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 22. Bonus Cluster Job Bonus High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 23. Bonus Cluster Job High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 24. Tez
  • 25. Suro
  • 26. Aegisthus
  • 27. Questions? kurtbrown@netflix.com http://jobs.netflix.com
  • 28. Please give us your feedback on this presentation BDT306 As a thank you, we will select prize winners daily for completed surveys!
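
Slides 16 and 17 pair S3's eventual consistency with S3mper. As a rough illustration of the general idea only (this is not S3mper's code or API), the sketch below compares a listing against a separately recorded set of expected keys and re-lists until the two agree. The function names, the in-memory lister standing in for the object store, and the retry parameters are all assumptions made for this example.

```python
import time
from typing import Callable, Iterable, Set


def list_until_consistent(
    list_keys: Callable[[], Iterable[str]],
    expected_keys: Set[str],
    retries: int = 5,
    delay_seconds: float = 0.0,
) -> Set[str]:
    """Re-list until every expected key is visible; raise if it never is.

    `expected_keys` plays the role of a separately maintained record of
    what was written (hypothetical; any consistent store could hold it).
    """
    missing: Set[str] = set(expected_keys)
    for attempt in range(retries):
        seen = set(list_keys())
        missing = expected_keys - seen
        if not missing:
            return seen
        if attempt < retries - 1:
            time.sleep(delay_seconds)  # wait before listing again
    raise RuntimeError(
        f"listing still missing {sorted(missing)} after {retries} attempts"
    )


def make_lagging_lister(final_keys: Set[str], lag_calls: int) -> Callable[[], list]:
    """Hypothetical stand-in for an eventually consistent listing:
    the last key stays invisible for the first `lag_calls` calls."""
    calls = {"n": 0}

    def lister() -> list:
        calls["n"] += 1
        keys = sorted(final_keys)
        return keys[:-1] if calls["n"] <= lag_calls else keys

    return lister


expected = {"part-00000", "part-00001", "part-00002"}
result = list_until_consistent(make_lagging_lister(expected, lag_calls=2), expected)
```

Here the listing becomes complete on the third call, so the function returns the full key set. A lister that never catches up exhausts the retries and raises `RuntimeError`, letting a job fail loudly instead of silently processing partial input.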