Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013

A few years ago, Netflix had a fairly classic business intelligence tech stack. Now, things have changed. Netflix is a heavy user of AWS for much of its ongoing operations, and Data Science & Engineering (DSE) is no exception. In this talk, we dive into the Netflix DSE architecture: what and why. Key topics include their use of Big Data technologies (Cassandra, Hadoop, Pig + Python, and Hive); their Amazon S3 central data hub; their multiple persistent Amazon EMR clusters; how they benefit from AWS elasticity; their data science-as-a-service approach; how they made a hybrid AWS/data center setup work well; their open-source Hadoop-related software; and more.


Transcript

  • 1. Data Science at Netflix with Amazon EMR. Kurt Brown, Director of Data Platform, Netflix. November 13, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Data Platform
  • 3. Data Platform Suro S3
  • 4. Aegisthus
  • 5. Data Platform Suro Aegisthus S3
  • 6. Sting
  • 7. Data Platform Sting Suro Aegisthus S3
  • 8. Data Platform Sting Suro Aegisthus S3
  • 9. Data Platform Sting Suro Aegisthus S3
  • 10. S3
  • 11. S3
  • 12. 99.999999999%
  • 13. S3
  • 14. S3 High SLA Query
  • 15. HDFS ?
  • 16. Eventual Consistency
  • 17. S3mper
  • 18. “Data as a Service” • Execution Service • Event Service • Metadata Service
  • 19. High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 20. High SLA Query Cluster Job Query S3
  • 21. High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 22. Bonus Cluster Job Bonus High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 23. Bonus Cluster Job High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 24. Tez
  • 25. Suro
  • 26. Aegisthus
  • 27. Questions? kurtbrown@netflix.com http://jobs.netflix.com
  • 28. Please give us your feedback on this presentation (BDT306). As a thank you, we will select prize winners daily for completed surveys!