Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013

A few years ago, Netflix had a fairly classic business intelligence tech stack. Now, things have changed. Netflix is a heavy user of AWS for much of its ongoing operations, and Data Science & Engineering (DSE) is no exception. In this talk, we dive into the Netflix DSE architecture: the what and the why. Key topics include their use of Big Data technologies (Cassandra, Hadoop, Pig + Python, and Hive); their Amazon S3 central data hub; their multiple persistent Amazon EMR clusters; how they benefit from AWS elasticity; their data-science-as-a-service approach; how they made a hybrid AWS/data center setup work well; their open-source Hadoop-related software; and more.

Published in: Technology, Business

Transcript

  • 1. Data Science at Netflix with Amazon EMR. Kurt Brown, Director of Data Platform, Netflix. November 13, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Data Platform
  • 3. Data Platform Suro S3
  • 4. Aegisthus
  • 5. Data Platform Suro Aegisthus S3
  • 6. Sting
  • 7. Data Platform Sting Suro Aegisthus S3
  • 8. Data Platform Sting Suro Aegisthus S3
  • 9. Data Platform Sting Suro Aegisthus S3
  • 10. S3
  • 11. S3
  • 12. 99.999999999%
  • 13. S3
  • 14. S3 High SLA Query
  • 15. HDFS ?
  • 16. Eventual Consistency
  • 17. S3mper
  • 18. “Data as a Service” • Execution Service • Event Service • Metadata Service
  • 19. High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 20. High SLA Query Cluster Job Query S3
  • 21. High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 22. Bonus Cluster Job Bonus High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 23. Bonus Cluster Job High SLA Cluster Job High SLA Query Cluster Job Query S3
  • 24. Tez
  • 25. Suro
  • 26. Aegisthus
  • 27. Questions? kurtbrown@netflix.com http://jobs.netflix.com
  • 28. Please give us your feedback on this presentation BDT306 As a thank you, we will select prize winners daily for completed surveys!