BDT303 Data Science with Elastic MapReduce - AWS re:Invent 2012

In this talk, we dive into the Netflix Data Science & Engineering architecture. Not just the what, but also the why. Some key topics include the big data technologies we leverage (Cassandra, Hadoop, Pig + Python, and Hive), our use of Amazon S3 as our central data hub, our use of multiple persistent Amazon Elastic MapReduce (EMR) clusters, how we leverage the elasticity of AWS, our data science as a service approach, how we make our hybrid AWS / data center setup work well, and more.

  1. What is Netflix’s data warehouse? a) Cassandra b) Teradata c) Hive d) S3
  2. DSE Platform
  3. DSE Platform: Chukwa, S3
  4. Aegisthus
  5. DSE Platform: Chukwa, Aegisthus, S3
  6. Sting
  7. DSE Platform: Sting, Chukwa, Aegisthus, S3
  8. What is Netflix’s data warehouse? a) Cassandra b) Teradata c) Hive d) S3
  9. DSE Platform: Sting, Chukwa, Aegisthus, S3
  10. S3
  11. S3
  12. 99.999999999%
  13. S3
  14. High SLA S3 Query
  15. HDFS?
  16. “Data Science as a Service”: Execution Service / Genie; Event Service; Metadata Service
  17. High SLA Cluster Job High SLA S3 Query Cluster Job Query
  18. High SLA S3 Query Cluster Job Query
  19. High SLA Cluster Job High SLA S3 Query Cluster Job Query
  20. Super SLA Cluster Job Super SLA S3 High SLA Cluster Job High SLA Query Cluster Job Query
  21. Super SLA Cluster Job High SLA Cluster Job High SLA S3 Query Cluster Job Query
  22. Questions? http://jobs.netflix.com kurtbrown@netflix.com
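
Slide 12 shows only the figure 99.999999999%, placed between the S3 slides. Assuming it refers to the per-object annual durability that Amazon S3 is designed for (the slide itself does not say so), the arithmetic behind that number can be sketched as below. The function names and the object count are hypothetical and chosen only for illustration; they do not come from the talk.

```python
from fractions import Fraction

# Assumption: the figure on slide 12 is S3's designed annual durability per object.
DURABILITY = Fraction("0.99999999999")


def expected_annual_losses(num_objects, durability=DURABILITY):
    """Expected number of objects lost per year at the given durability."""
    return num_objects * (1 - durability)


def years_per_lost_object(num_objects, durability=DURABILITY):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(num_objects, durability)


if __name__ == "__main__":
    # Hypothetical bucket size, not a number from the talk.
    objects = 10_000_000
    print(float(expected_annual_losses(objects)))
    print(float(years_per_lost_object(objects)))
```

Exact fractions are used instead of floats because subtracting a value this close to 1 in floating point introduces visible rounding error.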