BDT303 Data Science with Elastic MapReduce - AWS re:Invent 2012

In this talk, we dive into the Netflix Data Science & Engineering architecture. Not just the what, but also the why. Some key topics include the big data technologies we leverage (Cassandra, Hadoop, Pig + Python, and Hive), our use of Amazon S3 as our central data hub, our use of multiple persistent Amazon Elastic MapReduce (EMR) clusters, how we leverage the elasticity of AWS, our data science as a service approach, how we make our hybrid AWS / data center setup work well, and more.


Transcript

  • 1. What is Netflix’s data warehouse? a) Cassandra b) Teradata c) Hive d) S3
  • 2. DSE Platform
  • 3. DSE Platform Chukwa S3
  • 4. Aegisthus
  • 5. DSE Platform Chukwa Aegisthus S3
  • 6. Sting
  • 7. DSE Platform Sting Chukwa Aegisthus S3
  • 8. What is Netflix’s data warehouse? a) Cassandra b) Teradata c) Hive d) S3
  • 9. DSE Platform Sting Chukwa Aegisthus S3
  • 10. S3
  • 11. S3
  • 12. 99.999999999%
  • 13. S3
  • 14. High SLA S3 Query
  • 15. HDFS ?
  • 16. “Data Science as a Service”: Execution Service / Genie; Event Service; Metadata Service
  • 17. High SLA Cluster Job High SLA S3 Query Cluster Job Query
  • 18. High SLA S3 Query Cluster Job Query
  • 19. High SLA Cluster Job High SLA S3 Query Cluster Job Query
  • 20. Super SLA Cluster Job Super SLA S3 High SLA Cluster Job High SLA Query Cluster Job Query
  • 21. Super SLA Cluster Job High SLA Cluster Job High SLA S3 Query Cluster Job Query
  • 22. Questions? http://jobs.netflix.com kurtbrown@netflix.com
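
Slide 12 shows only the figure 99.999999999%, placed between the S3 slides. Assuming it refers to S3's designed annual durability per object (the transcript does not say so explicitly), the arithmetic behind that figure can be sketched as follows. The object count and function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Assumption: the "99.999999999%" on slide 12 is a per-object, per-year
# durability figure. Exact fractions avoid floating-point rounding.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(num_objects: int) -> Fraction:
    """Expected number of objects lost per year under the assumed durability."""
    return num_objects * (1 - DURABILITY)


def years_per_expected_loss(num_objects: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(num_objects)


if __name__ == "__main__":
    n = 10_000_000  # hypothetical number of stored objects
    print(f"expected losses per year: {float(expected_annual_losses(n))}")
    print(f"years per expected loss: {int(years_per_expected_loss(n))}")
```

Under that assumption, a store of ten million objects would expect to lose a single object roughly once every 10,000 years, which is one reading of why the talk treats S3 as a suitable central data hub.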
