BDT303 Data Science with Elastic MapReduce - AWS re:Invent 2012

In this talk, we dive into the Netflix Data Science & Engineering architecture. Not just the what, but also the why. Some key topics include the big data technologies we leverage (Cassandra, Hadoop, Pig + Python, and Hive), our use of Amazon S3 as our central data hub, our use of multiple persistent Amazon Elastic MapReduce (EMR) clusters, how we leverage the elasticity of AWS, our data science as a service approach, how we make our hybrid AWS / data center setup work well, and more.
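The abstract describes several persistent EMR clusters that all use S3 as the central data hub. The sketch below shows one consequence of that design. It assumes a hypothetical job router that picks a cluster by SLA tier. The tier names (Super SLA, High SLA, Query) come from the slide diagrams. The cluster names, the fallback order, the S3 URI, and the `route` function are illustrative assumptions, not Netflix's Genie implementation. Job inputs are read from S3 rather than from one cluster's HDFS, so any healthy cluster can run any job. The SLA tier only decides which cluster is tried first.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet


class Sla(Enum):
    """SLA tiers, named after the cluster labels on the slides."""
    SUPER = "super_sla"
    HIGH = "high_sla"
    QUERY = "query"


@dataclass(frozen=True)
class Job:
    name: str
    sla: Sla
    input_uri: str  # inputs live in S3, not in any one cluster's HDFS


# Hypothetical cluster names: the preferred cluster for each tier, and the
# order in which clusters are tried when the preferred one is unhealthy.
# The super-SLA cluster is tried last so it stays protected from overflow.
PREFERRED = {
    Sla.SUPER: "super-sla-cluster",
    Sla.HIGH: "high-sla-cluster",
    Sla.QUERY: "query-cluster",
}
FALLBACK_ORDER = ["query-cluster", "high-sla-cluster", "super-sla-cluster"]


def route(job: Job, healthy: AbstractSet[str]) -> str:
    """Return the name of the cluster that should run `job`.

    Because every job reads its input from S3, any healthy cluster can run
    it; the SLA tier only decides which cluster is tried first.
    """
    preferred = PREFERRED[job.sla]
    if preferred in healthy:
        return preferred
    for cluster in FALLBACK_ORDER:
        if cluster in healthy:
            return cluster
    raise RuntimeError(f"no healthy cluster available for {job.name}")


job = Job("daily_rollup", Sla.HIGH, "s3://example-bucket/events/2012-11-28/")
print(route(job, {"high-sla-cluster", "query-cluster"}))  # high-sla-cluster
print(route(job, {"query-cluster"}))                      # query-cluster
```

The fallback order is a design choice made only for this sketch. A real scheduler might instead queue the job or resize a cluster, which EMR's elasticity allows.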


  1. What is Netflix’s data warehouse? a) Cassandra b) Teradata c) Hive d) S3
  2. DSE Platform
  3. DSE Platform: Chukwa | S3
  4. Aegisthus
  5. DSE Platform: Chukwa | Aegisthus | S3
  6. Sting
  7. DSE Platform: Sting | Chukwa | Aegisthus | S3
  8. What is Netflix’s data warehouse? a) Cassandra b) Teradata c) Hive d) S3
  9. DSE Platform: Sting | Chukwa | Aegisthus | S3
  10. S3
  11. S3
  12. 99.999999999% (Amazon S3's designed annual durability for objects)
  13. S3
  14. Diagram: High SLA | S3 | Query
  15. HDFS?
  16. “Data Science as a Service”: • Execution Service / Genie • Event Service • Metadata Service
  17. Diagram: High SLA Cluster Job | High SLA | S3 | Query Cluster Job | Query
  18. Diagram: High SLA | S3 | Query Cluster Job | Query
  19. Diagram: High SLA Cluster Job | High SLA | S3 | Query Cluster Job | Query
  20. Diagram: Super SLA Cluster Job | Super SLA | S3 | High SLA Cluster Job | High SLA | Query Cluster Job | Query
  21. Diagram: Super SLA Cluster Job | High SLA Cluster Job | High SLA | S3 | Query Cluster Job | Query
  22. Questions? http://jobs.netflix.com | kurtbrown@netflix.com
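The 99.999999999% ("eleven nines") figure on the S3 slides is Amazon S3's designed annual durability for stored objects. The back-of-the-envelope calculation below shows what that means at warehouse scale. It assumes each object is lost independently, with probability 1 − 0.99999999999 per year. This is a simplification made for illustration. The object count of 10,000,000 is an arbitrary example, not a Netflix figure. Exact fractions are used so the arithmetic has no floating-point rounding.

```python
from fractions import Fraction

# Amazon S3's designed annual durability for an object: eleven nines.
DURABILITY = Fraction("0.99999999999")


def expected_annual_losses(num_objects: int) -> Fraction:
    """Expected number of objects lost per year, treating each object's loss
    as an independent event with probability (1 - DURABILITY) per year."""
    return num_objects * (1 - DURABILITY)


def years_per_expected_loss(num_objects: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(num_objects)


n = 10_000_000
print(expected_annual_losses(n))   # 1/10000
print(years_per_expected_loss(n))  # 10000
```

Under these assumptions, a store of ten million objects loses about one object every 10,000 years. That margin helps explain why S3, rather than cluster-local HDFS, can serve as the durable hub that every cluster reads from.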