BDT303 Data Science with Elastic MapReduce - AWS re:Invent 2012
 

In this talk, we dive into the Netflix Data Science & Engineering architecture. Not just the what, but also the why. Some key topics include the big data technologies we leverage (Cassandra, Hadoop, Pig + Python, and Hive), our use of Amazon S3 as our central data hub, our use of multiple persistent Amazon Elastic MapReduce (EMR) clusters, how we leverage the elasticity of AWS, our data science as a service approach, how we make our hybrid AWS / data center setup work well, and more.
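The combination described above (one shared data hub, several persistent clusters, and a service that accepts jobs) can be illustrated with a minimal Python sketch. It is not Netflix's implementation and not the Genie API: the class names, the SLA tier labels, the least-loaded placement rule, and the `s3://example-bucket/...` paths are all hypothetical, chosen only to show how jobs with different SLAs might be routed to separate clusters that read the same S3 data.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    sla: str        # hypothetical tier label, e.g. "high" or "query"
    input_uri: str  # location in the shared store (hypothetical S3 path)


@dataclass
class Cluster:
    name: str
    sla: str                                  # tier this cluster serves
    jobs: list = field(default_factory=list)  # jobs placed on this cluster


class ExecutionService:
    """Toy job router: picks a cluster by SLA tier (illustrative only)."""

    def __init__(self, clusters):
        # Group clusters by the SLA tier they serve.
        self.by_sla = {}
        for cluster in clusters:
            self.by_sla.setdefault(cluster.sla, []).append(cluster)

    def submit(self, job):
        candidates = self.by_sla.get(job.sla)
        if not candidates:
            raise ValueError(f"no cluster serves SLA tier {job.sla!r}")
        # Assumed placement rule: the cluster with the fewest queued jobs.
        target = min(candidates, key=lambda c: len(c.jobs))
        target.jobs.append(job)
        return target.name


# Two persistent clusters; both jobs read from the same shared location.
service = ExecutionService([
    Cluster(name="high-sla-cluster", sla="high"),
    Cluster(name="query-cluster", sla="query"),
])
etl_target = service.submit(
    Job("nightly_aggregation", "high", "s3://example-bucket/events/"))
adhoc_target = service.submit(
    Job("adhoc_exploration", "query", "s3://example-bucket/events/"))
print(etl_target, adhoc_target)
```

Because the data lives in the shared store and not on any one cluster, adding a cluster for a new SLA tier in this sketch only requires registering it with the service. Existing jobs and data locations stay unchanged.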

    Presentation Transcript

    • What is Netflix’s data warehouse? a) Cassandra b) Teradata c) Hive d) S3
    • DSE Platform
    • DSE Platform: Chukwa, S3
    • Aegisthus
    • DSE Platform: Chukwa, Aegisthus, S3
    • Sting
    • DSE Platform: Sting, Chukwa, Aegisthus, S3
    • What is Netflix’s data warehouse? a) Cassandra b) Teradata c) Hive d) S3
    • DSE Platform: Sting, Chukwa, Aegisthus, S3
    • S3
    • S3
    • 99.999999999%
    • S3
    • High SLA S3 Query
    • HDFS ?
    • “Data Science as a Service”
      • Execution Service / Genie
      • Event Service
      • Metadata Service
    • High SLA Cluster Job High SLA S3 Query Cluster Job Query
    • High SLA S3 Query Cluster Job Query
    • High SLA Cluster Job High SLA S3 Query Cluster Job Query
    • Super SLA Cluster Job Super SLA S3 High SLA Cluster Job High SLA Query Cluster Job Query
    • Super SLA Cluster Job High SLA Cluster Job High SLA S3 Query Cluster Job Query
    • Questions? http://jobs.netflix.com kurtbrown@netflix.com