Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013

A few years ago, Netflix had a fairly classic business intelligence tech stack. Now, things have changed. Netflix is a heavy user of AWS for much of its ongoing operations, and Data Science & Engineering (DSE) is no exception. In this talk, we dive into the Netflix DSE architecture: what it is and why. Key topics include their use of Big Data technologies (Cassandra, Hadoop, Pig + Python, and Hive); their Amazon S3 central data hub; their multiple persistent Amazon EMR clusters; how they benefit from AWS elasticity; their data science-as-a-service approach; how they made a hybrid AWS/data center setup work well; their open-source Hadoop-related software; and more.

Presentation Transcript

  • Data Science at Netflix with Amazon EMR. Kurt Brown, Director of Data Platform, Netflix. November 13, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • Data Platform
  • Data Platform Suro S3
  • Aegisthus
  • Data Platform Suro Aegisthus S3
  • Sting
  • Data Platform Sting Suro Aegisthus S3
  • S3
  • 99.999999999%
  • S3
  • S3 High SLA Query
  • HDFS ?
  • Eventual Consistency
  • S3mper
  • “Data as a Service”: Execution Service, Event Service, Metadata Service
  • High SLA Cluster Job High SLA Query Cluster Job Query S3
  • High SLA Query Cluster Job Query S3
  • High SLA Cluster Job High SLA Query Cluster Job Query S3
  • Bonus Cluster Job Bonus High SLA Cluster Job High SLA Query Cluster Job Query S3
  • Bonus Cluster Job High SLA Cluster Job High SLA Query Cluster Job Query S3
  • Tez
  • Suro
  • Aegisthus
  • Questions? kurtbrown@netflix.com http://jobs.netflix.com
  • Please give us your feedback on this presentation (BDT306). As a thank you, we will select prize winners daily for completed surveys!