BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012

In this talk, we dive into the Netflix Data Science & Engineering architecture. Not just the what, but also the why. Some key topics include the big data technologies we leverage (Cassandra, Hadoop, Pig + Python, and Hive), our use of Amazon S3 as our central data hub, our use of multiple persistent Amazon Elastic MapReduce (EMR) clusters, how we leverage the elasticity of AWS, our data science as a service approach, how we make our hybrid AWS / data center setup work well, and more.

What is Netflix’s data warehouse?
a) Cassandra
b) Teradata
c) Hive
d) S3

“Data Science as a Service”

• Execution Service / Genie

• Event Service

• Metadata Service

[Diagram: Query, High SLA, and Super SLA clusters, each with its cluster jobs, around S3]
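
The cluster labels on the slides (Query, High SLA, Super SLA) suggest that each job is directed to a cluster according to its service level. The following is a minimal sketch of that idea, not Netflix's implementation and not the Genie API: the Job class, the route function, the tier keys, and the cluster IDs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Tier names mirror the slide labels; the cluster IDs are invented for this sketch.
CLUSTERS = {
    "query": "cluster-query",
    "high_sla": "cluster-high-sla",
    "super_sla": "cluster-super-sla",
}


@dataclass(frozen=True)
class Job:
    """A hypothetical job description: a name plus the tier it asks for."""
    name: str
    tier: str  # expected to be one of the keys in CLUSTERS


def route(job: Job) -> str:
    """Return the ID of the cluster that should run this job."""
    try:
        return CLUSTERS[job.tier]
    except KeyError:
        # Reject unknown tiers instead of silently picking a default cluster.
        raise ValueError(f"unknown tier: {job.tier!r}") from None


if __name__ == "__main__":
    for job in [Job("adhoc_report", "query"), Job("nightly_etl", "high_sla")]:
        print(job.name, "->", route(job))
```

Keeping the tier-to-cluster mapping in one table means a cluster can be added or replaced without touching the jobs themselves, which is one plausible reason to put a service in front of several persistent clusters.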

Questions?

http://jobs.netflix.com
kurtbrown@netflix.com
