Deft Data at Netflix:
Using Amazon S3 and Amazon Elastic MapReduce
Roy Rapoport
November 14, 2013

© 2013 Amazon.com, Inc. and its a...
Friday, November 15, 13
A Word About Me …
• About 20 years in technology
• Systems engineering, networking,
software development, QA, release
mana...
A Word About Netflix …
Just the Stats

• 16 years
• 2000+ employees
• 40 million users
• 5x10^9 hours/quarter

A Word About Netflix …
Freedom and Responsibility Culture

• Optimize speed of innovation
Constrain availability
Cost will...
A Word About Netflix …
Technology and Operations

• Service Oriented Architecture
• Decentralized Operations. You
• Build
...
A Word About Netflix …
Technology and Operations

• AWS-based for 100% of streaming*
• Huge expansion
• Customer Growth
• ...
In the Old Days …
Our Old Alerting System

• Enterprise IT Solution
• Managed by the Enterprise IT Alerting People
• File ...

Copyright: State Library of Victoria Collections. CC Attribution 2.0 License
Copyright USAID Microlinks. CC Attribution 2.0 Licens...
In the Old Days …
Our Old Telemetry System

• Spare-time effort by a lone sysadmin
• Loved by developer...
Speaking of Growth
By way of comparison
• Every person in the world
• twice
• Every smartphone in the world
• ten times
So We Built Something Better
UI Layer Fronts Multiple Systems
Diagram: a UI layer in front of Atlas, Epic, and CloudWatch

Copyright: http://www.flickr.com/photos/76651030@N02/
CC Attribution 2.0 License
So We Built Something Better
Clear Regional Separation
• And aggregation
Diagram: UI, Atlas, Epic, and CloudWatch (U, A, E, C) stacked above a global tier and the regions us-east-1, us-west-1, us-west-2 ...
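The separation-plus-aggregation idea can be shown with a minimal Python sketch. It assumes each region keeps only its own datapoints and that a global layer answers a query by fanning it out to every region and adding up the partial results. The `RegionalStore` class, `global_query`, the sample data, and the merge-by-sum rule are hypothetical illustrations, not Atlas's actual API. The `nf.app` tag key is borrowed from the policy query shown later in the deck.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical datapoint: (tags, timestamp in seconds, value).
Datapoint = Tuple[frozenset, int, float]

def tags(d: Dict[str, str]) -> frozenset:
    """Hashable tag set built from a plain dict."""
    return frozenset(d.items())

class RegionalStore:
    """Holds only the datapoints published within one region (hypothetical)."""

    def __init__(self, region: str, points: List[Datapoint]):
        self.region = region
        self.points = points

    def query(self, match: Dict[str, str]) -> Dict[int, float]:
        """Sum matching datapoints per timestamp, using this region's data only."""
        out: Dict[int, float] = defaultdict(float)
        for point_tags, ts, value in self.points:
            tag_map = dict(point_tags)
            if all(tag_map.get(k) == v for k, v in match.items()):
                out[ts] += value
        return dict(out)

def global_query(stores: List[RegionalStore], match: Dict[str, str]) -> Dict[int, float]:
    """Global layer: fan the query out to every region and add up the partial sums."""
    total: Dict[int, float] = defaultdict(float)
    for store in stores:
        for ts, value in store.query(match).items():
            total[ts] += value
    return dict(total)

stores = [
    RegionalStore("us-east-1", [(tags({"nf.app": "api"}), 0, 5.0),
                                (tags({"nf.app": "api"}), 60, 7.0)]),
    RegionalStore("us-west-2", [(tags({"nf.app": "api"}), 0, 3.0)]),
    RegionalStore("eu-west-1", [(tags({"nf.app": "web"}), 0, 9.0)]),
]

print(global_query(stores, {"nf.app": "api"}))  # {0: 8.0, 60: 7.0}
```

Summing partial results is only valid for additive values such as counts and totals. A maximum or an average would need its own merge rule, and the slides do not say how Atlas combines regional answers.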
So We Built Something Better
Localized Node/Metric Identification
Diagram comparing Before and Now, with the callout "Here’s a metric!..."
So We Built Something Better
What’s a Metric?
• com.netflix.eds.nccp.successful.requests.uiversion.nccprt-author...
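A long dotted name like the one above packs several dimensions into a single string, so every combination becomes a separate, opaque metric. The minimal Python sketch below assumes the alternative model instead: a short name plus a map of tags. Under that model, the query-time aggregation along multiple dimensions described in the session abstract becomes a simple group-by. The metric name `requests`, the tag keys (`status`, `uiversion`, `country`), and all values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

@dataclass(frozen=True)
class Metric:
    """One datapoint: a short name plus dimensional tags (hypothetical model)."""
    name: str
    tags: Tuple[Tuple[str, str], ...]
    value: float

    def tag(self, key: str) -> str:
        return dict(self.tags).get(key, "")

def metric(name: str, value: float, **tags: str) -> Metric:
    """Build a Metric with tags stored in a canonical (sorted) order."""
    return Metric(name, tuple(sorted(tags.items())), value)

def group_by(metrics: Iterable[Metric], name: str,
             keys: Tuple[str, ...]) -> Dict[Tuple[str, ...], float]:
    """Query-time aggregation: sum one metric name, grouped by the chosen tag keys."""
    out: Dict[Tuple[str, ...], float] = defaultdict(float)
    for m in metrics:
        if m.name == name:
            out[tuple(m.tag(k) for k in keys)] += m.value
    return dict(out)

data = [
    metric("requests", 10.0, status="success", uiversion="v1", country="US"),
    metric("requests", 4.0, status="success", uiversion="v2", country="US"),
    metric("requests", 1.0, status="failure", uiversion="v2", country="CA"),
]

print(group_by(data, "requests", ("uiversion",)))  # {('v1',): 10.0, ('v2',): 5.0}
print(group_by(data, "requests", ("status",)))     # {('success',): 14.0, ('failure',): 1.0}
```

The same three datapoints answer both questions. With a fully dotted name, each grouping would instead need its own pre-built metric.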
So We Built Something Better
Powerful queries
• Make the complex possible
• Make the simple … sort...

Copyright: Kurt Moerman
CC Attribution 2.0 License
So We Built Something Better
Ridiculous Read Volume:
• Engage
• Graphs and Dashboards
• Alerting
• Automated Can...
So We Built Something Better
Diagram: architecture with a global endpoint, backend instances, client instances, a publish cluster, and a poller cluster, with metrics (m) moving between them
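The diagram names client instances, a publish cluster, and backend instances, but not how incoming metrics are spread across backends. The minimal Python sketch below makes two assumptions: client instances send batches to the publish cluster, and the publish cluster forwards each series to one backend chosen by a stable hash of the series identity, so a given series always lands on the same instance. The hashing scheme, the backend names, and the `route` helper are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

def series_key(name: str, tags: Dict[str, str]) -> str:
    """Canonical identity of a series: name plus tags sorted by key."""
    return name + "," + ",".join(f"{k}={v}" for k, v in sorted(tags.items()))

def pick_backend(key: str, backends: List[str]) -> str:
    """Map a series key to one backend using a hash that is stable across processes."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

def route(batch: List[Tuple[str, Dict[str, str], float]],
          backends: List[str]) -> Dict[str, list]:
    """Split a batch published by a client instance into per-backend sub-batches."""
    out: Dict[str, list] = defaultdict(list)
    for name, tags, value in batch:
        out[pick_backend(series_key(name, tags), backends)].append((name, tags, value))
    return dict(out)

backends = ["backend-1", "backend-2", "backend-3"]
batch = [
    ("requests", {"nf.app": "api", "status": "success"}, 10.0),
    ("requests", {"status": "success", "nf.app": "api"}, 12.0),  # same series, tags reordered
    ("latency", {"nf.app": "api"}, 0.25),
]
routed = route(batch, backends)
print({b: len(points) for b, points in routed.items()})
```

Python's built-in `hash` is salted per process for strings, which is why the sketch uses SHA-1. Plain modulo hashing reshuffles most series whenever the backend count changes; consistent hashing is a common alternative. Either choice is an assumption here, not something the slides describe.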
That Sounds Great!
Surely there are no problems

•Speed is hard
•Speed at volume is harder
•We looked at spinning disks
•M...

Copyright: http://www.flickr.com/photos/lainetrees/
CC Attribution 2.0 Li...
That Doesn’t Sound Great!
•If only we could reduce it …
•“Reduce”? Get it? Get it?
•Our granularity is two-dimensional
•We can reduce on either dime...

Diagram axis: Dimensionality (tags)
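The slide says granularity is two-dimensional, with dimensionality (tags) on one axis. The minimal Python sketch below assumes the other axis is time and shows what reducing along each axis means. Coarsening the time step merges adjacent intervals of the same series. Dropping a tag key merges series that differed only in that tag. The grid layout, the sum combiner, the tag names, and the function names are hypothetical; the minimum/maximum/total/count summary from the next slide could replace the plain sum.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Tuple

Tags = FrozenSet[Tuple[str, str]]
Grid = Dict[Tuple[Tags, int], float]  # (series tags, timestamp in seconds) -> value

def reduce_time(grid: Grid, step: int) -> Grid:
    """Reduce along time: fold each value into the bucket starting at ts - ts % step."""
    out: Grid = defaultdict(float)
    for (tags, ts), value in grid.items():
        out[(tags, ts - ts % step)] += value
    return dict(out)

def reduce_tags(grid: Grid, drop: str) -> Grid:
    """Reduce along dimensionality: remove one tag key and merge series that collide."""
    out: Grid = defaultdict(float)
    for (tags, ts), value in grid.items():
        kept = frozenset((k, v) for k, v in tags if k != drop)
        out[(kept, ts)] += value
    return dict(out)

def t(**kv: str) -> Tags:
    return frozenset(kv.items())

grid: Grid = {
    (t(app="api", node="i-1"), 0): 1.0,
    (t(app="api", node="i-1"), 60): 2.0,
    (t(app="api", node="i-2"), 0): 3.0,
    (t(app="api", node="i-2"), 60): 4.0,
}

print(len(reduce_time(grid, 300)))     # 2 cells: one per node
print(len(reduce_tags(grid, "node")))  # 2 cells: one per minute
```

Both reductions shrink the four cells to two, and they compose: applying both leaves a single cell holding the total of 10.0.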
A Reductive Approach
•For a series of values, reduce and keep:
•minimum
•maximum
•total
•count
•Example:
•3,5,9,14,20: min...
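The four kept values can be sketched in Python as a small summary type, applied to the slide's own example series. Beyond what the slide states, the sketch shows that two summaries merge exactly: the minimum of the minimums, the maximum of the maximums, and the sums of the totals and counts. Coarser intervals can therefore be built from finer ones without the raw values, and the mean is still recoverable as total / count. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Summary:
    """Reduced form of a series: keep only minimum, maximum, total, and count."""
    minimum: float
    maximum: float
    total: float
    count: int

    @staticmethod
    def of(values: Iterable[float]) -> "Summary":
        vals = list(values)
        if not vals:
            raise ValueError("cannot summarize an empty series")
        return Summary(min(vals), max(vals), sum(vals), len(vals))

    def merge(self, other: "Summary") -> "Summary":
        """Combine two summaries exactly, without needing the original values."""
        return Summary(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.count + other.count,
        )

    @property
    def mean(self) -> float:
        return self.total / self.count

s = Summary.of([3, 5, 9, 14, 20])
print(s)       # Summary(minimum=3, maximum=20, total=51, count=5)
print(s.mean)  # 10.2

# Merging the summaries of two halves equals summarizing the whole series.
assert Summary.of([3, 5]).merge(Summary.of([9, 14, 20])) == s
```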
Reduction: Policy
•Policy-driven EMR engine
•Four possible actions
•preserve
•drop
•consolidate
•rollup

Copyright: http://www.flickr.com/photos/bagaball/
CC Attribution 2.0 License
Reduction: Policy
{
"rules" : [
{ "operations" : [{"op" : "drop"}],
"query" : "nf.app,api,:eq,class,
(,LastMinuteFailRatio...
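The rule above is cut off mid-query, but its visible shape is enough for a hedged Python sketch of how such a policy might be applied to one metric's tags. That shape is a list of rules, each pairing `operations` with a postfix `query` such as `nf.app,api,:eq`. The sketch implements only a small assumed subset of the query syntax: `:eq`, `:in`, `:and`, `:or`, and `(` ... `)` lists. The first-match-wins rule order, the `preserve` default, and the class names in the example policy are all assumptions, not the engine's documented behavior.

```python
import json
from typing import Callable, Dict, List

Tags = Dict[str, str]
Pred = Callable[[Tags], bool]

def parse_query(query: str) -> Pred:
    """Evaluate an assumed postfix subset: ':eq', ':in', ':and', ':or', '(' ... ')'."""
    stack: list = []
    for token in [t.strip() for t in query.split(",") if t.strip()]:
        if token == "(":
            stack.append(token)             # start of a list literal
        elif token == ")":
            items = []
            while stack[-1] != "(":
                items.append(stack.pop())
            stack.pop()                     # discard the "(" marker
            stack.append(list(reversed(items)))
        elif token == ":eq":
            value, key = stack.pop(), stack.pop()
            stack.append(lambda tags, k=key, v=value: tags.get(k) == v)
        elif token == ":in":
            values, key = stack.pop(), stack.pop()
            stack.append(lambda tags, k=key, vs=frozenset(values): tags.get(k) in vs)
        elif token in (":and", ":or"):
            right, left = stack.pop(), stack.pop()
            if token == ":and":
                stack.append(lambda tags, a=left, b=right: a(tags) and b(tags))
            else:
                stack.append(lambda tags, a=left, b=right: a(tags) or b(tags))
        else:
            stack.append(token)             # plain string operand
    if len(stack) != 1 or not callable(stack[0]):
        raise ValueError(f"query did not reduce to one predicate: {query!r}")
    return stack[0]

def actions_for(tags: Tags, policy: dict) -> List[str]:
    """Return the ops of the first matching rule; keep the metric if nothing matches."""
    for rule in policy["rules"]:
        if parse_query(rule["query"])(tags):
            return [op["op"] for op in rule["operations"]]
    return ["preserve"]

# Hypothetical policy in the slide's shape; the class names are invented for illustration.
policy = json.loads("""
{"rules": [
  {"operations": [{"op": "drop"}],
   "query": "nf.app,api,:eq,class,(,ExampleClassA,ExampleClassB,),:in,:and"}
]}
""")

print(actions_for({"nf.app": "api", "class": "ExampleClassA"}, policy))  # ['drop']
print(actions_for({"nf.app": "web", "class": "ExampleClassA"}, policy))  # ['preserve']
```

The sketch re-parses each query for every metric to stay short. A real engine running over large volumes on EMR would compile each rule once and reuse the resulting predicate.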
Diagram: query and response flow among a client instance, the global and regional endpoints, and a poller cluster, with metrics and Amazon EMR in the pipeline
Reduction: Benefits
•Indefinite storage in Amazon S3
•Fear of commitment achievement: Unlocked
•Can be aggressive about hi...

Copyright: http://www.flickr.com/photos/dr_pete/
CC Attribution 2.0 License
Reduction: Efficiency

History     Size    Instances Per Hou...
6 Hours     600
4 Days      512
18 Days     180
3 Months    12

Copyright: http://www.flickr.com/photos/sebrenner/
CC Attribution 2.0 License
Previews
•Self-service for special requests
•Different instance types
•cr1.8xlarge
•hi1.4xlarge
•Multi-tiered metric visib...

Copyright: http://www.flickr.com/photos/creativealan/
CC Attribution 2.0 License
Growth Redux
Chart: (M) metrics, plotted from 5/11 to 1/14 (ticks 5/11, 8/11, 9/11, 1/12, 4/12, 8/12, 11/12, 1/13, 5/13, 10/13, 1/14), with labeled values 2, 2.5, 10, 15, 18, 30, 55, 90, 212, 728, and 1,200
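The chart's scale can be read roughly in Python, assuming the smallest labeled value (2 million metrics) belongs to the first date (5/11) and the largest (1,200 million) to the last (1/14). The extracted chart does not pair values with dates, so this endpoint assignment is an assumption. Under it, 32 months separate the endpoints, the implied compound growth is about 22% per month, and the metric count doubles roughly every three and a half months. The helper names are hypothetical.

```python
import math

def months_between(start: str, end: str) -> int:
    """Whole months between two 'M/YY' labels, e.g. '5/11' means May 2011."""
    sm, sy = (int(x) for x in start.split("/"))
    em, ey = (int(x) for x in end.split("/"))
    return (ey - sy) * 12 + (em - sm)

def implied_monthly_growth(first: float, last: float, months: int) -> float:
    """Compound monthly rate r such that first * (1 + r) ** months == last."""
    return (last / first) ** (1.0 / months) - 1.0

months = months_between("5/11", "1/14")             # 32
rate = implied_monthly_growth(2.0, 1200.0, months)  # millions of metrics at each end
doubling = math.log(2) / math.log(1 + rate)

print(months, f"{rate:.1%}", f"{doubling:.1f} months")
```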
And a Last Word About Costs
•Priorities Reminder
•Speed of Innovation
•Availability
•Cost
•Never intended to lower costs
•...
EMR FTW

Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

How does Netflix stay on top of the operations of its Internet service with millions of users and billions of metrics? With Atlas, its own massively distributed, large-scale monitoring system. Come learn how Netflix built Atlas with multiple processing pipelines using Amazon S3 and Amazon EMR to provide low-latency access to billions of metrics while supporting query-time aggregation along multiple dimensions.

Transcript of "Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013"

  1. Deft Data at Netflix: Using Amazon S3 and Amazon Elastic MapReduce. Roy Rapoport, November 14, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  10. A Word About Me … • About 20 years in technology • Systems engineering, networking, software development, QA, release management • Time at Netflix: 1599 days (4y:4m:15d) • Before at Netflix: Service Delivery in the IT/Ops, troubleshooter, Builder of Python Things[tm] • Current role: Cloud Monitoring • We build platforms • Sometimes we make them easy to use
  16. A Word About Netflix … Just the Stats • 16 years • 2000+ employees • 40 million users • 5x10^9 hours/quarter
  21. A Word About Netflix … Freedom and Responsibility Culture • Optimize speed of innovation; constrain availability; cost will be what cost will be • Hire smart (experienced) people; get out of their way • Anti-process bias
  30. A Word About Netflix … Technology and Operations • Service Oriented Architecture • Decentralized Operations. You • Build • Test • Deploy • Set up alerting and monitoring • Wake up at 2AM
  36. A Word About Netflix … Technology and Operations • AWS-based for 100% of streaming* • Huge expansion • Customer Growth • New markets • Metrics
  43. In the Old Days … Our Old Alerting System • Enterprise IT Solution • Managed by the Enterprise IT Alerting People • File Tickets • Send alerts to NOC • Completely separate from telemetry system. Copyright: State Library of Victoria Collections. CC Attribution 2.0 License. Copyright USAID Microlinks. CC Attribution 2.0 License. Copyright: http://www.flickr.com/photos/s_w_ellis CC Attribution 2.0 License
  52. In the Old Days … Our Old Telemetry System • Spare-time effort by a lone sysadmin • Loved by developers • Custom TCP protocol • RRD file back-end storage • Mostly Perl • Datacenter-bound (and limited) • Starting to falter under metrics growth. Copyright: State Library of Victoria Collections. CC Attribution 2.0 License. Copyright: http://www.flickr.com/photos/acme CC Attribution 2.0 License
53–57. Speaking of Growth
By way of comparison:
• Every person in the world
  • twice
• Every smartphone in the world
  • ten times

58–62. So We Built Something Better
Copyright: http://www.flickr.com/photos/76651030@N02/ CC Attribution 2.0 License
UI Layer Fronts Multiple Systems
• A UI layer in front of Atlas, Epic, and CloudWatch
Clear Regional Separation
• And aggregation
• [Diagram: global; us-east-1, us-west-1, us-west-2, eu-west-1]
Localized Node/Metric Identification
• Before: “Here’s a metric!” / “I think you’re Bob.”
• Now: “I’m Bob. Here’s a metric!” / “OK!”

63–81. So We Built Something Better: What’s a Metric?
• com.netflix.eds.nccp.successful.requests.uiversion.nccprt-authorization.devtypid-101.clver-PHL_0AB.uiver-UI_169_mid.geo-US
• 256 characters aren’t enough!
• This is better:
  • nf.ami = ami-aa5166ef
  • nf.app = wp
  • nf.cluster = wp-batch
  • nf.asg = wp-batch-v163
  • nf.country = us
  • nf.node = i-097c0e52
  • nf.region = us-west-1
  • nf.zone = us-west-1b
  • class = nccp
  • type = request
  • uiversion = UI_169_mid
  • action = authorization
  • devtype = 101
  • clver = PHL_0AB
  • geo = us

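To make the contrast concrete, here is a minimal Python sketch (not Atlas code) that stores the slide’s metric as a tag map and selects it by an arbitrary subset of tags. The tag keys and values come from the slide; the dictionary representation and the `matches` helper are hypothetical. With the dotted name, the same selection would depend on knowing each component’s position in the string.

```python
# Hypothetical illustration: tag keys/values are from the slide, but the
# data structure and helper are assumptions, not Atlas internals.
METRIC_TAGS = {
    "nf.ami": "ami-aa5166ef",
    "nf.app": "wp",
    "nf.cluster": "wp-batch",
    "nf.asg": "wp-batch-v163",
    "nf.country": "us",
    "nf.node": "i-097c0e52",
    "nf.region": "us-west-1",
    "nf.zone": "us-west-1b",
    "class": "nccp",
    "type": "request",
    "uiversion": "UI_169_mid",
    "action": "authorization",
    "devtype": "101",
    "clver": "PHL_0AB",
    "geo": "us",
}


def matches(tags, required):
    """True if every required key is present with the required value."""
    return all(tags.get(key) == value for key, value in required.items())


# Any combination of tags works as a selector, in any order.
print(matches(METRIC_TAGS, {"nf.zone": "us-west-1b", "action": "authorization"}))  # True
print(matches(METRIC_TAGS, {"nf.region": "us-east-1"}))  # False
```

Adding a new tag to a metric leaves existing selectors working, whereas inserting a segment into a dotted name shifts every position after it.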
82–90. So We Built Something Better: Powerful queries
• Make the complex possible
• Make the simple … sort of hard
http://atlas/api/v1/graph?q=nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum&e=now-5m&s=e-3h
The same query, grouped by nf.zone:
http://atlas/api/v1/graph?q=nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum,(,nf.zone,),:by&e=now-5m&s=e-3h
Copyright: Kurt Moerman CC Attribution 2.0 License

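The `q` parameter is a postfix (stack-based) expression: `key,value,:eq` pushes a filter, `:and` combines the two filters on top of the stack, `:sum` aggregates the matching series, and `(,nf.zone,),:by` groups that aggregate. The toy Python interpreter below handles only those operators over hypothetical in-memory series. It is a sketch of how stack evaluation works, not Atlas’s implementation, and it ignores the `s`/`e` time-range parameters.

```python
def evaluate(query, series):
    """Evaluate a small subset of an Atlas-style postfix query (toy sketch).

    Supports only :eq, :and, :sum and a trailing (,key,...,),:by.
    `series` is a list of (tags, values) pairs with equal-length value lists.
    Returns {group_key_tuple: summed_values}.
    """
    stack = []
    tokens = iter(query.split(","))
    for tok in tokens:
        if tok == "(":
            # Collect a list literal up to the matching ")".
            items = []
            for inner in tokens:
                if inner == ")":
                    break
                items.append(inner)
            stack.append(items)
        elif tok == ":eq":
            value, key = stack.pop(), stack.pop()
            stack.append(lambda tags, k=key, v=value: tags.get(k) == v)
        elif tok == ":and":
            right, left = stack.pop(), stack.pop()
            stack.append(lambda tags, a=left, b=right: a(tags) and b(tags))
        elif tok == ":sum":
            stack.append(("sum", stack.pop(), []))
        elif tok == ":by":
            keys = stack.pop()
            _, pred, _ = stack.pop()
            stack.append(("sum", pred, keys))
        else:
            stack.append(tok)  # a plain literal such as a tag key or value

    _, pred, keys = stack.pop()
    groups = {}
    for tags, values in series:
        if pred(tags):
            group = tuple(tags.get(k) for k in keys)
            acc = groups.setdefault(group, [0.0] * len(values))
            for i, x in enumerate(values):
                acc[i] += x
    return groups


# Hypothetical data: two series in us-west-1 and one in us-east-1.
SERIES = [
    ({"nf.region": "us-west-1", "nf.app": "employeeinfo",
      "name": "employeeinfo_api", "nf.zone": "us-west-1a"}, [1, 2]),
    ({"nf.region": "us-west-1", "nf.app": "employeeinfo",
      "name": "employeeinfo_api", "nf.zone": "us-west-1b"}, [3, 4]),
    ({"nf.region": "us-east-1", "nf.app": "employeeinfo",
      "name": "employeeinfo_api", "nf.zone": "us-east-1c"}, [100, 100]),
]
Q = "nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum"
print(evaluate(Q, SERIES))                       # {(): [4.0, 6.0]}
print(evaluate(Q + ",(,nf.zone,),:by", SERIES))  # {('us-west-1a',): [1.0, 2.0], ('us-west-1b',): [3.0, 4.0]}
```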
91. So We Built Something Better: Powerful queries
http://atlas/api/v1/graph?q=sps,nf.cluster,(,nccp-legacy,nccp-modern,),:in,nccprt,(,NCCPLicense,com_netflix_streaming_nccp_request_license,),:in,:and,stat,SuccessfulRequests,:eq,:and,device.rollup,3ds,:eq,:and,:sum,:set,entering_trough,sps,:get,1h,:offset,0.95,:mul,sps,:get,:gt,:set,smoothed,sps,:get,10,0.1,0.02,:des,:set,low_volume,smoothed,:get,-0.005,:mul,0.1,:add,:set,mid_volume,smoothed,:get,-0.00125,:mul,0.1,:add,:set,base,0.06,:set,min_pct,1,smoothed,:get,20,:lt,low_volume,:get,:mul,smoothed,:get,80,:lt,mid_volume,:get,:mul,:add,entering_trough,:get,0.05,:mul,:add,base,:get,:add,:sub,10,0.1,0.02,:des,:set,sps,:get,$(device.rollup)SPS,:legend,min_pct,:get,smoothed,:get,:mul,lowerbound,:legend,sps,:get,min_pct,:get,smoothed,:get,:mul,:lt,5,:rolling-count,2,:ge,:vspan,60,:alpha,$(device.rollup),:legend

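Read as arithmetic, this query computes a smoothed request rate (`smoothed`), derives a lower bound `min_pct × smoothed` where `min_pct = 1 − allowed drop` (the allowed drop grows at low volume and when traffic is entering a trough), and highlights intervals where the actual rate `sps` fell below that bound in at least 2 of the last 5 steps. The Python sketch below mirrors those steps under loudly stated assumptions: one data point per minute (so the `1h` offset is 60 steps), and plain Holt double exponential smoothing standing in for `:des`, with `10,0.1,0.02` read as training points, alpha, and beta. It illustrates the logic only; it is not Atlas’s implementation.

```python
import math

# Sketch of the alert logic in the query above. Assumptions (not from the
# slide): one data point per minute, and `:des` behaves like Holt double
# exponential smoothing with (training, alpha, beta) = (10, 0.1, 0.02).


def des(values, training=10, alpha=0.1, beta=0.02):
    """Holt double exponential smoothing; NaN inputs pass through as NaN.

    The first `training` non-NaN points only warm up the model and are
    emitted as NaN (an assumption about how the training window behaves).
    """
    out, level, trend, seen = [], None, 0.0, 0
    for x in values:
        if math.isnan(x):
            out.append(math.nan)
            continue
        if level is None:
            level = x
        else:
            prev = level
            level = alpha * x + (1 - alpha) * (level + trend)
            trend = beta * (level - prev) + (1 - beta) * trend
        seen += 1
        out.append(level if seen > training else math.nan)
    return out


def min_pct_raw(smoothed, entering_trough):
    """1 minus the allowed drop, built from the query's low/mid/base terms."""
    low_volume = smoothed * -0.005 + 0.1
    mid_volume = smoothed * -0.00125 + 0.1
    base = 0.06
    allowed_drop = ((smoothed < 20) * low_volume
                    + (smoothed < 80) * mid_volume
                    + entering_trough * 0.05
                    + base)
    return 1 - allowed_drop


def rolling_count(flags, window=5):
    """How many flags are True in the trailing `window` points (inclusive)."""
    return [sum(flags[max(0, i - window + 1):i + 1]) for i in range(len(flags))]


def alert_signal(sps, offset=60):
    """True where sps dropped below its lower bound in >= 2 of the last 5 points."""
    smoothed = des(sps)
    # entering_trough: the value an hour ago, times 0.95, exceeds the current one.
    trough = [i >= offset and sps[i - offset] * 0.95 > sps[i] for i in range(len(sps))]
    min_pct = des([min_pct_raw(s, t) for s, t in zip(smoothed, trough)])
    # Comparisons against NaN are False, so warm-up points never breach.
    breach = [x < m * s for x, m, s in zip(sps, min_pct, smoothed)]
    return [count >= 2 for count in rolling_count(breach)]


rates = [100.0] * 30 + [50.0] * 5  # steady traffic, then a sudden halving
print(alert_signal(rates).index(True))  # 31: the second consecutive breach
```

At a smoothed rate of 50, for example, only the mid-volume and base terms apply, so `min_pct` is 1 − (0.0375 + 0.06) = 0.9025 and the bound sits about 10% below the smoothed rate.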
92–101. So We Built Something Better: Ridiculous Read Volume
• Engage
• Graphs and Dashboards
• Alerting
• Automated Canaries
• Capacity Analytics
• Special Projects
• BI

102–110. So We Built Something Better
[Architecture diagram, built up over several slides. Components: client instance, publish cluster, Amazon S3, poller cluster, backend instances, regional endpoint, global endpoint. Metrics (m) are shown moving from the publish cluster toward the backend instances.]

111–122. That Sounds Great! Surely there are no problems
• Speed is hard
• Speed at volume is harder
• We looked at spinning disks
• Memory’s the way to go
• m2.4xlarge
• This is operational data
• People want it available, fast
• Operations have short memories
20,160 m2.4xlarge: $32,094,720 upfront, $8,005,939/month per region with no redundancy
Copyright: http://www.flickr.com/photos/lainetrees/ CC Attribution 2.0 License
Copyright: http://www.flickr.com/photos/amenk/ CC Attribution 2.0 License

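Breaking the slide’s totals down per instance makes the scale easier to judge. The arithmetic below uses only the three figures on the slide (instance count, upfront cost, monthly cost, all for one region with no redundancy) and adds no pricing data of its own.

```python
# All inputs are the slide's figures for one region with no redundancy.
INSTANCES = 20_160
UPFRONT_USD = 32_094_720
MONTHLY_USD = 8_005_939

upfront_per_instance = UPFRONT_USD / INSTANCES   # 1592.0
monthly_per_instance = MONTHLY_USD / INSTANCES   # about 397.12
first_year_usd = UPFRONT_USD + 12 * MONTHLY_USD  # 128,165,988

print(f"upfront per instance: ${upfront_per_instance:,.2f}")  # $1,592.00
print(f"monthly per instance: ${monthly_per_instance:,.2f}")  # $397.12
print(f"first year, one region: ${first_year_usd:,}")         # $128,165,988
```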
123–132. That Doesn’t Sound Great!
• If only we could reduce it …
• “Reduce”? Get it? Get it?
• Our granularity is two-dimensional: dimensionality (tags) and step size (time)
• We can reduce on either dimension
• Some tags make sense for very rapid reduction
• Hystrix
• nf.node
• Sometimes a lot (vhs)
• Sometimes a little (Cassandra)

133–142. A Reductive Approach
• For a series of values, reduce and keep:
  • minimum
  • maximum
  • total
  • count
• Example:
  • 3, 5, 9, 14, 20: min 3, max 20, total 51, count 5
• Allows for a sense of scale
• Allows for arbitrary further reduction without loss of precision

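Minimum, maximum, and count combine exactly, and totals do too for integer values, so summaries can be merged in any grouping and still agree with a summary of the raw values. A minimal Python sketch, with a hypothetical `Summary` type and a hypothetical `consolidate` helper that reduces along the time dimension, using the slide’s example series:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    """Min/max/total/count summary of a set of values (hypothetical type)."""
    minimum: float
    maximum: float
    total: float
    count: int

    @staticmethod
    def of(values):
        return Summary(min(values), max(values), sum(values), len(values))

    def merge(self, other):
        # Each field combines independently of grouping; float totals may
        # differ in the last bits, integer totals are exact.
        return Summary(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.count + other.count,
        )

    @property
    def mean(self):
        return self.total / self.count


def consolidate(values, step):
    """Reduce along time: one Summary per `step` consecutive raw values."""
    return [Summary.of(values[i:i + step]) for i in range(0, len(values), step)]


raw = [3, 5, 9, 14, 20]
whole = Summary.of(raw)   # Summary(minimum=3, maximum=20, total=51, count=5)
parts = consolidate(raw, 2)  # summaries of [3, 5], [9, 14], [20]
print(reduce(Summary.merge, parts) == whole)  # True
print(whole.mean)                             # 10.2
```

The mean is recoverable as total / count; statistics such as the median or other percentiles cannot be rebuilt from these four numbers.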
143–149. Reduction: Policy
• Policy-driven EMR engine
• Four possible actions
  • preserve
  • drop
  • consolidate
  • rollup
Copyright: http://www.flickr.com/photos/bagaball/ CC Attribution 2.0 License

150. Reduction: Policy
{
  "rules" : [
    {
      "operations" : [{"op" : "drop"}],
      "query" : "nf.app,api,:eq,class,(,LastMinuteFailRatio,SLA,NetflixSimpleDBService,),:in,:and"
    },
    {
      "operations" : [{
        "config" : { "keys" : [ "nf.node", "device", "nf.country" ] },
        "op" : "rollup"
      }],
      "query" : ":true"
    }
  ]
}

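One way to read this policy: each data point goes to the first rule whose query matches it (so the `:true` rule acts as a catch-all), `drop` discards the point, and `rollup` removes the listed tag keys and sums points that then become identical. The first-match ordering and the meaning of `keys` are assumptions, not stated on the slide. The sketch below also stubs out query matching with hand-written Python predicates (`PREDICATES` is hypothetical) instead of parsing Atlas expressions, and leaves `consolidate` unimplemented.

```python
import json
from collections import defaultdict

POLICY = json.loads("""
{
  "rules": [
    {"operations": [{"op": "drop"}],
     "query": "nf.app,api,:eq,class,(,LastMinuteFailRatio,SLA,NetflixSimpleDBService,),:in,:and"},
    {"operations": [{"config": {"keys": ["nf.node", "device", "nf.country"]}, "op": "rollup"}],
     "query": ":true"}
  ]
}
""")

# Hypothetical stand-ins for evaluating each rule's Atlas query.
PREDICATES = {
    "nf.app,api,:eq,class,(,LastMinuteFailRatio,SLA,NetflixSimpleDBService,),:in,:and": lambda tags: (
        tags.get("nf.app") == "api"
        and tags.get("class") in {"LastMinuteFailRatio", "SLA", "NetflixSimpleDBService"}
    ),
    ":true": lambda tags: True,
}


def apply_policy(policy, datapoints):
    """Apply the first matching rule to each (tags, value) pair; sum the survivors."""
    out = defaultdict(float)
    for tags, value in datapoints:
        rule = next(r for r in policy["rules"] if PREDICATES[r["query"]](tags))
        op = rule["operations"][0]  # this sketch only looks at the first operation
        if op["op"] == "drop":
            continue
        if op["op"] == "rollup":
            removed = set(op["config"]["keys"])
            tags = {k: v for k, v in tags.items() if k not in removed}
        elif op["op"] != "preserve":
            raise NotImplementedError(op["op"])
        out[frozenset(tags.items())] += value
    return dict(out)


points = [
    ({"nf.app": "api", "class": "SLA", "nf.node": "i-1"}, 1.0),  # dropped
    ({"nf.app": "wp", "name": "requests", "nf.node": "i-1", "nf.country": "us"}, 2.0),
    ({"nf.app": "wp", "name": "requests", "nf.node": "i-2", "nf.country": "ca"}, 3.0),
]
print(apply_policy(POLICY, points))  # one rolled-up series {nf.app: wp, name: requests} with value 5.0
```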
151–153. [Architecture diagram. Components: client instance, publish cluster, Amazon S3, EMR Driver, Amazon EMR, poller cluster, and tiered clusters (6H, 4D, 18D, Historical, plus as-needed clusters) behind regional and global endpoints. Arrows are labeled metrics, query, and response, with numbered steps 1 to 5.]

154–162. Reduction: Benefits
• Indefinite storage in Amazon S3
• Fear of commitment achievement: Unlocked
• Can be aggressive about hiding metrics
• High granularity for special days
• Automated for regular operations*
• Not in critical path for visibility SLA
• Firewalls accidental metric explosions
• Huge efficiency gains
Copyright: http://www.flickr.com/photos/dr_pete/ CC Attribution 2.0 License

163–169. Reduction: Efficiency
Cluster              6H        4D       18D       HISTORY
Time horizon         6 hours   4 days   18 days   3 months
Size                 600       512      180       12
Instances per hour   100       5        0         0
% reduction          0         95       100       100
Copyright: http://www.flickr.com/photos/sebrenner/ CC Attribution 2.0 License

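The last two rows appear to follow from the first two: divide each cluster’s size by its time horizon in hours, then compare that rate with the 6H cluster’s. The sketch below reproduces the slide’s rounded figures under two assumptions: that Size is an instance count, and that 3 months is about 90 days.

```python
# Assumptions: "Size" is an instance count; 3 months is approximated as 90 days.
HORIZON_HOURS = {"6H": 6, "4D": 4 * 24, "18D": 18 * 24, "HISTORY": 90 * 24}
SIZE = {"6H": 600, "4D": 512, "18D": 180, "HISTORY": 12}


def efficiency(size, horizon_hours, baseline="6H"):
    """Return {cluster: (instances per hour of horizon, % reduction vs. baseline)}, rounded."""
    base_rate = size[baseline] / horizon_hours[baseline]
    result = {}
    for name in size:
        rate = size[name] / horizon_hours[name]
        reduction = 100 * (1 - rate / base_rate)
        result[name] = (round(rate), round(reduction))
    return result


print(efficiency(SIZE, HORIZON_HOURS))
# {'6H': (100, 0), '4D': (5, 95), '18D': (0, 100), 'HISTORY': (0, 100)}
```

Unrounded, the 4D cluster works out to about 5.3 instances per hour of retention (a 94.7% reduction), which is why longer horizons become cheap per hour stored.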
170–176. Previews
• Self-service for special requests
• Different instance types
  • cr1.8xlarge
  • hi1.4xlarge
• Multi-tiered metric visibility
Copyright: http://www.flickr.com/photos/creativealan/ CC Attribution 2.0 License

177–182. Growth Redux
[Chart: metrics (millions) over time, with date ticks at 5/11, 8/11, 9/11, 1/12, 4/12, 8/12, 11/12, 1/13, 5/13, 10/13, and 1/14; plotted values climb from 2 through 2.5, 10, 15, 18, 30, 55, 90, 212, and 728 to 1,200.]

183–193. And a Last Word About Costs
• Priorities Reminder
  • Speed of Innovation
  • Availability
  • Cost
• Never intended to lower costs
  • Cloud migration
  • Additional features
  • Massive Performance

194. EMR FTW

196. Thank You
Please give us your feedback on this presentation (BDT302). As a thank you, we will select prize winners daily for completed surveys!