Netflix: Amazon S3 & Amazon Elastic MapReduce to Monitor at Gigascale (BDT302) | AWS re:Invent 2013

How does Netflix stay on top of the operations of its Internet service with millions of users and billions of metrics? With Atlas, its own massively distributed, large-scale monitoring system. Come learn how Netflix built Atlas with multiple processing pipelines using Amazon S3 and Amazon EMR to provide low-latency access to billions of metrics while supporting query-time aggregation along multiple dimensions.

Published in: Technology, Business

Transcript

  • 1. Deft Data at Netflix: Using Amazon S3 and Amazon Elastic Roy Rapoport November 14, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Friday, November 15, 13
  • 2-10. A Word About Me …
      • About 20 years in technology
      • Systems engineering, networking, software development, QA, release management
      • Time at Netflix: 1599 days (4y:4m:15d)
      • Before at Netflix: Service Delivery in the IT/Ops, troubleshooter, Builder of Python Things[tm]
      • Current role: Cloud Monitoring
          • We build platforms
          • Sometimes we make them easy to use
  • 11-16. A Word About Netflix … Just the Stats
      • 16 years
      • 2000+ employees
      • 40 million users
      • 5x10^9 hours/quarter
  • 17-21. A Word About Netflix … Freedom and Responsibility Culture
      • Optimize speed of innovation; constrain availability; cost will be what cost will be
      • Hire smart (experienced) people; get out of their way
      • Anti-process bias
  • 22-30. A Word About Netflix … Technology and Operations
      • Service Oriented Architecture
      • Decentralized Operations. You
          • Build
          • Test
          • Deploy
          • Set up alerting and monitoring
          • Wake up at 2AM
  • 31-36. A Word About Netflix … Technology and Operations
      • AWS-based for 100% of streaming*
      • Huge expansion
          • Customer Growth
          • New markets
          • Metrics
  • 37-43. In the Old Days … Our Old Alerting System
      • Enterprise IT Solution
      • Managed by the Enterprise IT Alerting People
      • File Tickets
      • Send alerts to NOC
      • Completely separate from telemetry system
      Image credits: State Library of Victoria Collections; USAID Microlinks; http://www.flickr.com/photos/s_w_ellis (all CC Attribution 2.0 License)
  • 44-52. In the Old Days … Our Old Telemetry System
      • Spare-time effort by a lone sysadmin
      • Loved by developers
      • Custom TCP protocol
      • RRD file back-end storage
      • Mostly Perl
      • Datacenter-bound (and limited)
      • Starting to falter under metrics growth
      Image credits: State Library of Victoria Collections; http://www.flickr.com/photos/acme (both CC Attribution 2.0 License)
  • 53-57. Speaking of Growth
      By way of comparison:
      • Every person in the world
          • twice
      • Every smartphone in the world
          • ten times
  • 58. So We Built Something Better
      Image credit: http://www.flickr.com/photos/76651030@N02/ (CC Attribution 2.0 License)
  • 59. So We Built Something Better: UI Layer Fronts Multiple Systems
      [Diagram: UI in front of Atlas, Epic, and CloudWatch]
  • 60. So We Built Something Better: Clear Regional Separation
      • And aggregation
      [Diagram: global, us-east-1, us-west-1, us-west-2, eu-west-1]
  • 61-62. So We Built Something Better: Localized Node/Metric Identification
      • Before: "Here’s a metric!" answered with "I think you’re Bob"
      • Now: "I’m Bob. Here’s a metric!" answered with "OK!"
  • 63-81. So We Built Something Better: What’s a Metric?
      • com.netflix.eds.nccp.successful.requests.uiversion.nccprt-authorization.devtypid-101.clver-PHL_0AB.uiver-UI_169_mid.geo-US
      • 256 characters aren’t enough!
      • This is better:
          nf.ami       ami-aa5166ef
          nf.app       wp
          nf.cluster   wp-batch
          nf.asg       wp-batch-v163
          nf.country   us
          nf.node      i-097c0e52
          nf.region    us-west-1
          nf.zone      us-west-1b
          class        nccp
          type         request
          uiversion    UI_169_mid
          action       authorization
          devtype      101
          clver        PHL_0AB
          geo          us
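The move from one long dotted name to key/value tags can be illustrated with a small sketch. The tag names and most values below come from the slide; the data layout, the `sum_by` helper, the second zone, the placeholder node IDs, and the sample values are assumptions made up for illustration and are not Atlas code.

```python
from collections import defaultdict

# Tags taken from the slide: each dimension is its own key, so nothing has to
# be packed into (or parsed back out of) a single 256-character name.
BASE_TAGS = {
    "nf.app": "wp",
    "nf.cluster": "wp-batch",
    "nf.asg": "wp-batch-v163",
    "nf.region": "us-west-1",
    "class": "nccp",
    "type": "request",
    "action": "authorization",
    "devtype": "101",
}

# Hypothetical datapoints: (tags, value). Zones other than us-west-1b, the
# placeholder node IDs, and all values are invented for this sketch.
DATAPOINTS = [
    ({**BASE_TAGS, "nf.zone": "us-west-1b", "nf.node": "i-097c0e52"}, 12.0),
    ({**BASE_TAGS, "nf.zone": "us-west-1b", "nf.node": "i-example-2"}, 3.0),
    ({**BASE_TAGS, "nf.zone": "us-west-1c", "nf.node": "i-example-3"}, 8.0),
]


def sum_by(datapoints, key):
    """Sum datapoint values grouped by the value of one tag key."""
    totals = defaultdict(float)
    for tags, value in datapoints:
        totals[tags[key]] += value
    return dict(totals)


if __name__ == "__main__":
    # Aggregating along any dimension is a dictionary lookup, not string parsing.
    print(sum_by(DATAPOINTS, "nf.zone"))
```

With a dotted name, grouping by zone would mean agreeing on a position inside the string; with tags, any key can serve as the grouping dimension at query time.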
  • 82-91. So We Built Something Better: Powerful queries
      • Make the complex possible
      • Make the simple … sort of hard
      Example queries shown on the slides:
          http://atlas/api/v1/graph?q=nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum&e=now-5m&s=e-3h
          http://atlas/api/v1/graph?q=nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum,(,nf.zone,),:by&e=now-5m&s=e-3h
          http://atlas/api/v1/graph?q=sps,nf.cluster,(,nccp-legacy,nccp-modern,),:in,nccprt,(,NCCPLicense,com_netflix_streaming_nccp_request_license,),:in,:and,stat,SuccessfulRequests,:eq,:and,device.rollup,3ds,:eq,:and,:sum,:set,entering_trough,sps,:get,1h,:offset,0.95,:mul,sps,:get,:gt,:set,smoothed,sps,:get,10,0.1,0.02,:des,:set,low_volume,smoothed,:get,-0.005,:mul,0.1,:add,:set,mid_volume,smoothed,:get,-0.00125,:mul,0.1,:add,:set,base,0.06,:set,min_pct,1,smoothed,:get,20,:lt,low_volume,:get,:mul,smoothed,:get,80,:lt,mid_volume,:get,:mul,:add,entering_trough,:get,0.05,:mul,:add,base,:get,:add,:sub,10,0.1,0.02,:des,:set,sps,:get,$(device.rollup)SPS,:legend,min_pct,:get,smoothed,:get,:mul,lowerbound,:legend,sps,:get,min_pct,:get,smoothed,:get,:mul,:lt,5,:rolling-count,2,:ge,:vspan,60,:alpha,$(device.rollup),:legend
      Image credit: Kurt Moerman (CC Attribution 2.0 License)
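The query strings on these slides read as comma-separated postfix (stack) expressions: operands are pushed, and words beginning with ":" pop their arguments. The following is a minimal sketch of how the first example could be evaluated, assuming only that reading. It handles just `:eq`, `:and`, and `:sum`; the function names and the sample datapoints are hypothetical, and this is not the Atlas implementation.

```python
# Query text copied from the first example URL on the slides.
QUERY = (
    "nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,"
    "name,employeeinfo_api,:eq,:and,:sum"
)


def parse_query(expr):
    """Reduce a postfix expression to a pair (aggregate_name, predicate)."""
    stack = []
    for token in expr.split(","):
        if token == ":eq":
            # key,value,:eq -> predicate that tests one tag for equality
            value, key = stack.pop(), stack.pop()
            stack.append(lambda tags, k=key, v=value: tags.get(k) == v)
        elif token == ":and":
            right, left = stack.pop(), stack.pop()
            stack.append(lambda tags, l=left, r=right: l(tags) and r(tags))
        elif token == ":sum":
            stack.append(("sum", stack.pop()))
        else:
            stack.append(token)  # plain operand: a tag key or a tag value
    if len(stack) != 1 or not isinstance(stack[0], tuple):
        raise ValueError("expression did not reduce to a single aggregate")
    return stack[0]


def run_query(expr, datapoints):
    """Apply the parsed query to (tags, value) pairs; only 'sum' is supported."""
    aggregate, predicate = parse_query(expr)
    if aggregate != "sum":
        raise ValueError("unsupported aggregate: " + aggregate)
    return sum(value for tags, value in datapoints if predicate(tags))


# Hypothetical datapoints; only the first two match all three conditions.
SAMPLE = [
    ({"nf.region": "us-west-1", "nf.app": "employeeinfo", "name": "employeeinfo_api"}, 3.0),
    ({"nf.region": "us-west-1", "nf.app": "employeeinfo", "name": "employeeinfo_api"}, 4.0),
    ({"nf.region": "us-east-1", "nf.app": "employeeinfo", "name": "employeeinfo_api"}, 10.0),
    ({"nf.region": "us-west-1", "nf.app": "employeeinfo", "name": "other"}, 5.0),
]

if __name__ == "__main__":
    print(run_query(QUERY, SAMPLE))
```

A postfix form keeps the parser trivial and lets expressions compose without parentheses or precedence rules, which fits the "make the complex possible" bullet; the cost is that even a simple filter is awkward to type by hand.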
  • 92-101. So We Built Something Better: Ridiculous Read Volume:
      • Engage
      • Graphs and Dashboards
      • Alerting
      • Automated Canaries
      • Capacity Analytics
      • Special Projects
      • BI
  • 102-110. So We Built Something Better
      [Architecture diagram, built up across these slides. Components shown: client instance, publish cluster, backend instances, Amazon S3, poller cluster, regional endpoint, global endpoint. An "m" marker is added at successive components as the build progresses.]
  • 111-122. That Sounds Great! Surely there are no problems
      • Speed is hard
      • Speed at volume is harder
      • We looked at spinning disks
      • Memory’s the way to go
      • m2.4xlarge
      • This is operational data
      • People want it available, fast
      • Operations have short memories
      20,160 m2.4xlarge: $32,094,720 upfront, $8,005,939/month, per region, with no redundancy
      Image credits: http://www.flickr.com/photos/lainetrees/; http://www.flickr.com/photos/amenk/ (both CC Attribution 2.0 License)
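The cost figures on the slide can be broken down per instance with simple arithmetic. The three input numbers are from the slide; the first-year total assumes the monthly charge stays constant for twelve months, which the slide does not state.

```python
# Figures quoted on the slide (per region, no redundancy).
INSTANCES = 20_160
UPFRONT_TOTAL_USD = 32_094_720
MONTHLY_TOTAL_USD = 8_005_939

# Per-instance breakdown derived from the slide's totals.
upfront_per_instance = UPFRONT_TOTAL_USD / INSTANCES
monthly_per_instance = MONTHLY_TOTAL_USD / INSTANCES

# Assumption: the monthly charge is flat for 12 months.
first_year_total = UPFRONT_TOTAL_USD + 12 * MONTHLY_TOTAL_USD

if __name__ == "__main__":
    print(f"upfront per instance: ${upfront_per_instance:,.2f}")
    print(f"monthly per instance: ${monthly_per_instance:,.2f}")
    print(f"first year, one region: ${first_year_total:,}")
```

Under that assumption, a single region without redundancy comes to roughly $128 million in the first year, which is the motivation for the reduction work in the following slides.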
  • 123-132. That Doesn’t Sound Great!
      • If only we could reduce it …
      • “Reduce”? Get it? Get it?
      • Our granularity is two-dimensional
      • We can reduce on either dimension
      • Some tags make sense for very rapid reduction
          • Hystrix
          • nf.node
      • Sometimes a lot (vhs)
      • Sometimes a little (Cassandra)
      [Diagram axes: Dimensionality (tags) and Step size (time)]
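Reducing along each of the two dimensions can be sketched as follows. This is an illustration only: the series layout, the function names, the sample data, and the choice of summation as the combining operation are all assumptions, not details given in the talk.

```python
def drop_tag(series, tag):
    """Tag-dimension reduction: remove one tag and sum series that collapse together.

    `series` is a list of (tags, values) pairs with equally long value lists.
    """
    merged = {}
    for tags, values in series:
        key = frozenset((k, v) for k, v in tags.items() if k != tag)
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], values)]
        else:
            merged[key] = list(values)
    return [(dict(key), values) for key, values in merged.items()]


def widen_step(values, factor):
    """Time-dimension reduction: sum each run of `factor` consecutive points."""
    return [sum(values[i:i + factor]) for i in range(0, len(values), factor)]


# Hypothetical per-node series for one application (node IDs are placeholders).
PER_NODE = [
    ({"nf.app": "wp", "nf.node": "i-a"}, [1, 2, 3, 4]),
    ({"nf.app": "wp", "nf.node": "i-b"}, [10, 20, 30, 40]),
]

if __name__ == "__main__":
    per_app = drop_tag(PER_NODE, "nf.node")   # fewer series, same step size
    print(per_app)
    print(widen_step(per_app[0][1], 2))       # same series, fewer points
```

Dropping a high-cardinality tag such as nf.node shrinks the number of series, while widening the step shrinks the number of points per series; the two reductions are independent and can be applied in either order.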
  • 142. A Reductive Approach •For a series of values, reduce and keep: •minimum •maximum •total •count •Example: •3,5,9,14,20: min 3, max 20, total 51, count 5 •Allows for sense of scale •Allows for arbitrary further reduction without loss of precision
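A minimal sketch of why keeping minimum, maximum, total, and count allows further reduction without loss of precision: two summaries can be merged into exactly the summary of the combined values. The `Summary` class and its method names are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """The four values kept for a reduced series (hypothetical type)."""
    minimum: float
    maximum: float
    total: float
    count: int

    @classmethod
    def of(cls, values):
        # Requires at least one value; min() and max() raise on empty input.
        values = list(values)
        return cls(min(values), max(values), sum(values), len(values))

    def merge(self, other):
        # Merging needs only the four kept values, never the raw series.
        return Summary(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.count + other.count,
        )

    @property
    def mean(self):
        return self.total / self.count


# The example from the slide, reduced in one pass and in two merged passes.
whole = Summary.of([3, 5, 9, 14, 20])
merged = Summary.of([3, 5]).merge(Summary.of([9, 14, 20]))
```

Both `whole` and `merged` hold min 3, max 20, total 51, count 5, so a coarser summary can always be built from finer ones. The average is recoverable as total divided by count.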
  • 149. Reduction: Policy •Policy-driven EMR engine •Four possible actions •preserve •drop •consolidate •rollup Copyright: http://www.flickr.com/photos/bagaball/ CC Attribution 2.0 License
  • 150. Reduction: Policy
        {
          "rules" : [
            {
              "operations" : [ { "op" : "drop" } ],
              "query" : "nf.app,api,:eq,class,(,LastMinuteFailRatio,SLA,NetflixSimpleDBService,),:in,:and"
            },
            {
              "operations" : [ { "config" : { "keys" : [ "nf.node", "device", "nf.country" ] }, "op" : "rollup" } ],
              "query" : ":true"
            }
          ]
        }
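One way to read the policy above is sketched below. Everything beyond the JSON itself is an assumption: that the query is a comma-separated postfix (stack) expression, that the first matching rule wins, that rollup removes the listed tag keys and sums the values of series that become identical, and that unmatched points are preserved. The functions `compile_query` and `apply_policy` are hypothetical, and only the two operations used in the example are handled.

```python
from collections import defaultdict

POLICY = {
    "rules": [
        {"operations": [{"op": "drop"}],
         "query": "nf.app,api,:eq,class,(,LastMinuteFailRatio,SLA,"
                  "NetflixSimpleDBService,),:in,:and"},
        {"operations": [{"config": {"keys": ["nf.node", "device", "nf.country"]},
                         "op": "rollup"}],
         "query": ":true"},
    ]
}


def compile_query(expr):
    """Turn a postfix query string into a predicate over a tag dict."""
    tokens = [t.strip() for t in expr.split(",")]
    stack = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":                      # list literal: ( a b c )
            j = tokens.index(")", i)
            stack.append(tokens[i + 1:j])
            i = j
        elif tok == ":true":
            stack.append(lambda tags: True)
        elif tok == ":eq":
            value, key = stack.pop(), stack.pop()
            stack.append(lambda tags, k=key, v=value: tags.get(k) == v)
        elif tok == ":in":
            values, key = stack.pop(), stack.pop()
            stack.append(lambda tags, k=key, vs=frozenset(values): tags.get(k) in vs)
        elif tok == ":and":
            right, left = stack.pop(), stack.pop()
            stack.append(lambda tags, a=left, b=right: a(tags) and b(tags))
        else:
            stack.append(tok)               # plain word: a key or a value
        i += 1
    (predicate,) = stack
    return predicate


def apply_policy(rules, points):
    """Apply the first matching rule to each (tags, value) point."""
    compiled = [(compile_query(r["query"]), r["operations"]) for r in rules]
    out = defaultdict(float)
    for tags, value in points:
        for matches, ops in compiled:
            if not matches(tags):
                continue
            keep = True
            for op in ops:
                if op["op"] == "drop":
                    keep = False
                elif op["op"] == "rollup":
                    dropped = set(op["config"]["keys"])
                    tags = {k: v for k, v in tags.items() if k not in dropped}
                else:
                    raise ValueError("operation not covered by this sketch: " + op["op"])
            if keep:
                out[tuple(sorted(tags.items()))] += value
            break
        else:
            # No rule matched: keep the point unchanged (assumption).
            out[tuple(sorted(tags.items()))] += value
    return dict(out)


# Hypothetical input: one point the first rule drops, two that roll up together.
points = [
    ({"nf.app": "api", "class": "SLA", "nf.node": "i-1"}, 1.0),
    ({"nf.app": "api", "class": "Other", "nf.node": "i-1"}, 2.0),
    ({"nf.app": "api", "class": "Other", "nf.node": "i-2"}, 3.0),
]
result = apply_policy(POLICY["rules"], points)
```

With this input the SLA point is discarded and the two remaining points collapse into a single series without nf.node.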
  • 153. [Architecture diagram: client instance, poller cluster, publish cluster, regional endpoint, and global endpoint, with query, response, and metrics flows; Amazon S3, EMR Driver, and Amazon EMR; 6H, 4D, 18D, and Historical clusters, plus as-needed clusters; steps numbered 1 to 5]
  • 162. Reduction: Benefits •Indefinite storage in Amazon S3 •Fear of commitment achievement: Unlocked •Can be aggressive about hiding metrics •High granularity for special days •Automated for regular operations* •Not in critical path for visibility SLA •Firewalls accidental metric explosions •Huge efficiency gains Copyright: http://www.flickr.com/photos/dr_pete/ CC Attribution 2.0 License
  • 169. Reduction: Efficiency Copyright: http://www.flickr.com/photos/sebrenner/ CC Attribution 2.0 License
                             6H        4D       18D       HISTORY
        Time Horizon         6 Hours   4 Days   18 Days   3 Months
        Size                 600       512      180       12
        Instances Per Hour   100       5        0         0
        % Reduction          0         95       100       100
  • 176. Previews •Self-service for special requests •Different instance types •cr1.8xlarge •hi1.4xlarge •Multi-tiered metric visibility Copyright: http://www.flickr.com/photos/creativealan/ CC Attribution 2.0 License
  • 181. Growth Redux [Chart: metrics in millions (M) by month]
        Date      5/11  8/11  9/11  1/12  4/12  8/12  11/12  1/13  5/13  10/13  1/14
        Metrics   2     2.5   10    15    18    30    55     90    212   728    1,200
  • 193. And a Last Word About Costs •Priorities Reminder •Speed of Innovation •Availability •Cost •Never intended to lower costs •Cloud migration •Additional features •Massive Performance
  • 194. EMR FTW
  • 196. Thank You. Please give us your feedback on this presentation (BDT302). As a thank you, we will select prize winners daily for completed surveys!
