Netflix and Open Source
Summary of cloud native architecture, NetflixOSS and the Cloud Prize

  • When Netflix first moved to the cloud it was bleeding-edge innovation; we figured things out and made things up from first principles. Over the last two years more large companies have moved to the cloud, and the principles, practices and patterns have become better understood and more widely adopted. At this point there is intense interest in how Netflix runs in the cloud, and several forward-looking organizations are adopting our architectures and starting to use some of the code we have shared. Over the coming years, we want to make it easier for people to share the patterns we use.
  • The railroad made it possible for California to be developed quickly. By creating an easy-to-follow path, we can create a much bigger ecosystem around the Netflix platform.

    1. 1. Netflix and Open Source March 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS http://www.linkedin.com/in/adriancockcroft
    2. 2. Cloud Native; NetflixOSS – Cloud Native On-Ramp; Netflix Open Source Cloud Prize
    3. 3. Netflix Member Web Site Home Page: Personalization Driven – How Does It Work?
    4. 4. How Netflix Streaming Works (diagram labels): Consumer Electronics; AWS Cloud Services; CDN Edge Locations; Customer Device (PC, PS3, TV…); Web Site or Discovery API; User Data; Personalization; Streaming API; DRM; QoS Logging; OpenConnect CDN Boxes; CDN Management and Steering; Content Encoding
    5. 5. Content Delivery Service: Open Source Hardware Design + FreeBSD, bird, nginx
    6. 6. November 2012 Traffic
    7. 7. Real Web Server Dependencies Flow (Netflix Home page business transaction as seen by AppDynamics). Each icon is three to a few hundred instances across three AWS zones. Diagram labels: Start Here; Cassandra; memcached; Web service; S3 bucket; Three Personalization movie group choosers (for US, Canada and Latam)
    8. 8. Cloud Native Architecture (diagram): Clients (Things); Autoscaled Micro Services (JVM, JVM, JVM); Autoscaled Micro Services (JVM, JVM, Memcached); Distributed Quorum NoSQL Datastores (Cassandra, Cassandra, Cassandra); Zone A, Zone B, Zone C
    9. 9. Non-Native Cloud Architecture (diagram): Agile Mobile Mammals (iOS/Android); Cloudy Buffer (App Servers); Datacenter Dinosaurs (MySQL, Legacy Apps)
    10. 10. New Anti-Fragile Patterns: Micro-services; Chaos engines; Highly available systems composed from ephemeral components
    11. 11. Stateless Micro-Service Architecture (diagram): Linux Base AMI (CentOS or Ubuntu); Optional Apache frontend, memcached, non-java apps; Java (JDK 6 or 7); Tomcat; Application war file, base servlet, platform, client interface jars, Astyanax; Healthcheck, status servlets, JMX interface, Servo autoscale; Monitoring: AppDynamics appagent monitoring, AppDynamics machineagent, Epic/Atlas; Log rotation to S3; GC and thread dump logging
    12. 12. Cassandra Instance Architecture (diagram): Linux Base AMI (CentOS or Ubuntu); Tomcat and Priam on JDK (Healthcheck, Status); Java (JDK 7); Cassandra Server; Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and SSTables; Monitoring: AppDynamics appagent monitoring, AppDynamics machineagent, Epic/Atlas; GC and thread dump logging
    13. 13. Configuration State Management: Datacenter CMDB’s woeful; Cloud native is the solution; Dependably complete
    14. 14. Edda – Configuration History (http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html). Diagram: Eureka (Services metadata); AWS (Instances, ASGs, etc.); AppDynamics (Request flow); Edda; Monkeys
    15. 15. Edda Query Examples
        Find any instances that have ever had a specific public IP address:
            $ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"
            ["i-0123456789","i-012345678a","i-012345678b"]
        Show the most recent change to a security group:
            $ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
            --- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
            +++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
            @@ -1,33 +1,33 @@
             {
            …
               "ipRanges" : [
                 "10.10.1.1/32",
                 "10.10.1.2/32",
            +    "10.10.1.3/32",
            -    "10.10.1.4/32"
            …
             }
    16. 16. Cloud Native: Master copies of data are cloud resident; Everything is dynamically provisioned; All services are ephemeral
    17. 17. Scalability Demands
    18. 18. Asgard: http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
    19. 19. Cloud Deployment Scalability: New Autoscaled AMI – zero to 500 instances from 21:38:52 to 21:46:32 (7m40s). Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB). Summary statistics: Min. 41.0; 1st Qu. 104.2; Median 149.0; Mean 171.8; 3rd Qu. 215.8; Max. 562.0
    20. 20. Ephemeral Instances: Largest services are autoscaled; Average lifetime of an instance is 36 hours. Chart labels: Push; Autoscale Up; Autoscale Down
    21. 21. Leveraging Public Scale (diagram): thresholds at 1,000 Instances and 100,000 Instances; Public (Startups); Grey Area (Netflix); Private (Google)
    22. 22. How big is Public? AWS Maximum Possible Instance Count: 3.7 Million. Growth >10x in Three Years, >2x Per Annum. AWS upper bound estimate based on the number of public IP Addresses. Every provisioned instance gets a public IP by default.
    23. 23. Availability: Is it running yet? How many places is it running in? How far apart are those places?
    24. 24. Antifragile API Patterns: Functional Reactive with Circuit Breakers and Bulkheads
    25. 25. Outages
        • Running very fast with scissors
            – Mostly self inflicted – bugs, mistakes
            – Some caused by AWS bugs and mistakes
        • Next step is multi-region
            – Investigating and building in stages during 2013
            – Could have prevented some of our 2012 outages
    26. 26. Managing Multi-Region Availability (diagram): DNS providers: AWS Route53, DynECT DNS, UltraDNS; two sets of Regional Load Balancers, each fronting Zone A, Zone B and Zone C, with Cassandra Replicas in every zone. What we need is a portable way to manage multiple DNS providers…
    27. 27. Denominator: Software Defined DNS for Java. Use Cases: Edda, Multi-Region Failover. Common Model: Denominator. DNS Vendor Plug-in: AWS Route53, DynECT, UltraDNS, Etc… API Models (varied and mostly broken): AWS Route53 (IAM Key Auth, REST); DynECT (User/pwd, REST); UltraDNS (User/pwd, SOAP). Currently being built by Adrian Cole (the jClouds guy, he works for Netflix now…)
    28. 28. A Cloud Native Open Source Platform
    29. 29. Inspiration
    30. 30. Three Questions: Why is Netflix doing this? How does it all fit together? What is coming next?
    31. 31. Beware of Geeks Bearing Gifts: Strategies for an Increasingly Open Economy. Simon Wardley - Researcher at the Leading Edge Forum
    32. 32. How did Netflix get ahead?
        • Netflix Business + Developer Org: Doing it right now; SaaS Applications; PaaS for agility; Public IaaS for AWS features; Big data in the cloud; Integrating many APIs; FOSS from github; Renting hardware for 1hr; Coding in Java/Groovy/Scala
        • Traditional IT Operations: Taking their time; Pilot private cloud projects; Beta quality installations; Small scale; Integrating several vendors; Paying big $ for software; Paying big $ for consulting; Buying hardware for 3yrs; Hacking at scripts
    33. 33. Netflix Platform Evolution: 2009-2010 Bleeding Edge Innovation; 2011-2012 Common Pattern; 2013-2014 Shared Pattern. Netflix ended up several years ahead of the industry, but it’s not a sustainable position
    34. 34. Making it easy to follow: Exploring the wild west each time vs. laying down a shared route
    35. 35. Goals: Establish our solutions as Best Practices / Standards; Hire, Retain and Engage Top Engineers; Build up Netflix Technology Brand; Benefit from a shared ecosystem
    36. 36. How does it all fit together?
    37. 37. NetflixOSS Continuous Build and Deployment (diagram): Github NetflixOSS Source; Maven Central; AWS Base AMI; Cloudbees Jenkins; Dynaslave AWS Build Slaves; Aminator Bakery; AWS Baked AMIs; Odin Orchestration API; Asgard (+ Frigga) Console; AWS Account
    38. 38. NetflixOSS Services Scope (diagram labels): AWS Account; Asgard Console; Archaius Config Service; Cross region Priam C*; Multiple AWS Regions; Eureka Registry; Explorers Dashboards; Exhibitor ZK; 3 AWS Zones; Application Clusters, Autoscale Groups, Instances; Priam Cassandra Persistent Storage; Evcache Memcached Ephemeral Storage; Atlas Monitoring; Edda History; Simian Army; Genie Hadoop Services
    39. 39. NetflixOSS Instance Libraries
        • Initialization: Baked AMI – Tomcat, Apache, your code; Governator – Guice based dependency injection; Archaius – dynamic configuration properties client; Eureka - service registration client
        • Service Requests: Karyon - Base Server for inbound requests; RxJava – Reactive pattern; Hystrix/Turbine – dependencies and real-time status; Ribbon - REST Client for outbound calls
        • Data Access: Astyanax – Cassandra client and pattern library; Evcache – Zone aware Memcached client; Curator – Zookeeper patterns; Denominator – DNS routing abstraction
        • Logging: Blitz4j – non-blocking logging; Servo – metrics export for autoscaling; Atlas – high volume instrumentation
    40. 40. NetflixOSS Testing and Automation
        • Test Tools: CassJmeter – Load testing for Cassandra; Circus Monkey – Test account reservation rebalancing
        • Maintenance: Janitor Monkey – Cleans up unused resources; Efficiency Monkey; Doctor Monkey; Howler Monkey – Complains about expiring certs
        • Availability: Chaos Monkey – Kills Instances; Chaos Gorilla – Kills Availability Zones; Chaos Kong – Kills Regions; Latency Monkey – Latency and error injection
        • Security: Security Monkey; Conformity Monkey
    41. 41. Example Application – RSS Reader
    42. 42. What’s Coming Next? Better portability; Higher availability; More Features; Easier to deploy; Contributions from end users; Contributions from vendors; More Use Cases
    43. 43. Vendor Driven Portability: Interest in using NetflixOSS for Enterprise Private Clouds. “It’s done when it runs Asgard”. Functionally complete; Demonstrated March; Release 3.3 in 2Q13; Some vendor interest; Some vendor interest; Many missing features; Needs AWS compatible Autoscaler; Bait and switch AWS API strategy
    44. 44. AWS 2009 vs. ??? Eucalyptus 3.3
    45. 45. Netflix Cloud Prize: Boosting the @NetflixOSS Ecosystem
    46. 46. In 2012 Netflix Engineering won this…
    47. 47. We’d like to give out prizes too. But what for? Contributions to NetflixOSS! Shared under Apache license; Located on github
    48. 48. How long do you have? Entries open March 13th; Entries close September 15th. Six months…
    49. 49. Who can win? Almost anyone, anywhere… Except current or former Netflix or AWS employees
    50. 50. Who decides who wins? Nominating Committee; Panel of Judges
    51. 51. Judges: Aino Corry, Program Chair for Qcon/GOTO; Martin Fowler, Chief Scientist Thoughtworks; Simon Wardley, Strategist; Werner Vogels, CTO Amazon; Yury Izrailevsky, VP Cloud Netflix; Joe Weinman, SVP Telx, Author “Cloudonomics”
    52. 52. What are Judges Looking For?
        • Eligible, Apache 2.0 licensed
        • Original and useful contribution to NetflixOSS
        • Code that successfully builds and passes a test suite
        • A large number of watchers, stars and forks on github
        • NetflixOSS project pull requests
        • Good code quality and structure
        • Documentation on how to build and run it
        • Evidence that code is in use by other projects, or is running in production
    53. 53. What do you win? One winner in each of the 10 categories; Ticket and expenses to attend AWS Re:Invent 2013 in Las Vegas; A Trophy
    54. 54. How do you enter?
        • Get a (free) github account
        • Fork github.com/netflix/cloud-prize
        • Send us your email address
        • Describe and build your entry
        • Twitter #cloudprize
    55. 55. Diagram labels: Github Registration Opens Today; Github Apache Licensed Contributions; Github Close Entries September 15; AWS Re:Invent Award Ceremony Dinner November; Judges; Winners; $10K cash; $5K AWS; Trophy; AWS Re:Invent Tickets; Netflix Engineering; Nominations; Categories; Ten Prize Categories; Entrants; Conforms to Rules; Working Code; Community Traction
    56. 56. Functionality and scale now, portability coming. Moving from parts to a platform in 2013. Netflix is fostering an ecosystem. Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen)
    57. 57. Takeaway: Netflix is making it easy for everyone to adopt Cloud Native patterns. Open Source is not just the default, it’s a strategic weapon. http://netflix.github.com http://techblog.netflix.com http://slideshare.net/Netflix http://www.linkedin.com/in/adriancockcroft @adrianco #netflixcloud @NetflixOSS
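
The Edda queries on slide 15 attach their arguments to the URL path with semicolons, either as `key=value` pairs or as bare flags such as `_diff`. The sketch below only assembles URLs of that shape and makes no network calls. The helper name `edda_url` and its parameters are hypothetical, not part of Edda, and the `http://edda` host is the placeholder used on the slide.

```python
def edda_url(base, collection, resource_id=None, **params):
    """Build an Edda-style URL with ';'-separated matrix arguments.

    Hypothetical helper for illustration; it mirrors the URL shapes
    shown on slide 15 and does not contact any server.
    """
    path = f"{base.rstrip('/')}/api/v2/{collection.strip('/')}"
    if resource_id:
        path += f"/{resource_id}"
    for key, value in params.items():
        # A value of True renders as a bare flag such as ';_diff';
        # anything else renders as ';key=value'.
        path += f";{key}" if value is True else f";{key}={value}"
    return path


# Instances that have ever had a given public IP address.
print(edda_url("http://edda", "view/instances",
               publicIpAddress="1.2.3.4", _since=0))

# Most recent change to a security group.
print(edda_url("http://edda", "aws/securityGroups", "sg-0123456789",
               _diff=True, _all=True, _limit=2))
```

The two calls reproduce the two URLs from the slide, which could then be passed to `curl` or any HTTP client.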

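The scale figures on slides 19 and 22 can be checked with a little arithmetic. This sketch recomputes the launch window from the two timestamps and derives the yearly multiplier implied by tenfold growth over three years. The function names are illustrative, and the launch rate is only an average over the window, assuming all 500 instances came up inside it as the slide states.

```python
from datetime import datetime


def window_seconds(start: str, end: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over years."""
    return total_factor ** (1 / years)


seconds = window_seconds("21:38:52", "21:46:32")
minutes, remainder = divmod(seconds, 60)
print(f"launch window: {minutes}m{remainder}s")
print(f"average launch rate: {500 / (seconds / 60):.1f} instances/minute")
print(f"10x over 3 years = {annual_growth_factor(10, 3):.2f}x per year")
```

The window comes out at 7m40s, matching the slide, which works out to roughly 65 instances per minute on average. Tenfold growth over three years corresponds to about 2.15x per year, consistent with the slide's ">2x Per Annum".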
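
Slide 24 names circuit breakers as one of the antifragile API patterns. The following is a minimal sketch of the general idea, not Hystrix's API or Netflix's implementation: after a run of failures the breaker opens and serves a fallback without calling the dependency, then allows a trial call once a cool-down has passed. The class name, parameters and thresholds are assumptions chosen for illustration, and bulkheads are not covered.

```python
import time


class CircuitBreaker:
    """Illustrative circuit breaker; names and defaults are hypothetical."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to stay open
        self.clock = clock                # injectable for testing
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()         # open: fail fast, skip the dependency
            self.opened_at = None         # cool-down over: allow a trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0                 # success closes the circuit fully
        return result


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)


def unreliable_dependency():
    raise RuntimeError("dependency down")


for _ in range(3):
    # The third call is short-circuited and never reaches the dependency.
    print(breaker.call(unreliable_dependency, lambda: "cached response"))
```

Returning a fallback rather than raising keeps a failing dependency from stalling its callers, which is the behaviour the slide groups under antifragile patterns.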