Netflix and Open Source

Summary of cloud native architecture, NetflixOSS and the Cloud Prize

  • When Netflix first moved to the cloud it was bleeding-edge innovation; we figured stuff out and made stuff up from first principles. Over the last two years more large companies have moved to the cloud, and the principles, practices and patterns have become better understood and adopted. At this point there is intense interest in how Netflix runs in the cloud, and several forward-looking organizations are adopting our architectures and starting to use some of the code we have shared. Over the coming years, we want to make it easier for people to share the patterns we use.
  • The railroad made it possible for California to be developed quickly. By creating an easy-to-follow path, we can create a much bigger ecosystem around the Netflix platform.

Netflix and Open Source Presentation Transcript

  • Netflix and Open Source March 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS http://www.linkedin.com/in/adriancockcroft
  • Cloud Native; NetflixOSS – Cloud Native On-Ramp; Netflix Open Source Cloud Prize
  • Netflix Member Web Site Home Page Personalization Driven – How Does It Work?
  • How Netflix Streaming Works. Diagram labels: Consumer Electronics; User Data; Web Site or Discovery API; AWS Cloud Services; Personalization; CDN Edge Locations; DRM; Customer Device (PC, PS3, TV…); Streaming API; QoS Logging; CDN Management and Steering; OpenConnect CDN Boxes; Content Encoding
  • Content Delivery Service: Open Source Hardware Design + FreeBSD, bird, nginx
  • November 2012 Traffic
  • Real Web Server Dependencies Flow (Netflix Home page business transaction as seen by AppDynamics). Each icon is three to a few hundred instances across three AWS zones. Diagram labels: Start Here; Web service; Cassandra; memcached; S3 bucket; Three Personalization movie group choosers (for US, Canada and Latam)
  • Cloud Native Architecture. Diagram layers: Clients (Things); Autoscaled Micro Services (JVM, JVM, JVM); Autoscaled Micro Services (JVM, JVM, Memcached); Distributed Quorum NoSQL Datastores (Cassandra, Cassandra, Cassandra); all spread across Zone A, Zone B, Zone C
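The "Distributed Quorum" label on the slide above can be made concrete with a little arithmetic. This is a minimal sketch, not Netflix's or Cassandra's code: it assumes a replication factor of 3 with one replica per zone (the slide shows three Cassandra boxes across three zones but does not state the replication factor), and the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replication_factor // 2 + 1


def survives_zone_loss(replication_factor: int, zones_lost: int) -> bool:
    """True if a quorum is still reachable, assuming one replica per zone."""
    return replication_factor - zones_lost >= quorum(replication_factor)


rf = 3  # assumption: one replica in each of Zone A, Zone B and Zone C

print(quorum(rf))                 # 2
print(survives_zone_loss(rf, 1))  # True: two replicas remain, quorum is 2
print(survives_zone_loss(rf, 2))  # False: one replica cannot form a quorum
```

Under these assumptions, quorum reads and writes keep working through the loss of any single zone, which is one way to read the three-zone layout in the diagram.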
  • Non-Native Cloud Architecture. Diagram layers: Agile Mobile Mammals (iOS/Android); Cloudy Buffer (App Servers); Datacenter Dinosaurs (MySQL, Legacy Apps)
  • New Anti-Fragile Patterns: Micro-services; Chaos engines; Highly available systems composed from ephemeral components
  • Stateless Micro-Service Architecture
    Linux Base AMI (CentOS or Ubuntu)
    Optional Apache frontend, memcached, non-java apps
    Java (JDK 6 or 7)
    Tomcat: Application war file, base servlet, platform, client interface jars, Astyanax
    Healthcheck, status servlets, JMX interface, Servo autoscale
    Monitoring: AppDynamics appagent monitoring; AppDynamics machineagent; Epic/Atlas
    Log rotation to S3; GC and thread dump logging
  • Cassandra Instance Architecture
    Linux Base AMI (CentOS or Ubuntu)
    Tomcat and Priam on JDK: Healthcheck, Status
    Java (JDK 7)
    Cassandra Server
    Monitoring: AppDynamics appagent monitoring; AppDynamics machineagent; Epic/Atlas
    Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and SSTables
    GC and thread dump logging
  • Configuration State Management: Datacenter CMDB’s woeful; Cloud native is the solution; Dependably complete
  • Edda – Configuration History http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html Diagram labels: Eureka (Services metadata); AWS (Instances, ASGs, etc.); AppDynamics (Request flow); Edda; Monkeys
  • Edda Query Examples
    Find any instances that have ever had a specific public IP address
    $ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"
    ["i-0123456789","i-012345678a","i-012345678b"]
    Show the most recent change to a security group
    $ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
    --- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
    +++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
    @@ -1,33 +1,33 @@
     {
     …
       "ipRanges" : [
         "10.10.1.1/32",
         "10.10.1.2/32",
    +    "10.10.1.3/32",
    -    "10.10.1.4/32"
     …
     }
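The two queries on the Edda slide share one URL shape: a resource path followed by `;key=value` or bare `;flag` arguments. As a rough illustration of that shape only, here is a small Python sketch that rebuilds both URLs. The helper name `edda_url` is hypothetical and not part of Edda; the host `http://edda` and the argument names are copied from the slide, and nothing here contacts a server.

```python
def edda_url(base, resource, args):
    """Join a resource path with matrix-style arguments.

    `args` is a list of (key, value) pairs; a value of None emits a bare
    flag such as ";_diff" instead of ";key=value".
    """
    parts = [f"{base.rstrip('/')}/{resource.lstrip('/')}"]
    for key, value in args:
        parts.append(f";{key}" if value is None else f";{key}={value}")
    return "".join(parts)


# Instances that have ever had a given public IP address.
instances_by_ip = edda_url(
    "http://edda",
    "api/v2/view/instances",
    [("publicIpAddress", "1.2.3.4"), ("_since", 0)],
)

# Most recent change to a security group, shown as a diff.
security_group_diff = edda_url(
    "http://edda",
    "api/v2/aws/securityGroups/sg-0123456789",
    [("_diff", None), ("_all", None), ("_limit", 2)],
)

print(instances_by_ip)
print(security_group_diff)
```

Keeping the arguments as an ordered list rather than a dict preserves the order shown on the slide and allows bare flags without inventing placeholder values.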
  • Cloud Native: Master copies of data are cloud resident; Everything is dynamically provisioned; All services are ephemeral
  • Scalability Demands
  • Asgard http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
  • Cloud Deployment Scalability. New Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s. Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB). Summary statistics: Min. 41.0; 1st Qu. 104.2; Median 149.0; Mean 171.8; 3rd Qu. 215.8; Max. 562.0
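The timing figures on the scalability slide can be checked with a few lines of arithmetic. This sketch uses only the numbers quoted on the slide (the two timestamps and the 500-instance target); the average launch rate is derived here and does not appear on the slide.

```python
from datetime import datetime

# Timestamps quoted on the slide for the zero-to-500 scale-up.
start = datetime.strptime("21:38:52", "%H:%M:%S")
end = datetime.strptime("21:46:32", "%H:%M:%S")

elapsed = end - start
elapsed_seconds = elapsed.total_seconds()

instances_launched = 500
per_minute = instances_launched / elapsed_seconds * 60

print(elapsed)               # 0:07:40
print(elapsed_seconds)       # 460.0
print(round(per_minute, 1))  # 65.2
```

So the quoted 7m40s matches the timestamps, and the scale-up averaged roughly 65 instance launches per minute.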
  • Ephemeral Instances: Largest services are autoscaled; Average lifetime of an instance is 36 hours. Chart labels: Push; Autoscale Up; Autoscale Down
  • Leveraging Public Scale. Diagram labels: 1,000 Instances; 100,000 Instances; Public; Grey Area; Private; Startups; Netflix; Google
  • How big is Public? AWS Maximum Possible Instance Count: 3.7 Million. Growth >10x in Three Years, >2x Per Annum. AWS upper bound estimate based on the number of public IP Addresses. Every provisioned instance gets a public IP by default
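The two growth figures on this slide are consistent with each other, which a short calculation shows. The sketch assumes steady compound growth over the three years, which the slide does not state.

```python
# ">10x in Three Years": find the constant annual factor that compounds to 10x.
three_year_growth = 10.0
annual_factor = three_year_growth ** (1 / 3)

# For comparison, exactly doubling every year for three years.
doubling_for_three_years = 2 ** 3

print(round(annual_factor, 2))   # 2.15
print(doubling_for_three_years)  # 8
```

Doubling each year would give only 8x over three years, so growth above 10x implies an average annual factor above about 2.15, in line with the slide's ">2x Per Annum".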
  • Availability: Is it running yet? How many places is it running in? How far apart are those places?
  • Antifragile API Patterns: Functional Reactive with Circuit Breakers and Bulkheads
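To illustrate what a circuit breaker does in this context, here is a minimal, generic sketch in Python. It is not Hystrix and does not reproduce any NetflixOSS API; the class name, thresholds and `clock` parameter are assumptions chosen for the example, and bulkheads (isolating each dependency in its own bounded pool) are not shown.

```python
import time


class CircuitBreaker:
    """Fail fast to a fallback after repeated failures, then retry later."""

    def __init__(self, failure_threshold=3, reset_after=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()      # open: skip the dependency entirely
            self.opened_at = None      # half-open: let one trial call through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0              # success closes the circuit
        return result


def unreliable_dependency():
    raise RuntimeError("dependency is down")


breaker = CircuitBreaker(failure_threshold=2, reset_after=30.0)
responses = [breaker.call(unreliable_dependency, lambda: "cached default") for _ in range(3)]
print(responses)
```

After two consecutive failures the third call returns the fallback without touching the failing dependency, which is the property that stops one slow or broken service from dragging down its callers.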
  • Outages
    Running very fast with scissors: Mostly self inflicted – bugs, mistakes; Some caused by AWS bugs and mistakes
    Next step is multi-region: Investigating and building in stages during 2013; Could have prevented some of our 2012 outages
  • Managing Multi-Region Availability. Diagram: DNS (AWS Route53, DynECT, UltraDNS); two sets of Regional Load Balancers, each over Zone A, Zone B and Zone C, with Cassandra Replicas in every zone. What we need is a portable way to manage multiple DNS providers…
  • Denominator: Software Defined DNS for Java
    Use Cases: Edda, Multi-Region Failover
    Common Model: Denominator
    DNS Vendor Plug-in: AWS Route53; DynECT; UltraDNS; Etc…
    API Models (varied and mostly broken): AWS Route53 – IAM Key Auth, REST; DynECT – User/pwd, REST; UltraDNS – User/pwd, SOAP
    Currently being built by Adrian Cole (the jClouds guy, he works for Netflix now…)
  • A Cloud Native Open Source Platform
  • Inspiration
  • Three Questions: Why is Netflix doing this? How does it all fit together? What is coming next?
  • Beware of Geeks Bearing Gifts: Strategies for an Increasingly Open Economy Simon Wardley - Researcher at the Leading Edge Forum
  • How did Netflix get ahead?
    Netflix Business + Developer Org vs. Traditional IT Operations
    Doing it right now vs. Taking their time
    SaaS Applications vs. Pilot private cloud projects
    PaaS for agility vs. Beta quality installations
    Public IaaS for AWS features vs. Small scale
    Big data in the cloud vs. Integrating several vendors
    Integrating many APIs vs. Paying big $ for software
    FOSS from github vs. Paying big $ for consulting
    Renting hardware for 1hr vs. Buying hardware for 3yrs
    Coding in Java/Groovy/Scala vs. Hacking at scripts
  • Netflix Platform Evolution: 2009-2010 Bleeding Edge Innovation; 2011-2012 Common Pattern; 2013-2014 Shared Pattern. Netflix ended up several years ahead of the industry, but it’s not a sustainable position
  • Making it easy to follow: Exploring the wild west each time vs. laying down a shared route
  • Goals: Establish our solutions as Best Practices / Standards; Hire, Retain and Engage Top Engineers; Build up Netflix Technology Brand; Benefit from a shared ecosystem
  • How does it all fit together?
  • NetflixOSS Continuous Build and Deployment. Diagram labels: Github NetflixOSS Source; Maven Central; AWS Base AMI; Cloudbees Jenkins; Dynaslave AWS Build Slaves; Aminator Bakery; AWS Baked AMIs; Odin Orchestration API; Asgard (+ Frigga) Console; AWS Account
  • NetflixOSS Services Scope
    AWS Account: Asgard Console; Archaius Config Service; Cross region Priam C*; Explorers Dashboards; Atlas Monitoring; Edda History; Simian Army; Genie Hadoop Services
    Multiple AWS Regions: Eureka Registry; Exhibitor ZK
    3 AWS Zones: Application Clusters (Autoscale Groups, Instances); Priam Cassandra (Persistent Storage); Evcache Memcached (Ephemeral Storage)
  • NetflixOSS Instance Libraries
    Initialization: Baked AMI – Tomcat, Apache, your code; Governator – Guice based dependency injection; Archaius – dynamic configuration properties client; Eureka – service registration client
    Service Requests: Karyon – Base Server for inbound requests; RxJava – Reactive pattern; Hystrix/Turbine – dependencies and real-time status; Ribbon – REST Client for outbound calls
    Data Access: Astyanax – Cassandra client and pattern library; Evcache – Zone aware Memcached client; Curator – Zookeeper patterns; Denominator – DNS routing abstraction
    Logging: Blitz4j – non-blocking logging; Servo – metrics export for autoscaling; Atlas – high volume instrumentation
  • NetflixOSS Testing and Automation
    Test Tools: CassJmeter – Load testing for Cassandra; Circus Monkey – Test account reservation rebalancing
    Maintenance: Janitor Monkey – Cleans up unused resources; Efficiency Monkey; Doctor Monkey; Howler Monkey – Complains about expiring certs
    Availability: Chaos Monkey – Kills Instances; Chaos Gorilla – Kills Availability Zones; Chaos Kong – Kills Regions; Latency Monkey – Latency and error injection
    Security: Security Monkey; Conformity Monkey
  • Example Application – RSS Reader
  • What’s Coming Next? More Features: Better portability; Higher availability; Easier to deploy; Contributions from end users; Contributions from vendors. More Use Cases
  • Vendor Driven Portability: Interest in using NetflixOSS for Enterprise Private Clouds. “It’s done when it runs Asgard”; Functionally complete; Demonstrated March; Release 3.3 in 2Q13; Some vendor interest; Some vendor interest; Many missing features; Needs AWS compatible Autoscaler; Bait and switch AWS API strategy
  • AWS 2009 vs. ??? Eucalyptus 3.3
  • Netflix Cloud Prize: Boosting the @NetflixOSS Ecosystem
  • In 2012 Netflix Engineering won this..
  • We’d like to give out prizes too. But what for? Contributions to NetflixOSS! Shared under Apache license. Located on github
  • How long do you have? Entries open March 13th. Entries close September 15th. Six months…
  • Who can win? Almost anyone, anywhere… Except current or former Netflix or AWS employees
  • Who decides who wins? Nominating Committee; Panel of Judges
  • Judges: Aino Corry, Program Chair for Qcon/GOTO; Martin Fowler, Chief Scientist Thoughtworks; Simon Wardley, Strategist; Werner Vogels, CTO Amazon; Yury Izrailevsky, VP Cloud Netflix; Joe Weinman, SVP Telx, Author “Cloudonomics”
  • What are Judges Looking For? Eligible, Apache 2.0 licensed; Original and useful contribution to NetflixOSS; Code that successfully builds and passes a test suite; A large number of watchers, stars and forks on github; NetflixOSS project pull requests; Good code quality and structure; Documentation on how to build and run it; Evidence that code is in use by other projects, or is running in production
  • What do you win? One winner in each of the 10 categories; Ticket and expenses to attend AWS Re:Invent 2013 in Las Vegas; A Trophy
  • How do you enter? Get a (free) github account; Fork github.com/netflix/cloud-prize; Send us your email address; Describe and build your entry; Twitter #cloudprize
  • Diagram labels: Github Registration Opens Today; Entrants; Apache Licensed Contributions; Github; Close Entries September 15; Conforms to Rules; Working Code; Community Traction; Netflix Engineering Nominations; Ten Prize Categories; Judges; Winners; Award Ceremony Dinner, AWS Re:Invent, November; $10K cash; $5K AWS; Trophy; AWS Re:Invent Tickets
  • Functionality and scale now, portability coming. Moving from parts to a platform in 2013. Netflix is fostering an ecosystem. Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen)
  • Takeaway: Netflix is making it easy for everyone to adopt Cloud Native patterns. Open Source is not just the default, it’s a strategic weapon. http://netflix.github.com http://techblog.netflix.com http://slideshare.net/Netflix http://www.linkedin.com/in/adriancockcroft @adrianco #netflixcloud @NetflixOSS