How Netflix’s Tools Can Help Accelerate Your Start-up (SVC202) | AWS re:Invent 2013



You're on the verge of launching a new startup and need to build a world-class, high-scale web application on AWS that can handle millions of users. How do you build it quickly without having to reinvent and re-implement the best practices of large, successful Internet companies? NetflixOSS is your answer. In this session, we’ll cover how an emerging startup can leverage the open source tools that Netflix has developed and uses every day in production. They range from baking and deploying applications (Asgard, Aminator), to hardening resiliency to failures (Hystrix, Simian Army, Zuul), to making applications highly distributed and load balanced (Eureka, Ribbon, Archaius), to managing your AWS resources efficiently and effectively (Edda, Ice). You’ll learn how to get started with these tools and pick up best practices from the engineers who created them so that you, like Netflix, can unleash the power of AWS and scale your application processes as you grow.


  1. 1. Asgard, Aminator, Simian Army and More: How Netflix’s Proven Tools Can Help Accelerate Your Startup
      Adrian Cockcroft, Ruslan Meshenberg, Netflix
      November 2013
      © 2013, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of, Inc.
  2. 2. Congratulations, Your Startup Got Funding!
      • More developers
      • More customers
      • Higher availability
      • Global distribution
      • No time…
      Growth
  3. 3. Your Architecture Looks Like This:
      [Diagram labels: Web UI / Front End API, Middle Tier, RDS/MySQL, AWS Zone A]
  4. 4. And It Needs to Look More Like This…
      [Diagram: two regions, each with Regional Load Balancers, Zones A, B, and C, and Cassandra Replicas in every zone]
  5. 5. Inside Each AWS Zone: Microservices and Denormalized Data Stores
      [Diagram labels: API or Web Calls, Web Service, Memcached, Cassandra, S3 Bucket]
  6. 6. We’re here to help you get to global scale… Apache Licensed Cloud Native OSS Platform
  7. 7. Technical Indigestion – What Do All These Do?
  8. 8. Updated Site – Make It Easier to Find What You Need
  9. 9. Getting Started with NetflixOSS Step by Step
      1. Set up AWS accounts to get the foundation in place
      2. Security and access management setup
      3. Account management tools: Asgard for deploy & Ice for cost monitoring
      4. Build tools: Aminator to automate baking AMIs
      5. Service registry and searchable account history: Eureka & Edda
      6. Configuration management: Archaius dynamic property system
      7. Data storage: Cassandra, Astyanax, Priam, EVCache
      8. Dynamic traffic routing: Denominator, Zuul, Ribbon, Karyon
      9. Availability: Simian Army (Chaos Monkey), Hystrix, Turbine
      10. Developer productivity: Blitz4J, GCViz, Pytheas, RxJava
      11. Big data: Genie for Hadoop PaaS, Lipstick visualizer for Pig
      12. Sample apps to get started: RSS Reader, ACME Air, FluxCapacitor
  10. 10. AWS Account Setup
  11. 11. Flow of Code and Data between AWS Accounts
      [Diagram labels: Production Account, Dev Test Build Account, Auditable Account, Archive Account; New Code, AMI, Backup Data to S3, Weekend S3 restore]
  12. 12. Account Security
      • Protect accounts
        – Two-factor authentication for primary login
      • Delegated minimum privilege
        – Create IAM roles for everything
      • Security groups
        – Control who can call your services
  13. 13. Cloud Access Control
      [Diagram labels: Developers, SSH/sudo bastion, Cloud access audit log]
      • www-prod: Userid wwwprod
      • Dal-prod: Userid dalprod
      • Cass-prod: Userid cassprod
      Security groups don’t allow SSH between instances
  14. 14. Tooling and Infrastructure
  15. 15. Fast Start Amazon Machine Images
      • Pre-built AMIs for
        – Asgard – developer self-service deployment console
        – Aminator – build system to bake code onto AMIs
        – Edda – historical configuration database
        – Eureka – service registry
        – Simian Army – Janitor Monkey, Chaos Monkey, Conformity Monkey
      • NetflixOSS Cloud Prize Contribution
        – Produced by Answers4aws – Peter Sankauskas
  16. 16. Fast Setup CloudFormation Templates
      • CloudFormation templates for
        – Asgard – developer self-service deployment console
        – Aminator – build system to bake code onto AMIs
        – Edda – historical configuration database
        – Eureka – service registry
        – Simian Army – Janitor Monkey for cleanup
  17. 17. CloudFormation Walkthrough for Asgard (Repeat for Prod, Test, and Audit Accounts)
  18. 18. Setting Up Asgard – Step 1 Create New Stack
  19. 19. Setting Up Asgard – Step 2 Select Template
  20. 20. Setting Up Asgard – Step 3 Enter IP and Keys
  21. 21. Setting Up Asgard – Step 4 Skip Tags
  22. 22. Setting Up Asgard – Step 5 Confirm
  23. 23. Setting Up Asgard – Step 6 Watch CloudFormation
  24. 24. Setting Up Asgard – Step 7 Find PublicDNS Name
  25. 25. Open Asgard – Step 8 Enter Credentials
  26. 26. Use Asgard – AWS Self-service Portal
  27. 27. Use Asgard – Manage Red/Black Deployments
  28. 28. Track AWS Spend in Detail with ICE
  29. 29. Ice – Slice and Dice Detailed Costs and Usage
  30. 30. Setting Up ICE
      • Visit GitHub site for instructions
      • Currently depends on Highcharts
        – Non-open-source package license
        – Free for noncommercial use
        – Download and license your own copy
        – We can’t provide a prebuilt AMI – sorry!
      • Long-term plan to make ICE fully OSS
        – Anyone want to help?
  31. 31. Build Pipeline Automation
      Jenkins in the cloud autobuilds NetflixOSS pull requests
  32. 32. Automatically Baking AMIs with Aminator
      • Autoscale group instances should be identical
      • Base plus code/config
      • Immutable instances
      • Works for 1 or 1000…
      • Aminator launch
        – Use Asgard to start AMI or
        – AWS CloudFormation recipe
  33. 33. Discovering Your Services – Eureka
      • Map applications by name to
        – AMI, instances, zones
        – IP addresses, URLs, ports
        – Keep track of healthy, unhealthy and initializing instances
      • Eureka launch
        – Use Asgard to launch AMI or use AWS CloudFormation template
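
To make the registry idea on this slide concrete, here is a minimal in-memory sketch in Python. It is not Eureka's API: the `Registry` and `Instance` names, the status strings, and the zone and address values are all assumptions chosen for illustration. It only shows the mapping from application name to instances and the filtering on health state.

```python
from collections import defaultdict
from dataclasses import dataclass

# Status names mirror the three states on the slide (initializing, healthy,
# unhealthy); the exact strings are an assumption for this sketch.
STARTING, UP, DOWN = "STARTING", "UP", "DOWN"


@dataclass
class Instance:
    instance_id: str
    zone: str
    ip: str
    port: int
    status: str = STARTING  # new registrations start as initializing


class Registry:
    """Toy registry: application name -> {instance id -> Instance}."""

    def __init__(self):
        self._apps = defaultdict(dict)

    def register(self, app, instance):
        self._apps[app.upper()][instance.instance_id] = instance

    def set_status(self, app, instance_id, status):
        self._apps[app.upper()][instance_id].status = status

    def healthy(self, app, zone=None):
        """Return UP instances of an app, optionally limited to one zone."""
        return [
            inst for inst in self._apps[app.upper()].values()
            if inst.status == UP and (zone is None or inst.zone == zone)
        ]
```

A client would call `healthy("api")` before each request and pick one of the returned instances; a real registry also needs heartbeats and expiry, which this sketch leaves out.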
  34. 34. Deploying Eureka Service – 1 per Zone
  35. 35. Searchable State History for a Region / Account
      [Diagram labels: Eureka Services Metadata; AWS Instances, ASGs, etc.; Your Own Custom State; Edda; Monkeys]
      Timestamped delta cache of JSON describe call results for anything of interest…
      Edda launch: use Asgard to launch AMI or use CloudFormation template
  36. 36. Edda Query Examples
      Find any instances that have ever had a specific public IP address:
        $ curl "http://edda/api/v2/view/instances;publicIpAddress=;_since=0"
        ["i-0123456789","i-012345678a","i-012345678b"]
      Show the most recent change to a security group:
        $ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
        --- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
        +++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
        @@ -1,33 +1,33 @@
         {
         …
           "ipRanges" : [
             "",
             "",
        +    "",
             ""
         …
         }
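
The Edda queries on this slide pass their options as `;key=value` matrix arguments appended to the path. The Python helper below is a small sketch of how such URLs could be assembled. The function name `edda_url` is hypothetical, the `http://edda` host is the one shown on the slide, and the address `192.0.2.10` is a documentation placeholder because the slide leaves the IP value blank.

```python
def edda_url(base, collection, resource_id=None, **matrix_args):
    """Build an Edda-style URL with ';key=value' matrix arguments.

    A value of True emits a bare flag such as ';_diff'.
    Keyword order is preserved, so arguments appear as written.
    """
    path = f"{base.rstrip('/')}/api/v2/{collection.strip('/')}"
    if resource_id:
        path += f"/{resource_id}"
    for key, value in matrix_args.items():
        path += f";{key}" if value is True else f";{key}={value}"
    return path


# Instances that have ever had a given public IP (placeholder address).
print(edda_url("http://edda", "view/instances",
               publicIpAddress="192.0.2.10", _since=0))

# Most recent change to a security group, as in the slide's second query.
print(edda_url("http://edda", "aws/securityGroups", "sg-0123456789",
               _diff=True, _all=True, _limit=2))
```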
  37. 37. Archaius – Property Console
  38. 38. Archaius Library – Configuration Management
      Based on Pytheas. Not open sourced yet.
      SimpleDB or DynamoDB for NetflixOSS. Netflix uses Cassandra for multi-region…
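
A dynamic property system lets running code read a value that can change without a redeploy. The Python sketch below illustrates that idea only; it is not the Archaius API. `DynamicProperty`, `PropertySource`, and the property name `api.timeout.ms` are hypothetical, and the `apply` method stands in for whatever polls the backing store (SimpleDB, DynamoDB, or Cassandra on the slide).

```python
import threading


class DynamicProperty:
    """A named value that can change at runtime and notify listeners."""

    def __init__(self, name, default):
        self.name = name
        self._value = default
        self._callbacks = []
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def add_callback(self, fn):
        self._callbacks.append(fn)

    def _update(self, value):
        with self._lock:
            changed = value != self._value
            self._value = value
        if changed:  # notify outside the lock so callbacks may call get()
            for fn in list(self._callbacks):
                fn(value)


class PropertySource:
    """Holds properties and applies batches of changes from a poller."""

    def __init__(self):
        self._props = {}

    def get_property(self, name, default):
        if name not in self._props:
            self._props[name] = DynamicProperty(name, default)
        return self._props[name]

    def apply(self, updates):
        for name, value in updates.items():
            self.get_property(name, value)._update(value)
```

Application code keeps a reference to the property object and calls `get()` on every use, so a change applied by the poller takes effect on the next read.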
  39. 39. Data Storage and Access
  40. 40. Data Storage Options
      • RDS for MySQL
        – Deploy using Asgard
      • DynamoDB
        – Fast, easy to set up, and scales up from a very low cost base
      • Cassandra
        – Provides portability, multiregion support, very large scale
        – Storage model supports incremental/immutable backups
        – Priam: easy deployment automation for Cassandra on AWS
  41. 41. Priam – Cassandra co-process
      • Runs alongside Cassandra on each instance
      • Fully distributed, no central master coordination
      • Amazon S3-based backup and recovery automation
      • Bootstrapping and automated token assignment
      • Centralized configuration management
      • RESTful monitoring and metrics
      • Underlying config in SimpleDB (Cass_turtle for MR)
  42. 42. Astyanax Cassandra Client for Java
      • Features
        – Abstraction of connection pool from RPC protocol
        – Fluent style API
        – Operation retry with backoff
        – Token aware
        – Batch manager
        – Many useful recipes
        – Entity mapper based on JPA annotations
  43. 43. Cassandra Astyanax Recipes
      • Distributed row lock (without needing zookeeper)
      • Multiregion row lock
      • Uniqueness constraint
      • Multirow uniqueness constraint
      • Chunked and multithreaded large file storage
      • Reverse index search
      • All rows query
      • Durable message queue
      • Contributed: High cardinality reverse index
  44. 44. EVCache – Low latency data access
      • Multi-AZ and multiregion replication
      • Ephemeral data, session state (sort of)
      • Client code
      • Memcached
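
"Operation retry with backoff" is one of the client features listed for Astyanax. As a language-neutral illustration of the technique (not Astyanax's Java API), the Python sketch below retries a failing call with exponentially growing, jittered delays. The function name, the default limits, and the choice of `ConnectionError` as the retryable error are assumptions.

```python
import random
import time


def call_with_retry(operation, max_attempts=4, base_delay=0.05,
                    max_delay=1.0, sleep=time.sleep):
    """Run operation(); on ConnectionError wait and retry.

    The wait before retry n is base_delay * 2**n, capped at max_delay and
    scaled by a random factor in [0.5, 1.0] so clients do not retry in step.
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))
```

The `sleep` parameter exists so tests can substitute a recorder instead of actually waiting.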
  45. 45. Routing Customers to Code
  46. 46. Denominator: DNS for Multiregion Availability
      [Diagram labels: Denominator; DynECT DNS, UltraDNS, AWS Route53; Regional Load Balancers; Zuul API Router; Zones A, B, and C per region; Cassandra Replicas in every zone]
      Denominator – manage traffic via multiple DNS providers with Java code
  47. 47. Zuul – Smart and Scalable Routing Layer
  48. 48. Ribbon Library for Internal Request Routing
  49. 49. Ribbon – Zone Aware LB
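
Zone-aware load balancing prefers servers in the caller's own availability zone and uses other zones only when no local server is usable. The Python sketch below shows that selection rule with plain round-robin; it is a simplification under stated assumptions, not Ribbon's algorithm or API. The class name, the `(zone, address)` tuple format, and the `is_up` health callback are hypothetical.

```python
import itertools


class ZoneAwareBalancer:
    """Round-robin over same-zone servers, falling back to all zones."""

    def __init__(self, my_zone, servers):
        # servers: iterable of (zone, address) tuples
        self.my_zone = my_zone
        self.servers = list(servers)
        self._counter = itertools.count()

    def choose(self, is_up=lambda address: True):
        """Return the next address, preferring live servers in my_zone."""
        live = [(zone, addr) for zone, addr in self.servers if is_up(addr)]
        if not live:
            raise RuntimeError("no servers available")
        local = [addr for zone, addr in live if zone == self.my_zone]
        pool = local or [addr for _, addr in live]
        return pool[next(self._counter) % len(pool)]
```

Keeping traffic inside a zone avoids cross-zone latency in the normal case, while the fallback keeps requests flowing when a whole zone's servers are down.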
  50. 50. Karyon – Common Server Container
      • Bootstrapping
        o Dependency & lifecycle management via Governator
        o Service registry via Eureka
        o Property management via Archaius
        o Hooks for Latency Monkey testing
        o Preconfigured status page and healthcheck servlets
  51. 51. Karyon
      • Embedded status page console
        o Environment
        o Eureka
        o JMX
  52. 52. Karyon
      • Sample service using Karyon available as "Hello-netflix-oss" on GitHub
      • Recipes ...
  53. 53. Availability
  54. 54. Either You Break It, or Users Will
  55. 55. Add Some Chaos to Your System
  56. 56. Clean Up Your Room! – Janitor Monkey
      Works with Edda history to clean up after Asgard
  57. 57. Conformity Monkey
      Track and alert for old code versions and known issues
      Walks Karyon status pages found via Edda
  58. 58. Hystrix Circuit Breaker: Fail Fast -> Recover Fast
  59. 59. Hystrix Circuit Breaker State Flow
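
The circuit breaker state flow can be summarized as closed, open, and half-open. The Python sketch below illustrates those transitions; it is not Hystrix's implementation or API. In particular, it trips on a count of consecutive failures, which is a simplifying assumption, and the class name, thresholds, and injected `clock` are hypothetical.

```python
import time

CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF_OPEN"


class CircuitBreaker:
    """Fail fast while a dependency is down, then probe for recovery."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self.state = CLOSED

    def call(self, operation, fallback):
        if self.state == OPEN:
            if self._clock() - self._opened_at < self.reset_timeout:
                return fallback()      # fail fast: dependency is not called
            self.state = HALF_OPEN     # timeout elapsed: allow one trial call
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if (self.state == HALF_OPEN
                    or self._failures >= self.failure_threshold):
                self.state = OPEN      # trip (or re-trip) the breaker
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0             # success closes the circuit
        self.state = CLOSED
        return result
```

While the breaker is open, callers get the fallback immediately instead of queuing behind a slow or dead dependency, which is the "fail fast, recover fast" behavior named on the previous slide.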
  60. 60. Turbine Dashboard
      Per-second updates of circuit breakers in a web browser
  61. 61. Developer Productivity
  62. 62. Blitz4J – Nonblocking Logging
      • Better handling of log messages during storms
      • Replace sync with concurrent data structures
      • Extreme configurability
      • Isolation of app threads from logging threads
  63. 63. JVM Garbage Collection issues? GCViz!
      • Convenient
      • Visual
      • Causation
      • Clarity
      • Iterative
  64. 64. Pytheas – OSS-based Tooling Framework
      • Guice
      • Jersey
      • FreeMarker
      • JQuery
      • DataTables
      • D3
      • JQuery-UI
      • Bootstrap
  65. 65. RxJava – Functional Reactive Programming
      • A simpler approach to concurrency
        – Use Observable as a simple stable composable abstraction
      • Observable service layer enables any of these
        – Conditionally return immediately from a cache
        – Block instead of using threads if resources are constrained
        – Use multiple threads
        – Use nonblocking I/O
        – Migrate an underlying implementation from network based to in-memory cache
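
Blitz4J is a Java library, but the idea of isolating application threads from logging threads can be shown with Python's standard library. In the sketch below, application code only enqueues records, and a background listener thread does the slow handler work. The helper name `make_async_logger`, the queue size, and the logger name in the usage note are assumptions, and this illustrates the general pattern rather than how Blitz4J is built.

```python
import logging
import logging.handlers
import queue


def make_async_logger(name, target_handler, max_queued=10_000):
    """Return (logger, listener) with handler work moved off caller threads.

    QueueHandler enqueues without blocking. If the bounded queue is full,
    the record is dropped (reported through the handler's error hook)
    rather than stalling the caller during a log storm.
    """
    log_queue = queue.Queue(maxsize=max_queued)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    # The listener runs target_handler on its own background thread.
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener
```

A caller would create the pair once at startup, for example `make_async_logger("app", logging.StreamHandler())`, and call `listener.stop()` at shutdown to flush queued records.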
  66. 66. Big Data and Analytics
  67. 67. Hadoop Jobs – Genie
  68. 68. Lipstick – Visualization for Pig Queries
  69. 69. Putting It All Together…
  70. 70. Sample Application – RSS Reader
  71. 71. Third-party Sample App by Chris Fregly
      Flux Capacitor is a Java-based reference application demonstrating the following:
      • archaius (zookeeper-based dynamic configuration)
      • astyanax (cassandra client)
      • blitz4j (asynchronous logging)
      • curator (zookeeper client)
      • eureka (discovery service)
      • exhibitor (zookeeper administration)
      • governator (guice-based DI extensions)
      • hystrix (circuit breaker)
      • karyon (common base web service)
      • ribbon (eureka-based REST client)
      • servo (metrics client)
      • turbine (metrics aggregation)
      Flux uses many popular open source tools such as Graphite, Jersey, Jetty, Netty, and Tomcat.
  72. 72. Third-party Sample App by IBM
  73. 73. Some of the Companies Using NetflixOSS (There are many more, please send us your logo!)
  74. 74. Takeaway
  75. 75. Use NetflixOSS to scale your startup or re:Invent your Enterprise
      Contribute to existing GitHub projects and add your own
      Talk to us about @NetflixOSS at the Netflix booth in the Expo
  76. 76. Topic | Session # | When
      How Netflix’s Proven Tools Can Help Accelerate Your Start-up | SVC202 | Wednesday, Nov 13, 1:30 PM - 2:30 PM
      What Enterprises Can Learn from “All-in” Cloud Users | | Wednesday, Nov 13, 2:30 PM - 3:00 PM
      Accelerating Netflix Product Development Using AWS | DMG206 | Wednesday, Nov 13, 3:00 PM - 4:00 PM
      How Netflix Leverages Multiple Regions to Increase Availability: An Isthmus and Active-Active Case Study | ARC305 | Wednesday, Nov 13, 4:15 PM - 5:15 PM
      Data Science at Netflix with Amazon EMR | BDT306 | Wednesday, Nov 13, 4:15 PM - 5:15 PM
      What an Enterprise Can Learn from Netflix, a Cloud-native Company | ENT203 | Thursday, Nov 14, 4:15 PM - 5:15 PM
      Maximizing Audience Engagement in Media Delivery | MED303 | Thursday, Nov 14, 4:15 PM - 5:15 PM
      Scaling your Analytics with Amazon Elastic MapReduce | BDT301 | Thursday, Nov 14, 4:15 PM - 5:15 PM
      Automated Media Workflows in the Cloud | MED304 | Thursday, Nov 14, 5:30 PM - 6:30 PM
      Deft Data at Netflix: Using Amazon S3 and Amazon Elastic MapReduce for Monitoring at Gigascale | BDT302 | Thursday, Nov 14, 5:30 PM - 6:30 PM
      Encryption and Key Management in AWS | SEC304 | Friday, Nov 15, 9:00 AM - 10:00 AM
      Your Linux AMI: Optimization and Performance | CPN302 | Friday, Nov 15, 11:30 AM - 12:30 PM
  77. 77. Thank You!
  78. 78. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance. (Session SVC202)