GOTO Chicago - Speed and scale, how to get there

Delivering software products at high velocity requires four things. First, a culture of innovation that can see and respond to opportunities. Second, the data and analytics to evaluate alternatives. Third, a culture that can make decisions and assign resources quickly. Fourth, agile development and self-service deployment. A fine-grained, loosely coupled architecture scales as the team size grows; a freedom and responsibility culture provides autonomy for innovation and fast decision making; unstructured "Big Data" analytics gets answers quickly; cloud removes the latency of resource allocation; and DevOps removes the coordination latency that slows down deployment. Traditional enterprise architectures are based on monolithic applications and relational databases. Cloud native architectures are based on building single-function, REST-based microservices that support integration across denormalized NoSQL data stores and a wide range of web services. This talk will also discuss strategies, patterns and pathways to perform a gradual migration towards cloud native.

Published in: Technology, Business

  1. Speed and Scale: How to get there. Adrian Cockcroft @adrianco May 2014
  2. Battery Ventures
  8. Typical reactions to my Netflix talks… “You guys are crazy! Can’t believe it” – 2009 “What Netflix is doing won’t work” – 2010 “It only works for ‘Unicorns’ like Netflix” – 2011 “We’d like to do that but can’t” – 2012 “We’re on our way using Netflix OSS code” – 2013
  16. What I learned from my time at Netflix ● Speed wins in the marketplace ● Remove friction from product development ● High trust, low process, no hand-offs between teams ● Freedom and responsibility culture ● Don’t do your own undifferentiated heavy lifting ● Use simple patterns automated by tooling ● Self-service cloud makes impossible things instant
  17. Enterprise IT Adoption of Cloud, by Simon Wardley http://enterpriseitadoption.com/ (chart labels: Now, “%*&!”)
  18. Speed
  19. Innovation
  20. New ideas
  21. New products
  22. What separates incumbents from disruptors?
  23. Assumptions
  24. Optimizations
  25. “It isn't what we don't know that gives us trouble, it's what we know that ain't so.” – Will Rogers http://www.brainyquote.com/quotes/quotes/w/willrogers385286.html
  26. Incumbents follow the $$$: Market size lags disruption because high price products are replaced by low priced products
  27. Disruptors find what used to be expensive
  28. Learn to waste them to save money elsewhere
  29. Examples
  30. Solid State Disk Example
  31. Storage systems assume random reads are expensive: Decades of filesystems and storage array development based on spinning rust
  32. RR is free; Immutable writes; Log-merge: SSD works best for random reads and sequential writes. Bad for updates.
  33. SSD packaging as disk, as PCI card, now as memory DIMM: Each generation reduces overhead and improves price/performance
  34. Disclosure: Diablo Technologies is a Battery Ventures Portfolio Company. See www.battery.com for a list of portfolio investments
  38. Traditional vs. Cloud Native Storage Architectures. Traditional: Business Logic; Database Master; Fabric; Storage Arrays; Database Slave; Fabric; Storage Arrays. Cloud native: Business Logic; Cassandra Zone A nodes; Cassandra Zone B nodes; Cassandra Zone C nodes; Cloud Object Store Backups. SSDs inside ephemeral instances disrupt an entire industry. SSDs inside arrays disrupt incumbent suppliers
  48. How to Scale Storage Beyond Ludicrous ● Cassandra scalability ● Linear scale up benchmarked and seen in production ● Hundreds of nodes per cluster in common use today ● Thousands of nodes per cluster actively being tested and used ● Cassandra scale using high end AWS storage instances ● EC2 i2.8xlarge - over 300,000 iops read or write, 6.4TB of SSD ● 100 nodes = 30 million iops and 640 TB - Ludicrous ● 1000 nodes = 300 million iops and 6.4 PB - Plaid! http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
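The cluster figures on this slide follow from multiplying the quoted per-node numbers by the node count. The sketch below only reproduces that arithmetic; it assumes perfectly linear scaling, as the slide claims, and reports raw SSD capacity with no allowance for replication. The function name `cluster_capacity` is illustrative and does not come from the talk.

```python
# Per-node figures quoted on the slide for an EC2 i2.8xlarge instance.
IOPS_PER_NODE = 300_000   # "over 300,000 iops read or write"
SSD_TB_PER_NODE = 6.4     # "6.4TB of SSD"


def cluster_capacity(nodes):
    """Return (aggregate iops, raw SSD terabytes), assuming linear scaling."""
    return nodes * IOPS_PER_NODE, nodes * SSD_TB_PER_NODE


for nodes in (100, 1000):
    iops, terabytes = cluster_capacity(nodes)
    print(f"{nodes} nodes: {iops / 1e6:.0f} million iops, {terabytes:,.0f} TB raw SSD")
```

Usable capacity would be lower than the raw figure once a replication factor is applied; the slide does not state one, so none is modelled here.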
  49. Disruptor: Cassandra. Perfect match for SSD, no write amplification, no updates, scales to plaid
  50. Product Development: Another disruptive example
  51. Assumption: Process prevents problems. Another disruptive example
  54. Non-Cloud Product Development: Months before you find out whether the product meets the need. Hardware provisioning is undifferentiated heavy lifting – replace it with IaaS. Business Need • Documents • Weeks; Approval Process • Meetings • Weeks; Hardware Purchase • Negotiations • Weeks; Software Development • Specifications • Weeks; Deployment and Testing • Reports • Weeks; Customer Feedback • It sucks! • Weeks (diagram label: IaaS Cloud)
  55. Non-Cloud Product Development: Months before you find out whether the product meets the need. Hardware provisioning is undifferentiated heavy lifting – replace it with IaaS. Business Need • Documents • Weeks; Software Development • Specifications • Weeks; Deployment and Testing • Reports • Weeks; Customer Feedback • It sucks! • Weeks
  56. Process Hand-Off Steps for Product Development on IaaS: Product Manager; Development Team; QA Integration Team; Operations Deploy Team; BI Analytics Team
  60. IaaS Based Product Development: Weeks before you find out whether the product meets the need. Software provisioning is undifferentiated heavy lifting – replace it with PaaS. Business Need • Documents • Weeks; Software Development • Specifications • Weeks; Deployment and Testing • Reports • Days; Customer Feedback • It sucks! • Days (diagram label: PaaS Cloud) etc…
  61. IaaS Based Product Development: Weeks before you find out whether the product meets the need. Software provisioning is undifferentiated heavy lifting – replace it with PaaS. Business Need • Documents • Weeks; Software Development • Specifications • Weeks; Customer Feedback • It sucks! • Days etc…
  62. Process Hand-Off Steps for Feature Development on PaaS: Product Manager; Developer; BI Analytics Team
  65. PaaS Based Product Feature Development: Days before you find out whether the feature meets the need. Building your own business apps is undifferentiated heavy lifting – use SaaS. Business Need • Discussions • Days; Software Development • Code • Days; Customer Feedback • Fix this Bit! • Hours (diagram label: SaaS/BPaaS Cloud) etc…
  66. PaaS Based Product Feature Development: Days before you find out whether the feature meets the need. Building your own business apps is undifferentiated heavy lifting – use SaaS. Business Need • Discussions • Days; Customer Feedback • Fix this Bit! • Hours etc…
  68. SaaS Based Business App Development: Hours before you find out whether the feature meets the need. Business Need • GUI Builder • Hours; Customer Feedback • Fix this bit! • Seconds and thousands more…
  69. What Happened? Rate of change increased. Cost and size and risk of change reduced
  79. Continuous Delivery on Cloud (Observe, Orient, Decide, Act loop). Observe: Land grab opportunity; Competitive Move; Customer Pain Point; Measure Customers (INNOVATION). Orient: Analysis; Model Hypotheses (BIG DATA). Decide: JFDI; Plan Response; Share Plans (CULTURE). Act: Incremental Features; Automatic Deploy; Launch AB Test (CLOUD)
  80. Note: Non-Destructive Production Updates ● “Immutable Code” Service Pattern ● Existing services are unchanged, old code remains in service ● New code deploys as a new service group ● No impact to production until traffic routing changes ● A|B Tests, Feature Flags and Version Routing control traffic ● First users in the test cell are the developer and test engineers ● A cohort of users is added looking for measurable improvement ● Finally make default for everyone, keeping old code for a while
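The rollout sequence on this slide can be sketched as a small router that decides which service group each user reaches. This is an illustration only, not Netflix's implementation: the class `VersionRouter`, its method names and the hash-based cohort assignment are all assumptions made for the example. Old and new code run side by side, and only the routing decision changes.

```python
import hashlib


class VersionRouter:
    """Chooses a service group per user; deployed code is never modified."""

    def __init__(self, default_version):
        self.default_version = default_version
        self.previous_version = None   # kept around "for a while" for rollback
        self.candidate = None          # newly deployed service group, if any
        self.allow_list = set()        # developers and test engineers
        self.cohort_percent = 0        # share of ordinary users in the test cell

    def deploy(self, version, allow_list=()):
        """Start a new service group; no ordinary traffic reaches it yet."""
        self.candidate = version
        self.allow_list = set(allow_list)
        self.cohort_percent = 0

    def widen(self, percent):
        """Add a cohort of ordinary users to the test cell."""
        self.cohort_percent = max(0, min(100, percent))

    def promote(self):
        """Make the candidate the default for everyone."""
        if self.candidate is None:
            raise ValueError("no candidate version deployed")
        self.previous_version = self.default_version
        self.default_version = self.candidate
        self.candidate = None
        self.allow_list = set()
        self.cohort_percent = 0

    def rollback(self):
        """Route everyone back to the old code, which is still running."""
        if self.previous_version is None:
            raise ValueError("no previous version to roll back to")
        self.default_version, self.previous_version = self.previous_version, None

    def route(self, user_id):
        if self.candidate is None:
            return self.default_version
        if user_id in self.allow_list:
            return self.candidate
        # A stable hash keeps each user in the same cell across requests.
        bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
        if bucket < self.cohort_percent:
            return self.candidate
        return self.default_version


router = VersionRouter("v1")
router.deploy("v2", allow_list={"dev-alice"})
print(router.route("dev-alice"), router.route("customer-42"))
```

Because promotion and rollback only swap a pointer, reverting does not require a redeploy, which is what makes the update non-destructive.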
  81. Disruptor: Continuous Delivery. Compute capacity is an ephemeral commodity, learn to waste it to save time and get agility
  82. Development and Operations: Another disruptive example, if you assume they don’t mix…
  83. Developers make code
  84. Operations run code
  85. It can take weeks to get a VM after a developer files a ticket…
  86. But if operations is a self-service API…
  87. Developers run their own code
  88. Developers are on call
  89. Developers have freedom
  90. Developers have incentives to be responsible: Avoids the externalities of over-dependence on operations to fix everything
  91. Less down time: With the right incentives and tooling developers write code that scales and doesn't break
  92. No meetings: Developers end up spending more time developing than when they had to keep explaining their code to ops
  93. DevOps is a re-org, not a new team to hire: For most companies, the cultural transformation needed to do DevOps is the blocker
  94. Disruptor: High Trust Culture, DevOps. Give up central coordination and control, to get speed and align incentives
  101. It’s what you know that isn’t so… ● Make your assumptions explicit ● Extrapolate trends to the limit ● Listen to non-customers ● Follow developer adoption, not IT spend ● Map evolution of products to services to utilities ● Re-organize your teams for speed of execution
  102. How do we get there?
  103. “This is the IT swamp draining manual for anyone who is neck deep in alligators.”
  104. Once you’re out of the swamp, read this…
  105. Open Source Ecosystems ● The most advanced, scalable and stable code you can get is OSS ● No procurement cycle, fix and extend it yourself ● Github is a developer’s online resume ● Github is also your company’s online resume! ● Extensible platforms create ecosystems ● Give up control to get ubiquity – Apache license. Innovate, Leverage and Commoditize
  106. Cloud Native for High Availability ● Business logic isolation in stateless micro-services ● Immutable code with instant rollback ● Auto-scaled capacity and deployment updates ● Distributed across availability zones and regions ● De-normalized single function NoSQL data stores ● See over 40 NetflixOSS projects at netflix.github.com ● Get “Technical Indigestion” trying to keep up with techblog.netflix.com
  107. A Microservice Definition: Loosely coupled service oriented architecture with bounded contexts. See http://en.wikipedia.org/wiki/Domain-driven_design for discussion of bounded contexts
  108. Scaling Continuous Delivery Models. Monolithic: ● Devs book a train ticket ● Everyone runs the monolith ● Queue for the next train ● Coordination chat session ● Need to learn deploy process ● Copy code to existing servers ● Few concurrent versions ● Tens of monolithic updates/day maximum ● Roll-forward only ● “Done” is released to prod. Microservices: ● Everyone has their own build ● Dev runs their own microservice ● No waiting, no meetings ● API call to update prod timeline ● Automated hands-off deploy ● Immutable code on new servers ● Unlimited concurrent versions ● 100s of independent updates ● Roll-back in seconds ● “Done” is retired from prod
  109. Separate Concerns Using Micro-services ● Invert Conway’s Law – teams own service groups and backend stores ● One “verb” per single function micro-service, size doesn’t matter ● One developer independently produces a micro-service ● Each micro-service is its own build, avoids trunk conflicts ● Deploy in a container: Tomcat, AMI or Docker, whatever… ● Stateless business logic. Cattle, not pets. ● Stateful cached data access layer can use ephemeral instances http://en.wikipedia.org/wiki/Conway's_law
  110. Microservices Development Architecture ● Client libraries: Even if you start with a raw protocol, a client side driver is the end-state. Best strategy is to own your own client libraries from the start ● Multithreading and Non-blocking Calls: Reactive model RxJava uses Observable to hide concurrency cleanly. Netty can be used to get non-blocking I/O speedup over Tomcat container ● Circuit Breakers – See Fluxcapacitor.com for code: NetflixOSS Hystrix, Turbine, Latency Monkey, Ribbon/Karyon. Also look at Finagle/Zipkin from Twitter
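For readers unfamiliar with the circuit breaker pattern named on this slide, here is a minimal sketch of the idea. It is not Hystrix and does not use Hystrix's API; the class `CircuitBreaker`, its parameters and the fallback convention are all assumptions made for illustration. After a run of failures the breaker opens and calls fail fast (or return a fallback) until a reset period has passed, at which point one trial call is let through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures   # consecutive failures before tripping
        self.reset_after = reset_after     # seconds to stay open before a trial call
        self.clock = clock                 # injectable so tests can control time
        self.failures = 0
        self.opened_at = None              # None means the circuit is closed

    def call(self, func, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast instead of waiting on a sick dependency.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit is open")
            # Reset period has elapsed: half-open, let this one call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()   # trip, or re-trip after a failed trial
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote dependency in its own breaker, for example `breaker.call(fetch_ratings, user_id, fallback=lambda: [])`, where `fetch_ratings` is a hypothetical client-library function.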
  111. Microservice Datastores ● Book: Refactoring Databases. SchemaSpy to examine schema structure. Denormalization into one datasource per table or materialized view ● Polyglot Persistence: Use a mixture of database technologies, behind REST data access layers. See NetflixOSS Storage Tier as a Service HTTP (staash.com) for MySQL and C* ● CAP – Consistent or Available when Partitioned: Look at Jepsen torture tests for common systems aphyr.com/tags/jepsen. There is no such thing as a consistent distributed system, get over it…
  112. Strategies for impatient product managers ● Carrot: “This new feature you want will be ready faster as a microservice” ● Stick: “This new feature you want will only be implemented in the new microservice based system” ● Shiny Object: “Why don’t you concentrate on some other part of the system while we get the transition done?”
  113. Monitoring and Microservices
  114. Issues with Continuous Delivery and Microservices ● High rate of change: Code pushes can cause floods of new instances and metrics. Short baseline for alert threshold analysis – everything looks unusual ● Ephemeral Configurations: Short lifetimes make it hard to aggregate historical views. Hand tweaked monitoring tools take too much work to keep running ● Microservices with complex calling patterns: End-to-end request flow measurements are very important. Request flow visualizations get overwhelmed
  115. Microservice Based Architectures: From a Gilt Groupe Presentation. See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
  117. “Death Star” Architecture Diagrams: As visualized by Appdynamics, Boundary.com and Twitter internal tools. Netflix; Gilt Groupe (12 of 450); Twitter
  118. Monitoring Micro-services: Visualizing the request flow ● Appdynamics: Instrument the JVM to capture everything including traffic flows. Insert tag for every http request with a header annotation guid. Visualize the over-all flow or the business transaction flow ● Boundary.com and Lyatiss CloudWeaver: Instrument the packet flows across the network. Capture the zone and region config from cloud APIs and tags. Correlate, aggregate and visualize the traffic flows ● Instrumented PaaS Communication Mechanisms: CloudFoundry and Apcera route all traffic through NATS. NetflixOSS ribbon client and karyon server http annotation guid. In-band mechanisms can scale beyond capabilities of centralized tools
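The "header annotation guid" idea on this slide can be sketched as follows: the first service to see a request mints an identifier, and every downstream call carries it forward so that a monitoring tool can stitch the hops into one flow. This is a generic illustration, not the Appdynamics or NetflixOSS mechanism; the header name `X-Request-Id` and all function names are assumptions made for the example.

```python
import uuid

TRACE_HEADER = "X-Request-Id"  # hypothetical header name, not taken from the slides


def ensure_trace_id(inbound_headers):
    """Reuse the caller's request GUID, or mint one at the edge of the system."""
    return inbound_headers.get(TRACE_HEADER) or str(uuid.uuid4())


def outbound_headers(trace_id, extra=None):
    """Build headers for a downstream call, always carrying the GUID forward."""
    headers = dict(extra or {})
    headers[TRACE_HEADER] = trace_id
    return headers


def handle_request(inbound_headers, call_log):
    """Stand-in for a service handler: record the GUID, then call downstream."""
    trace_id = ensure_trace_id(inbound_headers)
    call_log.append(("handled", trace_id))
    return outbound_headers(trace_id, {"Accept": "application/json"})


log = []
edge_call = handle_request({}, log)          # edge service mints the GUID
handle_request(edge_call, log)               # downstream service reuses it
print(len({trace_id for _, trace_id in log}), "distinct id across", len(log), "hops")
```

Grouping log lines or metrics by this identifier is what lets a tool reconstruct the end-to-end request flow.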
  119. Continuous Delivery and DevOps Implications ● Changes are smaller but more frequent ● Individual changes are more likely to be broken ● Changes are normally deployed by developers ● Feature flags are used to enable new code ● Instant detection and rollback matters much more
  127. What’s wrong with measuring in minutes? Takes too long to see a problem. [Chart: metric against threshold, Minute 1 to Minute 7] Something broke at 2m20; 40s of failure didn’t trigger; 1st high metric seen at agent on instance; 1st high metric arrives at monitoring system; 1st high metric processed (maybe); 1st high metric seen on graph; Three datapoints on user graph so looks bad at 8m00.
  128. Whoops! I didn’t mean that! Reverting…
 Not cool if it takes 5 minutes to see it failed and 5 more to see a fix
 No-one notices if it only takes 5 seconds to detect and 5 to see a fix
  136. Try that again by the second: More confidence more quickly. [Chart: metric against threshold, Minute 1 to Minute 7] Something broke at 2m20; Measurable in 1s; 1st high metric seen at agent on instance; 1st high metric arrives at monitoring system; 1st high metric processed; 1st high metric seen on graph; Three datapoints on user graph so looks bad at 2m25.
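The difference between the per-minute and per-second timelines can be approximated with a toy model. It assumes that a datapoint summarises one whole collection interval, that a partly failed interval is averaged down and does not cross the threshold (the "40s of failure didn’t trigger" case), and that three bad datapoints are needed before the graph looks wrong. The `pipeline_delay` values are assumptions chosen only so the totals line up with the 8m00 and 2m25 marks on the slides; the slides do not give the agent, transport and processing delays separately.

```python
import math


def time_to_visible(failure_at, interval, points_needed=3, pipeline_delay=0.0):
    """Seconds from start until `points_needed` bad datapoints are visible.

    Assumes each datapoint summarises one whole interval and only crosses
    the threshold when the failure lasted for the entire interval.
    """
    # Start of the first interval that is bad from beginning to end.
    first_bad_start = math.ceil(failure_at / interval) * interval
    # The last required datapoint is complete when its interval ends.
    last_bad_end = first_bad_start + points_needed * interval
    return last_bad_end + pipeline_delay


def mmss(seconds):
    """Format a number of seconds in the slides' style, e.g. '2m25'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs:02d}"


FAILURE_AT = 2 * 60 + 20  # "Something broke at 2m20"

# pipeline_delay values are illustrative assumptions, not figures from the slides.
by_minute = time_to_visible(FAILURE_AT, interval=60, pipeline_delay=120)
by_second = time_to_visible(FAILURE_AT, interval=1, pipeline_delay=2)
print("per-minute metrics:", mmss(by_minute))
print("per-second metrics:", mmss(by_second))
```

In this model most of the per-minute delay comes from waiting for whole intervals to complete, which is why shrinking the interval helps more than speeding up the pipeline.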
  137. NetflixOSS Hystrix / Turbine Circuit Breaker Monitoring: Streaming metrics directly from services to a web browser each second http://techblog.netflix.com/2012/12/hystrix-dashboard-and-turbine.html
  139. Latest SaaS Based Monitoring Products: www.vividcortex.com and www.boundary.com. Seeing Problems In Seconds
  140. Metric to display latency needs to be less than human attention span (~10s)
  141. Summary ● Speed wins in the marketplace ● Remove friction from product development ● High trust, low process ● Freedom and responsibility culture ● Don’t do your own undifferentiated heavy lifting ● Simple patterns automated by tooling ● Microservices for speed and availability
  142. Separation of Concerns
 Bounded Contexts
  143. Any Questions? ● Battery Ventures http://www.battery.com ● Adrian’s Blog http://perfcap.blogspot.com ● Slideshare http://slideshare.com/adriancockcroft ● Migrating to Microservices – Qcon London - March 6th, 2014 ● Monitorama Opening Keynote Portland OR - May 7th, 2014 ● GOTO Chicago Opening Keynote May 20th, 2014 ● DevOps Summit at Cloud Expo New York – June 10th, 2014 ● Qcon New York – June 11th, 2014 ● GOTO Copenhagen/Aarhus – Denmark – Oct 25th, 2014. Disclosure: some of the companies mentioned are Battery Ventures Portfolio Companies. See www.battery.com for a list of portfolio investments
