Advertisement

Microservices Workshop - Craft Conference

Technology Fellow at Battery Ventures
Apr. 28, 2016
Advertisement

More Related Content

Advertisement

More from Adrian Cockcroft(20)

Advertisement

Microservices Workshop - Craft Conference

  1. Microservices Workshop: Why, what, and how to get there Adrian Cockcroft @adrianco Technology Fellow - Battery Ventures April 2016
  2. Agenda Workshop vs. Presentation & Introductions Faster Development Microservice Architectures What’s Missing? Migration and Simulation What’s Next? Hands-on
  3. Workshop vs. Presentation Questions at any time Interactive discussions Share your experiences Everyone’s voice should be heard PDF of slides: http://bit.ly/microservices-craft
  4. What does @adrianco do? @adrianco Technology Due Diligence on Deals Presentations at Conferences Presentations at Companies Technical Advice for Portfolio Companies Program Committee for Conferences Networking with Interesting PeopleTinkering with Technologies Maintain Relationship with Cloud Vendors Previously: Netflix, eBay, Sun Microsystems, CCL, TCU London BSc Applied Physics
  5. Why am I here? %*&!” By Simon Wardley http://enterpriseitadoption.com/
  6. Why am I here? %*&!” By Simon Wardley http://enterpriseitadoption.com/ 2009
  7. Why am I here? %*&!” By Simon Wardley http://enterpriseitadoption.com/ 2009
  8. Why am I here? @adrianco’s job at the intersection of cloud and Enterprise IT, looking for disruption and opportunities. %*&!” By Simon Wardley http://enterpriseitadoption.com/ 20142009 Disruptions in 2016 coming from server- less computing and teraservices.
  9. Typical reactions to my Netflix talks…
  10. Typical reactions to my Netflix talks… “You guys are crazy! Can’t believe it” – 2009
  11. Typical reactions to my Netflix talks… “You guys are crazy! Can’t believe it” – 2009 “What Netflix is doing won’t work” – 2010
  12. Typical reactions to my Netflix talks… “You guys are crazy! Can’t believe it” – 2009 “What Netflix is doing won’t work” – 2010 It only works for ‘Unicorns’ like Netflix” – 2011
  13. Typical reactions to my Netflix talks… “You guys are crazy! Can’t believe it” – 2009 “What Netflix is doing won’t work” – 2010 It only works for ‘Unicorns’ like Netflix” – 2011 “We’d like to do 
 that but can’t” – 2012
  14. Typical reactions to my Netflix talks… “You guys are crazy! Can’t believe it” – 2009 “What Netflix is doing won’t work” – 2010 It only works for ‘Unicorns’ like Netflix” – 2011 “We’d like to do 
 that but can’t” – 2012 “We’re on our way using Netflix OSS code” – 2013
  15. What I learned from my time at Netflix
  16. What I learned from my time at Netflix •Speed wins in the marketplace
  17. What I learned from my time at Netflix •Speed wins in the marketplace •Remove friction from product development
  18. What I learned from my time at Netflix •Speed wins in the marketplace •Remove friction from product development •High trust, low process, no hand-offs between teams
  19. What I learned from my time at Netflix •Speed wins in the marketplace •Remove friction from product development •High trust, low process, no hand-offs between teams •Freedom and responsibility culture
  20. What I learned from my time at Netflix •Speed wins in the marketplace •Remove friction from product development •High trust, low process, no hand-offs between teams •Freedom and responsibility culture •Don’t do your own undifferentiated heavy lifting
  21. What I learned from my time at Netflix •Speed wins in the marketplace •Remove friction from product development •High trust, low process, no hand-offs between teams •Freedom and responsibility culture •Don’t do your own undifferentiated heavy lifting •Use simple patterns automated by tooling
  22. What I learned from my time at Netflix •Speed wins in the marketplace •Remove friction from product development •High trust, low process, no hand-offs between teams •Freedom and responsibility culture •Don’t do your own undifferentiated heavy lifting •Use simple patterns automated by tooling •Self service cloud makes impossible things instant
  23. “You build it, you run it.” Werner Vogels 2006
  24. In 2014 Enterprises finally embraced public cloud and in 2015 began replacing entire datacenters.
  25. In 2014 Enterprises finally embraced public cloud and in 2015 began replacing entire datacenters. Oct 2014
  26. In 2014 Enterprises finally embraced public cloud and in 2015 began replacing entire datacenters. Oct 2014 Oct 2015
  27. In 2014 Enterprises finally embraced public cloud and in 2015 began replacing entire datacenters. Oct 2014 Oct 2015
  28. Key Goals of the CIO? Align IT with the business Develop products faster Try not to get breached
  29. Security Blanket Failure Insecure applications hidden behind firewalls make you feel safe until the breach happens… http://peanuts.wikia.com/wiki/Linus'_security_blanket
  30. What needs to change?
  31. Developer responsibilities: Faster, cheaper, safer
  32. “It isn't what we don't know that gives us trouble, it's what we know that ain't so.” Will Rogers
  33. Assumptions
  34. Optimizations
  35. Assumption: Process prevents problems
  36. Organizations build up slow complex “Scar tissue” processes
  37. "This is the IT swamp draining manual for anyone who is neck deep in alligators.” 1984 2014
  38. Product Development Processes
  39. Waterfall Product Development Business Need • Documents • Weeks Approval Process • Meetings • Weeks Hardware Purchase • Negotiations • Weeks Software Development • Specifications • Weeks Deployment and Testing • Reports • Weeks Customer Feedback • It sucks! • Weeks
  40. Waterfall Product Development Hardware provisioning is undifferentiated heavy lifting – replace it with IaaS Business Need • Documents • Weeks Approval Process • Meetings • Weeks Hardware Purchase • Negotiations • Weeks Software Development • Specifications • Weeks Deployment and Testing • Reports • Weeks Customer Feedback • It sucks! • Weeks
  41. Waterfall Product Development Hardware provisioning is undifferentiated heavy lifting – replace it with IaaS Business Need • Documents • Weeks Approval Process • Meetings • Weeks Hardware Purchase • Negotiations • Weeks Software Development • Specifications • Weeks Deployment and Testing • Reports • Weeks Customer Feedback • It sucks! • Weeks IaaS Cloud
  42. Waterfall Product Development Hardware provisioning is undifferentiated heavy lifting – replace it with IaaS Business Need • Documents • Weeks Software Development • Specifications • Weeks Deployment and Testing • Reports • Weeks Customer Feedback • It sucks! • Weeks
  43. Process Hand-Off Steps for Agile Product Manager Development Team QA Integration Team Operations Deploy Team BI Analytics Team
  44. IaaS Agile Product Development Business Need • Documents • Weeks Software Development • Specifications • Weeks Deployment and Testing • Reports • Days Customer Feedback • It sucks! • Days
  45. IaaS Agile Product Development Business Need • Documents • Weeks Software Development • Specifications • Weeks Deployment and Testing • Reports • Days Customer Feedback • It sucks! • Days etc…
  46. IaaS Agile Product Development Software provisioning is undifferentiated heavy lifting – replace it with PaaS Business Need • Documents • Weeks Software Development • Specifications • Weeks Deployment and Testing • Reports • Days Customer Feedback • It sucks! • Days etc…
  47. IaaS Agile Product Development Software provisioning is undifferentiated heavy lifting – replace it with PaaS Business Need • Documents • Weeks Software Development • Specifications • Weeks Deployment and Testing • Reports • Days Customer Feedback • It sucks! • Days PaaS Cloud etc…
  48. IaaS Agile Product Development Software provisioning is undifferentiated heavy lifting – replace it with PaaS Business Need • Documents • Weeks Software Development • Specifications • Weeks Customer Feedback • It sucks! • Days etc…
  49. Process for Continuous Delivery of Features on PaaS Product Manager Developer BI Analytics Team
  50. PaaS CD Feature Development Business Need • Discussions • Days Software Development • Code • Days Customer Feedback • Fix this Bit! • Hours etc…
  51. PaaS CD Feature Development Building your own business apps is undifferentiated heavy lifting – use SaaS Business Need • Discussions • Days Software Development • Code • Days Customer Feedback • Fix this Bit! • Hours etc…
  52. PaaS CD Feature Development Building your own business apps is undifferentiated heavy lifting – use SaaS Business Need • Discussions • Days Software Development • Code • Days Customer Feedback • Fix this Bit! • Hours SaaS/ BPaaS Cloud etc…
  53. PaaS CD Feature Development Building your own business apps is undifferentiated heavy lifting – use SaaS Business Need • Discussions • Days Customer Feedback • Fix this Bit! • Hours etc…
  54. SaaS Based Business Application Development Business Need •GUI Builder •Hours Customer Feedback •Fix this bit! •Seconds
  55. SaaS Based Business Application Development Business Need •GUI Builder •Hours Customer Feedback •Fix this bit! •Seconds and thousands more…
  56. Value Chain Mapping Simon Wardley http://blog.gardeviance.org/2014/11/how-to-get-to-strategy-in-ten-steps.html Related tools and training http://www.wardleymaps.com/
  57. Value Chain Mapping Simon Wardley http://blog.gardeviance.org/2014/11/how-to-get-to-strategy-in-ten-steps.html Related tools and training http://www.wardleymaps.com/ Your unique product - Agile
  58. Value Chain Mapping Simon Wardley http://blog.gardeviance.org/2014/11/how-to-get-to-strategy-in-ten-steps.html Related tools and training http://www.wardleymaps.com/ Your unique product - Agile Best of breed as a Service - Lean
  59. Value Chain Mapping Simon Wardley http://blog.gardeviance.org/2014/11/how-to-get-to-strategy-in-ten-steps.html Related tools and training http://www.wardleymaps.com/ Your unique product - Agile Undifferentiated utility suppliers - 6sigma Best of breed as a Service - Lean
  60. Observe Orient Decide Act Continuous Delivery
  61. Observe Orient Decide Act Land grab opportunity Competitive Move Customer Pain Point Measure Customers Continuous Delivery
  62. Observe Orient Decide Act Land grab opportunity Competitive Move Customer Pain Point INNOVATION Measure Customers Continuous Delivery
  63. Observe Orient Decide Act Land grab opportunity Competitive Move Customer Pain Point Analysis Model Hypotheses INNOVATION Measure Customers Continuous Delivery
  64. Observe Orient Decide Act Land grab opportunity Competitive Move Customer Pain Point Analysis Model Hypotheses BIG DATA INNOVATION Measure Customers Continuous Delivery
  65. Observe Orient Decide Act Land grab opportunity Competitive Move Customer Pain Point Analysis JFDI Plan Response Share Plans Model Hypotheses BIG DATA INNOVATION Measure Customers Continuous Delivery
  66. Observe Orient Decide Act Land grab opportunity Competitive Move Customer Pain Point Analysis JFDI Plan Response Share Plans Model Hypotheses BIG DATA INNOVATION CULTURE Measure Customers Continuous Delivery
  67. Observe Orient Decide Act Land grab opportunity Competitive Move Customer Pain Point Analysis JFDI Plan Response Share Plans Incremental Features Automatic Deploy Launch AB Test Model Hypotheses BIG DATA INNOVATION CULTURE Measure Customers Continuous Delivery
  68. Observe Orient Decide Act Land grab opportunity Competitive Move Customer Pain Point Analysis JFDI Plan Response Share Plans Incremental Features Automatic Deploy Launch AB Test Model Hypotheses BIG DATA INNOVATION CULTURE CLOUD Measure Customers Continuous Delivery
  69. Observe Orient Decide Act Land grab opportunity Competitive Move Customer Pain Point Analysis JFDI Plan Response Share Plans Incremental Features Automatic Deploy Launch AB Test Model Hypotheses BIG DATA INNOVATION CULTURE CLOUD Measure Customers Continuous Delivery
  70. Observe Orient Decide Act Land grab opportunity Competitive Move Customer Pain Point Analysis JFDI Plan Response Share Plans Incremental Features Automatic Deploy Launch AB Test Model Hypotheses BIG DATA INNOVATION CULTURE CLOUD Measure Customers Continuous Delivery
  71. Breaking Down the SILOs
  72. Breaking Down the SILOs QA DBA Sys Adm Net Adm SAN Adm DevUX Prod Mgr
  73. Breaking Down the SILOs QA DBA Sys Adm Net Adm SAN Adm DevUX Prod Mgr Product Team Using Monolithic Delivery Product Team Using Monolithic Delivery
  74. Breaking Down the SILOs QA DBA Sys Adm Net Adm SAN Adm DevUX Prod Mgr Product Team Using Microservices Product Team Using Monolithic Delivery Product Team Using Microservices Product Team Using Microservices Product Team Using Monolithic Delivery
  75. Breaking Down the SILOs QA DBA Sys Adm Net Adm SAN Adm DevUX Prod Mgr Product Team Using Microservices Product Team Using Monolithic Delivery Platform TeamProduct Team Using Microservices Product Team Using Microservices Product Team Using Monolithic Delivery
  76. Breaking Down the SILOs QA DBA Sys Adm Net Adm SAN Adm DevUX Prod Mgr Product Team Using Microservices Product Team Using Monolithic Delivery Platform Team A P I Product Team Using Microservices Product Team Using Microservices Product Team Using Monolithic Delivery
  77. Breaking Down the SILOs QA DBA Sys Adm Net Adm SAN Adm DevUX Prod Mgr Product Team Using Microservices Product Team Using Monolithic Delivery Platform Team Re-Org from project teams to product teams A P I Product Team Using Microservices Product Team Using Microservices Product Team Using Monolithic Delivery
  78. Release Plan Developer Developer Developer Developer Developer QA Release Integration Ops Replace Old With New Release Monolithic service updates Works well with a small number of developers and a single language like php, java or ruby
  79. Release Plan Developer Developer Developer Developer Developer QA Release Integration Ops Replace Old With New Release Bugs Monolithic service updates Works well with a small number of developers and a single language like php, java or ruby
  80. Release Plan Developer Developer Developer Developer Developer QA Release Integration Ops Replace Old With New Release Bugs Bugs Monolithic service updates Works well with a small number of developers and a single language like php, java or ruby
  81. Use monolithic apps for small teams, simple systems and when you must, to optimize for efficiency and latency
  82. Developer Developer Developer Developer Developer Old Release Still Running Release Plan Release Plan Release Plan Release Plan Immutable microservice deployment scales, is faster with large teams and diverse platform components
  83. Developer Developer Developer Developer Developer Old Release Still Running Release Plan Release Plan Release Plan Release Plan Deploy Feature to Production Deploy Feature to Production Deploy Feature to Production Deploy Feature to Production Immutable microservice deployment scales, is faster with large teams and diverse platform components
  84. Developer Developer Developer Developer Developer Old Release Still Running Release Plan Release Plan Release Plan Release Plan Deploy Feature to Production Deploy Feature to Production Deploy Feature to Production Deploy Feature to Production Bugs Immutable microservice deployment scales, is faster with large teams and diverse platform components
  85. Developer Developer Developer Developer Developer Old Release Still Running Release Plan Release Plan Release Plan Release Plan Deploy Feature to Production Deploy Feature to Production Deploy Feature to Production Deploy Feature to Production Bugs Deploy Feature to Production Immutable microservice deployment scales, is faster with large teams and diverse platform components
  86. Configure Configure Developer Developer Developer Release Plan Release Plan Release Plan Deploy Standardized Services Standardized container deployment saves time and effort https://hub.docker.com
  87. Configure Configure Developer Developer Developer Release Plan Release Plan Release Plan Deploy Standardized Services Deploy Feature to Production Deploy Feature to Production Deploy Feature to Production Bugs Deploy Feature to Production Standardized container deployment saves time and effort https://hub.docker.com
  88. Developer Developer Run What You Wrote Developer Developer
  89. Developer Developer Run What You Wrote Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer
  90. DeveloperDeveloper Developer Run What You Wrote Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer Monitoring Tools
  91. DeveloperDeveloper Developer Run What You Wrote Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer Site Reliability Monitoring Tools Availability Metrics 99.95% customer success rate
  92. DeveloperDeveloper Developer Run What You Wrote Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer Manager Manager Site Reliability Monitoring Tools Availability Metrics 99.95% customer success rate
  93. DeveloperDeveloper Developer Run What You Wrote Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer Manager Manager VP Engineering Site Reliability Monitoring Tools Availability Metrics 99.95% customer success rate
  94. Non-Destructive Production Updates " “Immutable Code” Service Pattern " Existing services are unchanged, old code remains in service " New code deploys as a new service group " No impact to production until traffic routing changes " A|B Tests, Feature Flags and Version Routing control traffic " First users in the test cell are the developer and test engineers " A cohort of users is added looking for measurable improvement
  95. Deliver four features every four weeks Work In Progress = 4 Opportunity for bugs: 100% (baseline) Time to debug each: 100% (baseline)
  96. Deliver four features every four weeks Bugs! Which feature broke? Need more time to test! Extend release to six weeks? Work In Progress = 4 Opportunity for bugs: 100% (baseline) Time to debug each: 100% (baseline)
  97. Deliver four features every four weeks But: risk of bugs in delivery increases with interactions! Bugs! Which feature broke? Need more time to test! Extend release to six weeks? Work In Progress = 4 Opportunity for bugs: 100% (baseline) Time to debug each: 100% (baseline)
  98. Deliver four features every four weeks 16 16 16 But: risk of bugs in delivery increases with interactions! Bugs! Which feature broke? Need more time to test! Extend release to six weeks? Work In Progress = 4 Opportunity for bugs: 100% (baseline) Time to debug each: 100% (baseline)
  99. Deliver six features every six weeks
  100. Deliver six features every six weeks Work In Progress = 6 Individual bugs: 150% Interactions: 150%?
  101. Deliver six features every six weeks More features What broke? More interactions Even more bugs!! Work In Progress = 6 Individual bugs: 150% Interactions: 150%?
  102. 36 36 Deliver six features every six weeks More features What broke? More interactions Even more bugs!! Work In Progress = 6 Individual bugs: 150% Interactions: 150%?
  103. 36 36 Deliver six features every six weeks Risk of bugs in delivery increased to 225% of original! More features What broke? More interactions Even more bugs!! Work In Progress = 6 Individual bugs: 150% Interactions: 150%?
  104. 4 4 4 4 4 4 Deliver two features every two weeks Complexity of delivery decreased by 75% from original Fewer interactions Fewer bugs Better flow Less Work In Progress Work In Progress = 2 Opportunity for bugs: 50% Time to debug each: 50%
  105. Change One Thing at a Time! If it hurts, do it more often!
  106. What Happened? Rate of change increased Cost and size and risk of change reduced
  107. Low Cost of Change Using Docker Developers • Compile/Build • Seconds Extend container • Package dependencies • Seconds Deploy Container • Docker startup • Seconds
  108. Low Cost of Change Using Docker Fast tooling supports continuous delivery of many tiny changes Developers • Compile/Build • Seconds Extend container • Package dependencies • Seconds Deploy Container • Docker startup • Seconds
  109. Disruptor: Continuous Delivery with Containerized Microservices
  110. It’s what you know that isn’t so
  111. It’s what you know that isn’t so " Make your assumptions explicit
  112. It’s what you know that isn’t so " Make your assumptions explicit " Extrapolate trends to the limit
  113. It’s what you know that isn’t so " Make your assumptions explicit " Extrapolate trends to the limit " Listen to non-customers
  114. It’s what you know that isn’t so " Make your assumptions explicit " Extrapolate trends to the limit " Listen to non-customers " Follow developer adoption, not IT spend
  115. It’s what you know that isn’t so " Make your assumptions explicit " Extrapolate trends to the limit " Listen to non-customers " Follow developer adoption, not IT spend " Map evolution of products to services to utilities
  116. It’s what you know that isn’t so " Make your assumptions explicit " Extrapolate trends to the limit " Listen to non-customers " Follow developer adoption, not IT spend " Map evolution of products to services to utilities " Re-organize your teams for speed of execution
  117. Microservices
  118. A Microservice Definition Loosely coupled service oriented architecture with bounded contexts
  119. A Microservice Definition Loosely coupled service oriented architecture with bounded contexts If every service has to be updated at the same time it’s not loosely coupled
  120. A Microservice Definition Loosely coupled service oriented architecture with bounded contexts If every service has to be updated at the same time it’s not loosely coupled If you have to know too much about surrounding services you don’t have a bounded context. See the Domain Driven Design book by Eric Evans.
  121. Coupling Concerns http://en.wikipedia.org/wiki/Conway's_law "Conway’s Law - organizational coupling "Centralized Database Schemas "Enterprise Service Bus - centralized message queues "Inflexible Protocol Versioning
  122. Speeding Up The Platform Datacenter Snowflakes • Deploy in months • Live for years
  123. Speeding Up The Platform Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks
  124. Speeding Up The Platform Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks Container Deployments • Deploy in seconds • Live for minutes/hours
  125. Speeding Up The Platform Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks Container Deployments • Deploy in seconds • Live for minutes/hours Lambda Deployments • Deploy in milliseconds • Live for seconds
  126. Speeding Up The Platform AWS Lambda is leading exploration of serverless architectures in 2016 Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks Container Deployments • Deploy in seconds • Live for minutes/hours Lambda Deployments • Deploy in milliseconds • Live for seconds
  127. Separate Concerns with Microservices http://en.wikipedia.org/wiki/Conway's_law " Invert Conway’s Law – teams own service groups and backend stores " One “verb” per single function micro-service, size doesn’t matter " One developer independently produces a micro-service " Each micro-service is it’s own build, avoids trunk conflicts " Deploy in a container: Tomcat, AMI or Docker, whatever… " Stateless business logic. Cattle, not pets. " Stateful cached data access layer using replicated ephemeral instances
  128. Inspiration
  129. http://www.infoq.com/presentations/Twitter-Timeline-Scalability http://www.infoq.com/presentations/twitter-soa http://www.infoq.com/presentations/Zipkin http://www.infoq.com/presentations/scale-gilt Go-Kit https://www.youtube.com/watch?v=aL6sd4d4hxk http://www.infoq.com/presentations/circuit-breaking-distributed-systems https://speakerdeck.com/mattheath/scaling-micro-services-in-go-highload-plus-plus-2014 State of the Art in Web Scale Microservice Architectures AWS Re:Invent : Asgard to Zuul https://www.youtube.com/watch?v=p7ysHhs5hl0 Resiliency at Massive Scale https://www.youtube.com/watch?v=ZfYJHtVL1_w Microservice Architecture https://www.youtube.com/watch?v=CriDUYtfrjs New projects for 2015 and Docker Packaging https://www.youtube.com/watch?v=hi7BDAtjfKY Spinnaker deployment pipeline https://www.youtube.com/watch?v=dwdVwE52KkU http://www.infoq.com/presentations/spring-cloud-2015
  130. Microservice Architectures ConfigurationTooling Discovery Routing Observability Development: Languages and Container Operational: Orchestration and Deployment Infrastructure Datastores Policy: Architectural and Security Compliance
  131. Microservices Edda Archaius Configuration Spinnaker SpringCloud Tooling Eureka Prana Discovery Denominator Zuul Ribbon Routing Hystrix Pytheus Atlas Observability Development using Java, Groovy, Scala, Clojure, Python with AMI and Docker Containers Orchestration with Autoscalers on AWS, Titus exploring Mesos & ECS for Docker Ephemeral datastores using Dynomite, Memcached, Astyanax, Staash, Priam, Cassandra Policy via the Simian Army - Chaos Monkey, Chaos Gorilla, Conformity Monkey, Security Monkey
  132. Cloud Native Storage Business Logic Database Master Fabric Storage Arrays Database Slave Fabric Storage Arrays
  133. Cloud Native Storage Business Logic Database Master Fabric Storage Arrays Database Slave Fabric Storage Arrays Business Logic Cassandra Zone A nodes Cassandra Zone B nodes Cassandra Zone C nodes Cloud Object Store Backups
  134. Cloud Native Storage Business Logic Database Master Fabric Storage Arrays Database Slave Fabric Storage Arrays Business Logic Cassandra Zone A nodes Cassandra Zone B nodes Cassandra Zone C nodes Cloud Object Store Backups SSDs inside arrays disrupt incumbent suppliers
  135. Cloud Native Storage Business Logic Database Master Fabric Storage Arrays Database Slave Fabric Storage Arrays Business Logic Cassandra Zone A nodes Cassandra Zone B nodes Cassandra Zone C nodes Cloud Object Store Backups SSDs inside ephemeral instances disrupt an entire industry SSDs inside arrays disrupt incumbent suppliers
  136. Cloud Native Storage Business Logic Database Master Fabric Storage Arrays Database Slave Fabric Storage Arrays Business Logic Cassandra Zone A nodes Cassandra Zone B nodes Cassandra Zone C nodes Cloud Object Store Backups SSDs inside ephemeral instances disrupt an entire industry SSDs inside arrays disrupt incumbent suppliers NetflixOSS Uses Priam to create Cassandra clusters in minutes
  137. Twitter Microservices Decider ConfigurationTooling Finagle Zookeeper Discovery Finagle Netty Routing Zipkin Observability Scala with JVM Container Orchestration using Aurora deployment in datacenters using Mesos Custom Cassandra-like datastore: Manhattan
  138. Twitter Microservices Decider ConfigurationTooling Finagle Zookeeper Discovery Finagle Netty Routing Zipkin Observability Scala with JVM Container Orchestration using Aurora deployment in datacenters using Mesos Custom Cassandra-like datastore: Manhattan Focus on efficient datacenter deployment at scale
  139. Gilt Microservices Decider Configuration Ion Cannon SBT Rake Tooling Finagle Zookeeper Discovery Akka Finagle Netty Routing Zipkin Observability Scala and Ruby with Docker Containers Deployment on AWS Datastores per Microservice using MongoDB, Postgres, Voldemort
  140. Gilt Microservices Decider Configuration Ion Cannon SBT Rake Tooling Finagle Zookeeper Discovery Akka Finagle Netty Routing Zipkin Observability Scala and Ruby with Docker Containers Deployment on AWS Datastores per Microservice using MongoDB, Postgres, Voldemort Focus on fast development with Scala and Docker
  141. Hailo Microservices Configuration Hubot Janky Jenkins Tooling go-platform Discovery go-platform RabbitMQ Routing Observability Go using AMI Container and Docker Deployment on AWS Deployment on AWS
  142. Hailo Microservices Configuration Hubot Janky Jenkins Tooling go-platform Discovery go-platform RabbitMQ Routing Observability Go using AMI Container and Docker Deployment on AWS Deployment on AWS See: go-micro and https://github.com/peterbourgon/gokit
  143. Next Generation Applications Fill in the gaps, rapidly evolving ecosystem choices Archaius LaunchDarkly Configuration Docker CaaS Spinnaker Tooling Etcd Eureka Consul Discovery Compose Calico Weave Routing Zipkin Prometheus Hystrix Observability Development: Components assembled from Docker Hub as a composable “app store” Operational: Mesos, Kubernetes, Swarm, ECS etc. across public and private clouds Datastores: Distributed Ephemeral, Orchestrated or DBaaS Policy: Architectural and security compliance, Cloud Foundry/Apcera for low trust teams
  144. @adrianco In Search of Segmentation Ops Dev Datacenters/AWS Accounts IAM/AD/LDAP Roles VPC/VLAN Networks Security Groups/Hypervisor IPtables/Calico Policy Docker Links/Weave Overlay
  145. @adrianco Hierarchical Segmentation B CA B C E FD E F Security Group for team X Security Group for team Y VPC Z - Manage a small number of large network spaces D X An AWS oriented example… AWS Account - Manage across multiple accounts containers and links
  146. @adrianco What’s Often Missing? Failure injection testing Versioning, routing Binary protocols and interfaces Timeouts and retries Denormalized data models Monitoring, tracing Simplicity through symmetry
  147. @adrianco Failure Injection Testing Netflix Chaos Monkey. Simian Army, FIT and Gremlin http://techblog.netflix.com/2011/07/netflix-simian-army.html http://techblog.netflix.com/2014/10/fit-failure-injection-testing.html http://techblog.netflix.com/2016/01/automated-failure-testing.html
  148. " Chaos Monkey - enforcing stateless business logic " Chaos Gorilla - enforcing zone isolation/replication " Chaos Kong - enforcing region isolation/replication " Security Monkey - watching for insecure configuration settings " Latency Monkey & FIT - inject errors to enforce robust dependencies " See over 100 NetflixOSS projects at netflix.github.com " Get “Technical Indigestion” reading techblog.netflix.com Trust with Verification
  149. @adrianco Benefits of version aware routing Immediately and safely introduce a new version Canary test in production Use feature flags n Route clients to a version so they can’t get disrupted Change client or dependencies but not both at once Eventually remove old versions Incremental or infrequent “break the build” garbage collection
  150. @adrianco Versioning, Routing Version numbering: Interface.Feature.Bugfix V1.2.3 to V1.2.4 - Canary test then remove old version V1.2.x to V1.3.x - Canary test then remove or keep both Route V1.3.x clients to new version to get new feature Remove V1.2.x only after V1.3.x is found to work for V1.2.x clients V1.x.x to V2.x.x - Route clients to specific versions Remove old server version when all old clients are gone
  151. @adrianco Protocols Measure serialization, transmission, deserialization costs “Sending a megabyte of XML between microservices will make you sad…” Use Thrift, Protobuf/gRPC, Avro, SBE internally Use JSON for external/public interfaces https://github.com/real-logic/simple-binary-encoding
  152. @adrianco Interfaces When you build a service, build a “driver” client for it Reference implementation error handling and serialization Release automation stress test using client Validate that service interface is usable! Minimize additional dependencies Swagger - OpenAPI Specification Datawire Quark adds behaviors to API spec
  153. @adrianco Interfaces
  154. @adrianco Interfaces Client Code Object Model
  155. @adrianco Interfaces Service Code Client Code Object Model Object Model
  156. @adrianco Interfaces Service Code Client Code Object Model Object Model Cache Code Object Model Decoupled object models
  157. @adrianco Interfaces Service Code Client Code Object Model Service Driver Service Handler Object Model Cache Code Object Model Decoupled object models
  158. @adrianco Interfaces Service Code Client Code Object Model Cache Driver Service Driver Service Handler Object Model Cache Code Cache Handler Object Model Decoupled object models
  159. @adrianco Interfaces Service Code Client Code Object Model Cache Driver Service Driver Platform Platform Service Handler Object Model Cache Code Platform Cache Handler Object Model Decoupled object models
  160. @adrianco Interfaces Service Code Client Code Object Model Cache Driver Service Driver Platform Platform Service Handler Object Model Cache Code Platform Cache Handler Object Model Versioned dependency interfacesDecoupled object models
  161. @adrianco Interfaces Service Code Client Code Object Model Cache Driver Service Driver Platform Platform Service Handler Object Model Cache Code Platform Cache Handler Object Model Versioned dependency interfaces Versioned platform interface Decoupled object models
  162. @adrianco Interfaces Service Code Client Code Object Model Cache Driver Service Driver Platform Platform Service Handler Object Model Cache Code Platform Cache Handler Object Model Versioned dependency interfaces Versioned platform interface Decoupled object models Versioned routing
  163. @adrianco Interface Version Pinning Change one thing at a time! Pin the version of everything else Incremental build/test/deploy pipeline Deploy existing app code with new platform Deploy existing app code with new dependencies Deploy new app code with pinned platform/dependencies
  164. @adrianco Timeouts and Retries Connection timeout vs. request timeout confusion Usually setup incorrectly, global defaults Systems collapse with “retry storms” Timeouts too long, too many retries Services doing work that can never be used
  165. @adrianco Connections and Requests TCP makes a connection, HTTP makes a request HTTP hopefully reuses connections for several requests Both have different timeout and retry needs! TCP timeout is purely a property of one network latency hop HTTP timeout depends on the service and its dependencies connection path request path
  166. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  167. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  168. @adrianco Timeouts and Retries Edge Service Good Service Good Service Bad config: Every service defaults to 2 second timeout, two retries Edge Service not responding Overloaded service not responding Failed Service If anything breaks, everything upstream stops responding Retries add unproductive work
  169. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service
  170. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service First request from Edge timed out so it ignores the successful response and keeps retrying. Middle service load increases as it’s doing work that isn’t being consumed
  171. @adrianco Timeouts and Retries Bad config: Every service defaults to 2 second timeout, two retries Edge service responds slowly Overloaded service Partially failed service First request from Edge timed out so it ignores the successful response and keeps retrying. Middle service load increases as it’s doing work that isn’t being consumed
  172. @adrianco Timeout and Retry Fixes Cascading timeout budget Static settings that decrease from the edge or dynamic budget passed with request How often do retries actually succeed? Don’t ask the same instance the same thing Only retry on a different connection
  173. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, one retry Failed Service
  174. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, one retry Failed Service 3s 1s 1s Fast fail response after 2s Upstream timeout must always be longer than total downstream timeout * retries delay No unproductive work while fast failing
  175. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, failover retry Failed Service For replicated services with multiple instances never retry against a failed instance No extra retries or unproductive work Good Service
  176. @adrianco Timeouts and Retries Edge Service Good Service Budgeted timeout, failover retry Failed Service 3s 1s For replicated services with multiple instances never retry against a failed instance No extra retries or unproductive work Good Service Successful response delayed 1s
  177. @adrianco Manage Inconsistency ACM Paper: "The Network is Reliable" Distributed systems are inconsistent by nature Clients are inconsistent with servers Most caches are inconsistent Versions are inconsistent Get over it and Deal with it
  178. @adrianco Denormalized Data Models Any non-trivial organization has many databases Cross references exist, inconsistencies exist Microservices work best with individual simple stores Scale, operate, mutate, fail them independently NoSQL allows flexible schema/object versions
  179. @adrianco Denormalized Data Models Build custom cross-datasource check/repair processes Ensure all cross references are up to date Immutability Changes Everything http://highscalability.com/blog/2015/1/26/paper-immutability-changes-everything-by-pat-helland.html Memories, Guesses and Apologies https://blogs.msdn.microsoft.com/pathelland/2007/05/15/memories-guesses-and-apologies/
  180. Cloud Native Monitoring and Microservices
  181. Cloud Native Microservices " High rate of change Code pushes can cause floods of new instances and metrics Short baseline for alert threshold analysis – everything looks unusual " Ephemeral Configurations Short lifetimes make it hard to aggregate historical views Hand tweaked monitoring tools take too much work to keep running " Microservices with complex calling patterns End-to-end request flow measurements are very important Request flow visualizations get overwhelmed
  182. Microservice Based Architectures See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
  183. Continuous Delivery and DevOps " Changes are smaller but more frequent " Individual changes are more likely to be broken " Changes are normally deployed by developers " Feature flags are used to enable new code " Instant detection and rollback matters much more
  184. Whoops! I didn’t mean that! Reverting…
 
 Not cool if it takes 5 minutes to see it failed and 5 more to see a fix
 No-one notices if it only takes 5 seconds to detect and 5 to see a fix
  185. NetflixOSS Hystrix/Turbine Circuit Breaker http://techblog.netflix.com/2012/12/hystrix-dashboard-and-turbine.html
  186. NetflixOSS Hystrix/Turbine Circuit Breaker http://techblog.netflix.com/2012/12/hystrix-dashboard-and-turbine.html
  187. Low Latency SaaS Based Monitors https://www.datadoghq.com/ http://www.instana.com/ www.bigpanda.io www.vividcortex.com signalfx.com wavefront.com sysdig.com See www.battery.com for a list of portfolio investments
  188. A Tragic Quadrant Ability to scale Ability to handle rapidly changing microservices In-house tools at web scale companies Most current monitoring & APM tools Next generation APM Next generation Monitoring Datacenter Cloud Containers 100s 1,000s 10,000s 100,000s Lambda
  189. A Tragic Quadrant Ability to scale Ability to handle rapidly changing microservices In-house tools at web scale companies Most current monitoring & APM tools Next generation APM Next generation Monitoring Datacenter Cloud Containers 100s 1,000s 10,000s 100,000s Lambda
  190. Metric to display latency needs to be less than human attention span (~10s)
  191. Challenges for Microservice Platforms
  192. Managing Scale
  193. A Possible Hierarchy Continents Regions Zones Services Versions Containers Instances How Many? 3 to 5 2-4 per Continent 1-5 per Region 100’s per Zone Many per Service 1000’s per Version 10,000’s It’s much more challenging than just a large number of machines
  194. Flow
  195. Some tools can show the request flow across a few services
  196. Interesting architectures have a lot of microservices! Flow visualization is a big challenge. See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
  197. Simulated Microservices Model and visualize microservices Simulate interesting architectures Generate large scale configurations Eventually stress test real tools Code: github.com/adrianco/spigo Simulate Protocol Interactions in Go Visualize with D3 See for yourself: http://simianviz.surge.sh Follow @simianviz for updates ELB Load Balancer Zuul API Proxy Karyon Business Logic Staash Data Access Layer Priam Cassandra Datastore Three Availability Zones Denominator DNS Endpoint
  198. Spigo Nanoservice Structure func Start(listener chan gotocol.Message) { ... for { select { case msg := <-listener: flow.Instrument(msg, name, hist) switch msg.Imposition { case gotocol.Hello: // get named by parent ... case gotocol.NameDrop: // someone new to talk to ... case gotocol.Put: // upstream request handler ... outmsg := gotocol.Message{gotocol.Replicate, listener, time.Now(), msg.Ctx.NewParent(), msg.Intention} flow.AnnotateSend(outmsg, name) outmsg.GoSend(replicas) } case <-eurekaTicker.C: // poll the service registry ... } } } Skeleton code for replicating a Put message Instrument incoming requests Instrument outgoing requests update trace context
  199. Flow Trace Records riak2 us-east-1 zoneC riak9 us-west-2 zoneA Put s896 Replicate riak3 us-east-1 zoneA riak8 us-west-2 zoneC riak4 us-east-1 zoneB riak10 us-west-2 zoneB us-east-1.zoneC.riak2 t98p895s896 Put us-east-1.zoneA.riak3 t98p896s908 Replicate us-east-1.zoneB.riak4 t98p896s909 Replicate us-west-2.zoneA.riak9 t98p896s910 Replicate us-west-2.zoneB.riak10 t98p910s912 Replicate us-west-2.zoneC.riak8 t98p910s913 Replicate staash us-east-1 zoneC s910 s908s913 s909s912 Replicate Put
  200. Open Zipkin A common format for trace annotations A Java tool for visualizing traces Standardization effort to fold in other formats Driven by Adrian Cole (currently at Pivotal) Extended to load Spigo generated trace files
  201. Zipkin Trace Dependencies
  202. Zipkin Trace Dependencies
  203. Trace for one Spigo Flow
  204. Definition of an architecture { "arch": "lamp", "description":"Simple LAMP stack", "version": "arch-0.0", "victim": "webserver", "services": [ { "name": "rds-mysql", "package": "store", "count": 2, "regions": 1, "dependencies": [] }, { "name": "memcache", "package": "store", "count": 1, "regions": 1, "dependencies": [] }, { "name": "webserver", "package": "monolith", "count": 18, "regions": 1, "dependencies": ["memcache", "rds-mysql"] }, { "name": "webserver-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["webserver"] }, { "name": "www", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["webserver-elb"] } ] } Header includes chaos monkey victim New tier name Tier package 0 = non Regional Node count List of tier dependencies See for yourself: http://simianviz.surge.sh/lamp
  205. Running Spigo $ ./spigo -a lamp -j -d 2 2016/01/26 23:04:05 Loading architecture from json_arch/lamp_arch.json 2016/01/26 23:04:05 lamp.edda: starting 2016/01/26 23:04:05 Architecture: lamp Simple LAMP stack 2016/01/26 23:04:05 architecture: scaling to 100% 2016/01/26 23:04:05 lamp.us-east-1.zoneB.eureka01....eureka.eureka: starting 2016/01/26 23:04:05 lamp.us-east-1.zoneA.eureka00....eureka.eureka: starting 2016/01/26 23:04:05 lamp.us-east-1.zoneC.eureka02....eureka.eureka: starting 2016/01/26 23:04:05 Starting: {rds-mysql store 1 2 []} 2016/01/26 23:04:05 Starting: {memcache store 1 1 []} 2016/01/26 23:04:05 Starting: {webserver monolith 1 18 [memcache rds-mysql]} 2016/01/26 23:04:05 Starting: {webserver-elb elb 1 0 [webserver]} 2016/01/26 23:04:05 Starting: {www denominator 0 0 [webserver-elb]} 2016/01/26 23:04:05 lamp.*.*.www00....www.denominator activity rate 10ms 2016/01/26 23:04:06 chaosmonkey delete: lamp.us-east-1.zoneC.webserver02....webserver.monolith 2016/01/26 23:04:07 asgard: Shutdown 2016/01/26 23:04:07 lamp.us-east-1.zoneB.eureka01....eureka.eureka: closing 2016/01/26 23:04:07 lamp.us-east-1.zoneA.eureka00....eureka.eureka: closing 2016/01/26 23:04:07 lamp.us-east-1.zoneC.eureka02....eureka.eureka: closing 2016/01/26 23:04:07 spigo: complete 2016/01/26 23:04:07 lamp.edda: closing -a architecture lamp -j graph json/lamp.json -d run for 2 seconds
  206. Riak IoT Architecture { "arch": "riak", "description":"Riak IoT ingestion example for the RICON 2015 presentation", "version": "arch-0.0", "victim": "", "services": [ { "name": "riakTS", "package": "riak", "count": 6, "regions": 1, "dependencies": ["riakTS", "eureka"]}, { "name": "ingester", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakTS"]}, { "name": "ingestMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["ingester"]}, { "name": "riakKV", "package": "riak", "count": 3, "regions": 1, "dependencies": ["riakKV"]}, { "name": "enricher", "package": "staash", "count": 6, "regions": 1, "dependencies": ["riakKV", "ingestMQ"]}, { "name": "enrichMQ", "package": "karyon", "count": 3, "regions": 1, "dependencies": ["enricher"]}, { "name": "analytics", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingester"]}, { "name": "analytics-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["analytics"]}, { "name": "analytics-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["analytics-elb"]}, { "name": "normalization", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["enrichMQ"]}, { "name": "iot-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["normalization"]}, { "name": "iot-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["iot-elb"]}, { "name": "stream", "package": "karyon", "count": 6, "regions": 1, "dependencies": ["ingestMQ"]}, { "name": "stream-elb", "package": "elb", "count": 0, "regions": 1, "dependencies": ["stream"]}, { "name": "stream-api", "package": "denominator", "count": 0, "regions": 0, "dependencies": ["stream-elb"]} ] } New tier name Tier package Node count List of tier dependencies 0 = non Regional
  207. Single Region Riak IoT See for yourself: http://simianviz.surge.sh/riak
  208. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint See for yourself: http://simianviz.surge.sh/riak
  209. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Load Balancer Load Balancer See for yourself: http://simianviz.surge.sh/riak
  210. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Load Balancer Load Balancer Stream Service Analytics Service See for yourself: http://simianviz.surge.sh/riak
  211. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Load Balancer Load Balancer Stream Service Analytics Service See for yourself: http://simianviz.surge.sh/riak
  212. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Ingest Message Queue Load Balancer Load Balancer Stream Service Analytics Service See for yourself: http://simianviz.surge.sh/riak
  213. Single Region Riak IoT IoT Ingestion Endpoint Stream Endpoint Analytics Endpoint Load Balancer Normalization Services Enrich Message Queue Riak KV Enricher Services Ingest Message Queue Load Balancer Load Balancer Stream Service Riak TS Analytics Service Ingester Service See for yourself: http://simianviz.surge.sh/riak