
Observability driven development

Techorama talk about #Observability, building resilient applications, and #DevOps

Published in: Technology


  1. 1. OBSERVABILITY DRIVEN DEVELOPMENT GEERT VAN DER CRUIJSEN @GEERTVDC
  2. 2. GEERT VAN DER CRUIJSEN @GEERTVDC CLOUD NATIVE ARCHITECT FULL CYCLE DEVELOPER DEVOPS COACH
  3. 3. I HAVE TO MAKE A CONFESSION @GEERTVDC
  4. 4. I HAVE TO MAKE A CONFESSION I TEST IN PRODUCTION @GEERTVDC
  5. 5. I TEST IN PRODUCTION I’M NOT LIKE THIS GUY THOUGH @GEERTVDC
  6. 6. TODAY’S PREACH YOU SHOULD TEST IN PRODUCTION TOO @GEERTVDC
  7. 7. YOU SHOULD TEST IN PRODUCTION TOO STOP BEING AFRAID OF PRODUCTION! @GEERTVDC
  8. 8. WHO’S DOING AGILE OR DEVOPS? @GEERTVDC
  9. 9. WHO’S DOING AGILE OR DEVOPS? COMMON AGILE / DEVOPS MISTAKES @GEERTVDC
  10. 10. FOCUS ON SPEED? @GEERTVDC
  11. 11. DO YOU WANT FAST WHEN YOU’RE NOT GOING IN THE RIGHT DIRECTION? @GEERTVDC
  12. 12. TEST IN PRODUCTION USER BEHAVIOR A/B TESTING EXPERIMENTS @GEERTVDC
  13. 13. BEING ABLE TO BRAKE AND STEER THAT IS WHAT MAKES YOU GO FASTER! @GEERTVDC
  14. 14. DEVOPS IS THE UNION OF PEOPLE, PROCESS, AND PRODUCTS TO ENABLE CONTINUOUS DELIVERY OF VALUE TO OUR END USERS. DONOVAN BROWN @GEERTVDC
  15. 15. DEVOPS IS THE UNION OF PEOPLE, PROCESS, AND PRODUCTS TO ENABLE CONTINUOUS DELIVERY OF VALUE TO OUR END USERS. DONOVAN BROWN @GEERTVDC
  16. 16. VALUE IS ONLY VALUE WHEN IT’S RUNNING IN PRODUCTION @GEERTVDC
  17. 17. VALUE IS ONLY VALUE WHEN IT’S RUNNING IN PRODUCTION @GEERTVDC
  18. 18. TEST IN PRODUCTION CANARY RELEASING RING BASED DEPLOYMENTS MULTI REGION CHAOS ENGINEERING SHADOW TESTING @GEERTVDC
  19. 19. BUT I USE STAGING? @GEERTVDC
  20. 20. BUT I USE STAGING? DOES STAGING HAVE REAL DATA? DOES STAGING HAVE REAL USERS? DOES STAGING REPRESENT PRODUCTION ENOUGH? HOW MUCH TIME DO YOU SPEND ON STAGING?
  21. 21. WHAT IS KEY TO TESTING ON PROD? OBSERVABILITY @GEERTVDC
  22. 22. OBSERVABILITY “OBSERVABILITY IS A MEASURE OF HOW WELL INTERNAL STATES OF A SYSTEM CAN BE INFERRED FROM KNOWLEDGE OF ITS EXTERNAL OUTPUTS” CONTROL THEORY @GEERTVDC
  23. 23. WHAT IS THE DIFFERENCE WITH MONITORING? @GEERTVDC
  24. 24. MONITORING KNOWN UNKNOWNS OBSERVABILITY UNKNOWN UNKNOWNS @GEERTVDC
  25. 25. COMPLEX APPLICATION LANDSCAPES DISTRIBUTED SYSTEMS – MICROSERVICES – CLOUD
  26. 26. “IN A COMPLEX LANDSCAPE YOUR APPLICATION IS NEVER FULLY UP” @GEERTVDC
  27. 27. MICROSERVICES TRADITIONAL MONITORING TOOLS ARE DEAD @GEERTVDC
  28. 28. MEASURE USER IMPACT @GEERTVDC
  29. 29. MEASURE USER IMPACT https://medium.com/netflix-techblog/sps-the-pulse-of-netflix-streaming-ae4db0e05f8a @GEERTVDC
  30. 30. RELIABILITY AVAILABILITY LATENCY THROUGHPUT CORRECTNESS FRESHNESS COVERAGE QUALITY DURABILITY RELIABILITY @GEERTVDC
  31. 31. FAIL OPEN PARTIAL FAILURE MODE @GEERTVDC
  32. 32. OBSERVABILITY IS THE KEY TO SOFTWARE OWNERSHIP @GEERTVDC
  33. 33. WE’VE TAUGHT OPS TO DEV SOURCE CONTROL INFRASTRUCTURE AS CODE AUTOMATION SCRIPTING @GEERTVDC
  34. 34. TIME HAS COME DEVS GET PROD ACCESS @GEERTVDC
  35. 35. TIME HAS COME DEVS GET PROD ACCESS DEVS TAKE OWNERSHIP @GEERTVDC
  36. 36. TIME HAS COME DEVS GET PROD ACCESS DEVS TAKE OWNERSHIP DEVS TAKE ON CALL! @GEERTVDC
  37. 37. DEVOPS CYCLE @GEERTVDC
  38. 38. DEVOPS CYCLE @GEERTVDC
  39. 39. BUSINESS + DEV IT OPERATIONS @GEERTVDC
  40. 40. BUSINESS + DEV IT OPERATIONS IMPROVE THE COMPANY @GEERTVDC
  41. 41. OBSERVABILITY CONNECT DEV TO BUSINESS OBSERVABILITY CONNECT DEV TO OPERATIONS @GEERTVDC
  42. 42. 3 PILLARS OF OBSERVABILITY @GEERTVDC
  43. 43. 3 PILLARS OF OBSERVABILITY LOGS METRICS TRACES @GEERTVDC
  44. 44. LOGGING EXAMPLE: REQUEST DURATION SERVICE REQUEST X FOR USER Y TOOK 50 MILLISECONDS @GEERTVDC
  45. 45. LOGGING EASY TO GENERATE, HARD TO QUERY? @GEERTVDC
  46. 46. STRUCTURED LOGGING FROM Log.Information("Request by userA took 35ms"); TO Log.Information("Request by {User} took {Duration}", user, duration); @GEERTVDC
  47. 47. STRUCTURED LOGGING GENERATE LOGS SERILOG APPLICATION INSIGHTS NLOG @GEERTVDC
  48. 48. STRUCTURED LOGGING GENERATE LOGS STORE & QUERY LOGS AZURE LOG ANALYTICS SERILOG APPLICATION INSIGHTS NLOG @GEERTVDC
  49. 49. LOGGING SHOULD YOU SAMPLE? STORAGE == MONEY AUDIT LOGS DO NOT SAMPLE OPERATIONAL LOGS DO SAMPLE DYNAMIC SAMPLING @GEERTVDC
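The sampling guidance on this slide (never sample audit logs, do sample operational logs, adjust the rate dynamically) can be sketched as follows. This is a minimal Python illustration with hypothetical names, not code from the talk:

```python
import random

def should_keep(log_event, sample_rate=0.1):
    """Decide whether to store a log event.

    Audit logs and errors are always kept; routine operational
    logs are sampled because storage costs money."""
    if log_event.get("category") == "audit":
        return True  # audit logs: do not sample
    if log_event.get("level") == "error":
        return True  # errors are rare and valuable
    return random.random() < sample_rate  # operational logs: do sample

def dynamic_rate(events_per_second, target_per_second=100):
    """Dynamic sampling: shrink the rate as volume grows so the
    stored volume stays roughly constant."""
    return min(1.0, target_per_second / max(events_per_second, 1))
```

At 1,000 events/second this yields a 10% sample rate; at 50 events/second everything is kept.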
  50. 50. METRICS AGGREGATE INFORMATION INTO TIME SERIES CREATE REAL TIME GRAPHS OR HISTOGRAPHS CHEAPER TO STORE @GEERTVDC
  51. 51. METRICS EXAMPLE: REQUEST DURATION 50 MILLISECONDS REQUEST IS 15 MILLISECONDS HIGHER THAN AVERAGE @GEERTVDC
  52. 52. METRICS EXAMPLE: REQUEST DURATION 50 MILLISECONDS REQUEST IS 15 MILLISECONDS HIGHER THAN AVERAGE IN EDE ON MONDAYS PEOPLE WHO BOUGHT PRODUCT Y @GEERTVDC
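The metrics slides above describe aggregating raw events into a time series and comparing a single request to the average. A minimal Python sketch of that idea (illustrative bucket size and data, not from the talk):

```python
from statistics import mean

def aggregate(durations_ms, bucket_size=5):
    """Aggregate raw request durations into fixed-size buckets,
    keeping only the mean per bucket; far cheaper to store than
    the raw events."""
    return [mean(durations_ms[i:i + bucket_size])
            for i in range(0, len(durations_ms), bucket_size)]

def deviation_from_average(value_ms, series):
    """How far is one request from the series average?"""
    return value_ms - mean(series)

series = aggregate([30, 40, 35, 30, 40, 50, 45, 55, 50, 50])
# series is [35, 50]: two buckets of five requests each
```

Dimensions such as region, weekday, or purchased product (as in the "in Ede on Mondays" example) would be extra labels on each bucket rather than new code.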
  53. 53. DISTRIBUTED TRACING EXAMPLE: REQUEST DURATION WHY DID THIS REQUEST TAKE 50 MILLISECONDS -> IT CALLED DB, OTHER SERVICES? @GEERTVDC
  54. 54. DISTRIBUTED TRACING APPLICATION FLOW FROM FRONT TO BACK USER SESSION TRANSACTION AMOUNT OF DATA IN MICROSERVICE LANDSCAPE? @GEERTVDC
  55. 55. DISTRIBUTED TRACING @GEERTVDC
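The core mechanic behind distributed tracing is propagating one trace identifier across every service a request touches, so the spans can be stitched back into a single flow. A minimal Python sketch with hypothetical service names (real systems would use a standard such as W3C Trace Context):

```python
import uuid

def handle_frontend_request(headers):
    """Entry point: start a trace if none exists, then pass the
    same trace id to every downstream call."""
    trace_id = headers.get("x-trace-id") or str(uuid.uuid4())
    span = {"trace_id": trace_id, "service": "frontend"}
    downstream = call_backend({"x-trace-id": trace_id})
    return [span] + downstream

def call_backend(headers):
    # The backend reuses the incoming trace id for its own span
    return [{"trace_id": headers["x-trace-id"], "service": "backend"}]
```

Querying all spans with one trace id then answers the slide's question: why did this request take 50 milliseconds, and which database or service calls were involved.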
  56. 56. WHAT TO MEASURE? USE RED FOCUS ON YOUR USERS LOG ALL USER EVENTS @GEERTVDC
  57. 57. USE RED UTILIZATION SATURATION ERROR RATE RESOURCE SCOPE @GEERTVDC
  58. 58. USE RED UTILIZATION SATURATION ERROR RATE RATE ERRORS DURATION RESOURCE SCOPE REQUEST SCOPE @GEERTVDC
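RED, the request-scoped half of the slide above, tracks Rate, Errors, and Duration per endpoint. A minimal Python sketch of such a recorder (illustrative class, not from the talk):

```python
class RedMetrics:
    """Minimal RED (Rate, Errors, Duration) recorder for one endpoint."""

    def __init__(self):
        self.requests = 0        # Rate: count over a time window
        self.errors = 0          # Errors: failed requests
        self.durations_ms = []   # Duration: latency distribution

    def record(self, duration_ms, ok=True):
        self.requests += 1
        if not ok:
            self.errors += 1
        self.durations_ms.append(duration_ms)

    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0
```

USE (Utilization, Saturation, Errors) would instead be recorded per resource, such as a CPU, queue, or connection pool.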
  59. 59. FEATURE FLAGS if (_featureFlag.IsEnabled("NewCheckoutFlow")) { log.Information("NewCheckoutFlow feature used"); ExecuteNewCheckoutFlow(); } else { log.Information("LegacyCheckout feature used"); ExecuteLegacyCheckoutFlow(); } @GEERTVDC
  60. 60. FEATURE FLAGS
  61. 61. FEATURE FLAGS INITIAL DEPLOYMENT
  62. 62. FEATURE FLAGS INITIAL DEPLOYMENT BUG FOUND
  63. 63. FEATURE FLAGS INITIAL DEPLOYMENT BUG FOUND SOLVED THE BUG
  64. 64. FEATURE FLAGS INITIAL DEPLOYMENT BUG FOUND SOLVED THE BUG ROLL OUT TO MORE USERS
  65. 65. FEATURE FLAGS INITIAL DEPLOYMENT BUG FOUND SOLVED THE BUG ROLL OUT TO MORE USERS REMOVE FEATURE FLAG
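The rollout lifecycle on the slides above (deploy to a few users, fix what breaks, widen the audience, then remove the flag) usually relies on a deterministic percentage rollout. A minimal Python sketch, with hypothetical flag names and no specific feature-flag product assumed:

```python
import hashlib

def flag_enabled(flag, user_id, rollout_percent):
    """Deterministic percentage rollout: hash the flag and user
    into a stable bucket 0-99, so the same user always gets the
    same answer and widening 10% -> 50% -> 100% never flips a
    user back and forth."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent
```

At 0% nobody sees the feature, at 100% everybody does, and any user already inside a 10% rollout stays inside a 50% one.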
  66. 66. EXPERIMENT IN PRODUCTION public bool CanAccess(IUser user) { return Scientist.Science<bool>("widget-permissions", experiment => { experiment.Use(() => IsCollaborator(user)); // old way experiment.Try(() => HasAccess(user)); // new way }); // returns the control value } SCIENTIST.NET @GEERTVDC https://github.com/scientistproject/Scientist.net
  67. 67. FROM OBSERVABILITY TO OBSERVABILITY DRIVEN DEVELOPMENT @GEERTVDC
  68. 68. TDD WRITE TESTS PASS TESTS REFACTOR @GEERTVDC
  69. 69. PLAN DESIGN DEVELOP TEST DEPLOY OPERATE TDD @GEERTVDC
  70. 70. ODD OBSERVABILITY DRIVEN DEVELOPMENT DEFINE EXPECTED OUTCOME MEASURE THE OUTCOME CHANGE FEATURE & KEEP MEASURING @GEERTVDC
  71. 71. PLAN DESIGN DEVELOP TEST DEPLOY OPERATE ODD TDD WHAT IS THE USER IMPACT? @GEERTVDC
  72. 72. PLAN DESIGN DEVELOP TEST DEPLOY OPERATE ODD TDD WHAT IS THE USER IMPACT? IS THE FEATURE BEHAVING LIKE WE EXPECTED? @GEERTVDC
  73. 73. PLAN DESIGN DEVELOP TEST DEPLOY OPERATE ODD TDD WHAT IS THE USER IMPACT? IS THE FEATURE BEHAVING LIKE WE EXPECTED? DEPLOYMENT FEEDBACK @GEERTVDC
  74. 74. KNOWING HOW OUR SYSTEM OPERATES SHOULD BE IN OUR SYSTEM AS DEVELOPERS WHAT IS NORMAL? RELEASE GATES TO NEXT STAGE? @GEERTVDC
  75. 75. SLI SLO SLA @GEERTVDC
  76. 76. SLI SLO SLA SERVICE LEVEL INDICATOR SERVICE LEVEL OBJECTIVE SERVICE LEVEL AGREEMENT @GEERTVDC
  77. 77. SLI SERVICE LEVEL INDICATOR QUANTITATIVE MEASURE FOR YOUR SERVICE AVAILABILITY ERROR RATE DURATION LATENCY FRESHNESS @GEERTVDC
  78. 78. SLO SERVICE LEVEL OBJECTIVE TARGET MEASURE FOR A SERVICE MEASURED BY SLIS AVAILABILITY OF 99.9% FOR LAST 30 DAYS @GEERTVDC
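The SLI and SLO slides above fit together arithmetically: the SLI is the measured indicator, the SLO is the target it is compared against, and the gap between them is the error budget. A minimal Python sketch of that relationship (illustrative numbers, not from the talk):

```python
def availability_sli(successful, total):
    """SLI: measured fraction of successful requests."""
    return successful / total

def slo_met(sli, objective=0.999):
    """SLO: did the measured indicator reach the target,
    e.g. 99.9% availability over the last 30 days?"""
    return sli >= objective

def error_budget_remaining(successful, total, objective=0.999):
    """Failures still allowed before the SLO is breached."""
    allowed = (1 - objective) * total
    actual = total - successful
    return allowed - actual

sli = availability_sli(999_500, 1_000_000)  # 0.9995, i.e. 99.95%
```

With 1,000,000 requests and a 99.9% objective, 1,000 failures are allowed; 500 actual failures leave half the error budget to spend on risky releases or experiments.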
  79. 79. SLA SERVICE LEVEL AGREEMENT CONTRACT WITH USERS WITH CONSEQUENCES WHEN MISSING YOUR SLO 10% DISCOUNT FOR EACH 0.1% BELOW AVAILABILITY SLO @GEERTVDC
  80. 80. HOW TO DO THIS IN PRACTICE? @GEERTVDC
  81. 81. HOW TO DO THIS IN PRACTICE? DEFINE AN SLO BUILD INDICATORS BY LOGGING / METRICS BUILD A DASHBOARD – START MEASURING MAKE CHOICES BASED ON SERVICE LEVEL LEAVE SLA PART FOR SALES PEOPLE
  82. 82. MAKE IT VISIBLE @GEERTVDC
  83. 83. MAKE IT VISIBLE SLO AVAILABILITY 99.9954% @GEERTVDC
  84. 84. MAKE IT VISIBLE SLO AVAILABILITY 99.9954% RING 0 98% RING 1 99.91% RING 2 100% @GEERTVDC
  85. 85. MAKE IT VISIBLE SLO AVAILABILITY 99.9954% RING 0 98% RING 1 99.91% RING 2 100% USER SIGN UP FLOW – 100% CHECKOUT – 99.91% SEARCH – 98% @GEERTVDC
  86. 86. MAKE IT VISIBLE SLO AVAILABILITY 99.9954% RING 0 98% RING 1 99.91% RING 2 100% USER SIGN UP FLOW – 100% CHECKOUT – 99.91% SEARCH – 98% CLIENT A - USER SIGN UP FLOW – 100% CLIENT A - CHECKOUT – 99.91% CLIENT A - SEARCH – 90%
  87. 87. TAKEAWAYS START SMALL AT KEY AREAS OF YOUR APP EXPLORE TOOLS EMBRACE TESTING ON PROD! FOCUS ON CUSTOMERS TAKE OWNERSHIP OF CODE @GEERTVDC
  88. 88. @GEERTVDC
  89. 89. GEERT VAN DER CRUIJSEN @GEERTVDC THANK YOU! MOBILEFIRSTCLOUDFIRST.NET
