AWS Webinar - Measuring Your Application Performance and Health

AWS Webinar - Measuring and monitoring application performance and health

Published in: Technology

  1. 1. AWS 201 Measuring Your Application Performance and Health Markku Lepistö - Technology Evangelist @markkulepisto
  2. 2. Housekeeping • Presentation ~40mins • Post Questions Online • Q&A at the end using the online chat • Reminder – Fill in the survey!
  3. 3. Why monitor?
  4. 4. Without Instrumentation You Are Flying Blind
  5. 5. Instrumentation Gives You • Actionable insights of Historical, Current, and Predicted system state • Data-driven decisions • Availability • Performance • Cost-optimization • Release speed & quality • …
  6. 6. What to monitor?
  7. 7. Business KPIs: Transactions total • Customer QoS • Customer QoE • Revenue • Cost • … Operational KPIs: Transaction – success & error rate, latency • Throughput • Load - system, service, node, component • Health • Availability • … (KPI = Key Performance Indicator, i.e. a metric)
  8. 8. What are we actually measuring? System Inputs, State Changes and Outputs delta
  9. 9. What causes system changes? • Inputs (customer traffic) • Code changes • Manual operations (Ops! Oops!) • Automated operations (Complex Adaptive System) • OS packages & patches • Dependent services • Underlying infrastructure delta
  10. 10. When and where should we measure?
  11. 11. Everywhere - All the Time!
  12. 12. “Big Data is what happened when the cost of storing information became less than the cost of making the decision to throw it away” George Dyson, Author of “The Digital Universe”
  13. 13. COLLECT | ANALYZE | DISPLAY | ACT
  14. 14. COLLECT | ANALYZE | DISPLAY | ACT
  15. 15. When and Where to Measure & Collect? • Top to Bottom: Technology Stack • End-to-End: Client – Server / Service
  16. 16. When and Where to Measure & Collect? • Top to Bottom: Technology Stack • End-to-End: Client – Server / Service
  17. 17. When to Measure? Throughout Application Lifecycle • Stages: plan, code, build, test, release, deploy, operate • Agile Development • Continuous Integration • Continuous Delivery • DevOps • Source: http://www.collab.net
  18. 18. When to Measure? Throughout Application Lifecycle (plan, code, build, test, release, deploy, operate) • Commits • Lines changed • Modules changed • Issues resolved • Features implemented
  19. 19. When to Measure? Throughout Application Lifecycle (plan, code, build, test, release, deploy, operate) • Successful builds • Failed builds • Build duration vs HW resources used • Images (AMI) built
  20. 20. When to Measure? Throughout Application Lifecycle (plan, code, build, test, release, deploy, operate) • Integration test success/failure • Performance test metrics: Throughput as a function of virtual HW used • Stability test metrics: Memory leak? Filesystem trends – fill/cleanup etc? Degradation of any KPI over time? • Security test metrics – PEN…
  21. 21. When to Measure? Throughout Application Lifecycle (plan, code, build, test, release, deploy, operate) • # of releases • # of deploys • # of rollbacks • Operational KPIs: Stability, availability, Performance, security, …
  22. 22. When to Measure? Throughout Application Lifecycle (plan, code, build, test, release, deploy, operate) • # of bugs reported++ • # of features requested++ • Performance & Cost optimization • A/B test results • Feedback Loop
  23. 23. Challenge: DevOps & Cloud Increase Rate of Change • Rare Releases – Static Servers (“Waterfall”): Long-lived servers, Static roles, Static IP addresses • Frequent Releases – Dynamic Instances (“Agile, Lean, DevOps”): New code, on bursts of new instances; Instance role changes; Dynamic, recycled IP addresses • [Charts: Change over Time]
  24. 24. When and Where to Measure and Collect? • Top to Bottom: Technology Stack • End-to-End: Client – Server / Service
  25. 25. Where to Measure? End-to-End (Client, Transport Net/Services, Your App/Service, 3rd Party Services, AWS Services)
  26. 26. Where to Measure? End-to-End • Test Client Agents • QoS, QoE KPIs (Client, Transport Net/Services, Your App/Service, 3rd Party Services, AWS Services)
  27. 27. Where to Measure? End-to-End • Tcpdump on Client and App Servers • Wireshark for Transport QoS KPIs (Client, Transport Net/Services, Your App/Service, 3rd Party Services, AWS Services)
  28. 28. Client/Server QoS with Transport Layer Metrics • Client • Server
  29. 29. Where to Measure? End-to-End • AWS Service Health Dashboard • AWS CloudTrail • AWS CloudWatch (Client, Transport Net/Services, Your App/Service, 3rd Party Services, AWS Services)
  30. 30. Monitoring AWS - Service Health Dashboard
  31. 31. Monitoring AWS Account Activities - AWS CloudTrail • You are making API calls... • On a growing set of services around the world… • CloudTrail is continuously recording API calls… • And delivering log files to you
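The delivered log files can be analyzed with very little code. The following is a minimal sketch, assuming the log content is a JSON document with a top-level `Records` array whose entries carry `eventSource` and `eventName` fields; the sample content and the function name `count_api_calls` are made up for illustration.

```python
import json
from collections import Counter

# Hypothetical, trimmed-down log content; real records carry many more fields.
SAMPLE_LOG = json.dumps({
    "Records": [
        {"eventSource": "ec2.amazonaws.com", "eventName": "RunInstances"},
        {"eventSource": "ec2.amazonaws.com", "eventName": "RunInstances"},
        {"eventSource": "s3.amazonaws.com", "eventName": "CreateBucket"},
    ]
})


def count_api_calls(log_text):
    """Count recorded API calls per (eventSource, eventName) pair."""
    records = json.loads(log_text).get("Records", [])
    return Counter((r.get("eventSource"), r.get("eventName")) for r in records)


if __name__ == "__main__":
    for (source, name), n in count_api_calls(SAMPLE_LOG).most_common():
        print(f"{source} {name}: {n}")
```

Counting calls per service and operation is only one possible view; the same loop could group by user identity or region if those fields are present.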
  32. 32. Partner CloudTrail Solutions
  33. 33. Monitoring AWS Resources – Amazon CloudWatch
  34. 34. AWS Service Measurements • Auto Scaling groups • AWS estimated charges • Amazon DynamoDB tables • Amazon EBS volumes • Amazon EC2 instances • Amazon ElastiCache caches • Elastic Load Balancing • Amazon Elastic MapReduce jobs • Amazon RDS databases • Amazon SNS notifications • Amazon SQS queues • AWS Storage Gateway ++
  35. 35. CloudWatch Alarms
  36. 36. EC2: Tell me if my instance needs attention • DynamoDB: Help me balance cost and performance • Billing: Tell me when my bill is getting too high
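The idea behind a threshold alarm can be sketched in a few lines. This is a simplified local model, not the CloudWatch implementation: it assumes an alarm fires when the most recent `evaluation_periods` datapoints all exceed the threshold. The function name and the example values are hypothetical; the state names mirror those CloudWatch reports.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Evaluate a simple 'greater than threshold' alarm over recent datapoints."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # Alarm only when every datapoint in the evaluation window breaches.
    return "ALARM" if all(v > threshold for v in recent) else "OK"


if __name__ == "__main__":
    cpu = [50, 85, 90, 95]  # made-up CPU utilization samples, in percent
    print(alarm_state(cpu, threshold=80, evaluation_periods=3))
```

Requiring several consecutive breaches is a common way to avoid alerting on a single spike.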
  37. 37. Custom Metrics • Example – Instance Memory
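The slide content itself did not survive extraction, so the following is only a sketch of how an instance-memory custom metric might be produced, assuming a Linux guest with `/proc/meminfo`. The namespace `Custom/System`, the metric name, and the instance ID are hypothetical. Publishing uses the boto3 `put_metric_data` call and needs AWS credentials, so it is kept in a separate function that the example does not invoke.

```python
def memory_used_percent(meminfo_text):
    """Compute used-memory percentage from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key.strip()] = int(rest.split()[0])  # values are in kB
    total = fields["MemTotal"]
    available = fields.get("MemAvailable", fields.get("MemFree", 0))
    return 100.0 * (total - available) / total


def build_datum(percent, instance_id):
    """Build one metric datum in the shape put_metric_data expects."""
    return {
        "MetricName": "MemoryUtilization",  # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Value": percent,
        "Unit": "Percent",
    }


def publish(datum, namespace="Custom/System"):
    """Send the datum to CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # third-party dependency, imported only when publishing
    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=[datum])


if __name__ == "__main__":
    sample = "MemTotal: 1000 kB\nMemAvailable: 250 kB\n"
    print(build_datum(memory_used_percent(sample), "i-00000000"))
```

On a real instance the sample text would be replaced by the contents of `/proc/meminfo`, and `publish` would run on a schedule.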
  38. 38. Where to Measure? End-to-End • Request/Response success/fail • Response latency (Client, Transport Net/Services, Your App/Service, 3rd Party Services, AWS Services)
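Request success/failure and response latency can be captured by wrapping each call to a dependency. A minimal sketch, assuming any raised exception counts as a failure; the class name `CallStats` and its methods are hypothetical.

```python
import time


class CallStats:
    """Record success/failure counts and latency for wrapped calls."""

    def __init__(self):
        self.latencies = []  # seconds, one entry per call
        self.successes = 0
        self.failures = 0

    def measure(self, func, *args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise  # the caller still sees the original error
        else:
            self.successes += 1
            return result
        finally:
            # Latency is recorded for failed calls as well.
            self.latencies.append(time.perf_counter() - start)

    def error_rate(self):
        total = self.successes + self.failures
        return self.failures / total if total else 0.0


if __name__ == "__main__":
    stats = CallStats()
    stats.measure(sum, [1, 2, 3])
    print(stats.successes, stats.failures, stats.error_rate())
```

The same wrapper works for calls to third-party services, which is where a dependency's latency and error rate are otherwise hardest to see.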
  39. 39. Measuring External, Dependent Services
  40. 40. Where to Measure? End-to-End (Client, Transport Net/Services, Your App/Service, 3rd Party Services, AWS Services)
  41. 41. When and Where to Measure and Collect? • Top to Bottom: Technology Stack • End-to-End: Client – Server / Service
  42. 42. Measure the Entire Stack, Top to Bottom • User Application • Application Server • Web / DB Server • Language Interpreter / JVM • Guest Operating System & Services • EC2 Instance
  43. 43. Application Internal Metrics
  44. 44. COLLECT | ANALYZE | DISPLAY | ACT
  45. 45. Leverage AWS Big Data Services • STORE | ANALYZE • Glacier • S3 • EC2 • Redshift • DynamoDB • EMR • Data Pipeline • Kinesis
  46. 46. COLLECT | ANALYZE | DISPLAY | ACT
  47. 47. METRICS @ETSY
  48. 48. Visualization - Graph • Values over Time at Sampling Rate
  49. 49. Sampling Rate • How often should I measure? • Depends on what you measure - Depends on its rate of change (frequency)
  50. 50. Nyquist Frequency • Original signal = Red • Measured signal = Blue • You should measure at least twice as often as your value changes
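What happens when the sampling rule is violated can be worked out numerically. The sketch below uses the standard aliasing relation: a signal of frequency f sampled at rate fs appears at |f - fs * round(f / fs)|. The function name and the example periods are made up. Strictly, the sampling rate should be more than twice the signal frequency for the cycle to be recoverable.

```python
import math


def apparent_period(signal_period_s, sample_interval_s):
    """Period (seconds) that a periodic signal appears to have after sampling."""
    signal_hz = 1.0 / signal_period_s
    sample_hz = 1.0 / sample_interval_s
    # Aliasing folds the true frequency onto the nearest multiple of the sample rate.
    alias_hz = abs(signal_hz - sample_hz * round(signal_hz / sample_hz))
    return math.inf if alias_hz == 0 else 1.0 / alias_hz


if __name__ == "__main__":
    # A value that cycles every 60 s, sampled every 20 s and every 50 s.
    print(apparent_period(60, 20))
    print(apparent_period(60, 50))
```

Sampling the 60-second cycle every 20 seconds preserves it, while sampling every 50 seconds makes it look like a slow cycle of roughly 300 seconds that does not exist in the system.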
  51. 51. System Measurements == Signal • We can do Digital Signal Processing • Linear Regression – trendline predicts filesystem running out of inodes (cannot create files)
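The trendline prediction can be sketched with an ordinary least-squares fit. The sample data, the inode limit, and the function names are hypothetical, and the sketch assumes usage grows roughly linearly over the observed window.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_full(hours, inodes_used, inodes_total):
    """Extrapolate the trendline to the inode limit; None if usage is not growing."""
    slope, intercept = fit_line(hours, inodes_used)
    if slope <= 0:
        return None
    # Time at which the line reaches the limit, relative to the last sample.
    return (inodes_total - intercept) / slope - hours[-1]


if __name__ == "__main__":
    hours = [0, 1, 2, 3, 4]
    used = [1000, 1100, 1200, 1300, 1400]  # made-up inode counts
    print(hours_until_full(hours, used, inodes_total=2000))
```

With this made-up data the trend is 100 inodes per hour, so the limit of 2000 is reached about 6 hours after the last sample.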
  52. 52. System Measurements == Signal • We can do Digital Signal Processing • Linear regression & Fast Fourier Transformation for patterns, anomalies and future predictions
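Finding a repeating pattern in a metric amounts to finding its strongest frequency component. The sketch below uses a naive discrete Fourier transform from the standard library rather than a true FFT, which computes the same coefficients faster; the function name and the synthetic input are assumptions for illustration.

```python
import cmath
import math


def dominant_period(samples):
    """Return the period, in samples, of the strongest non-constant frequency."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant (DC) component
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k if best_k else None


if __name__ == "__main__":
    # Synthetic metric that repeats every 8 samples.
    series = [math.sin(2 * math.pi * i / 8) for i in range(64)]
    print(dominant_period(series))
```

For long series a library FFT would be the practical choice; the naive transform costs O(n²) and is shown only because it needs no dependencies.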
  53. 53. Visualization – Scatter Plot
  54. 54. Visualization – Box Plot
  55. 55. Visualization – Normal Curve & Histogram • Including outliers & ends of distribution • opsly.com
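The numbers behind a box plot, including its outliers, can be computed directly. A minimal sketch, assuming the common convention that points beyond 1.5 times the interquartile range from the quartiles count as outliers; the latency values and the function name are made up.

```python
import statistics


def box_plot_summary(values):
    """Return quartiles and outliers using the 1.5 * IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


if __name__ == "__main__":
    latencies_ms = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up samples
    print(box_plot_summary(latencies_ms))
```

An average over the same samples would hide the single slow request that the outlier list makes visible.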
  56. 56. COLLECT | ANALYZE | DISPLAY | ACT
  57. 57. Manual / Human Actions - OODA Loop
  58. 58. Automated Human Actions • Amazon CloudWatch, Amazon SNS & Pager Duty
  59. 59. Automated Actions – AWS Auto Scaling • Automatic resizing of compute clusters based on measurements, thresholds and actions • Trigger auto-scaling policy • Feature / Details: Control: Define minimum and maximum instance pool sizes and when scaling and cool down occurs. Integrated to Amazon CloudWatch: Use metrics gathered by CloudWatch to drive scaling. Instance types: Run Auto Scaling for On-Demand and Spot Instances. Compatible with VPC. • as-create-auto-scaling-group MyGroup --launch-configuration MyConfig --availability-zones us-east-1a --min-size 4 --max-size 200 • Amazon CloudWatch
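The interplay of thresholds, pool limits and cool down can be modeled locally. This is a toy decision function, not how the Auto Scaling service is implemented: the CPU thresholds, the step size, and the 300-second cool down are assumptions, while the pool limits of 4 and 200 come from the command on the slide.

```python
def desired_capacity(current, cpu_percent, now, last_scaled_at,
                     min_size=4, max_size=200, high=70.0, low=30.0,
                     step=2, cooldown_s=300):
    """Return (new_capacity, new_last_scaled_at) for one evaluation."""
    if now - last_scaled_at < cooldown_s:
        return current, last_scaled_at  # still cooling down, do nothing
    if cpu_percent > high:
        target = current + step
    elif cpu_percent < low:
        target = current - step
    else:
        return current, last_scaled_at
    # Clamp to the configured pool limits.
    target = max(min_size, min(max_size, target))
    if target == current:
        return current, last_scaled_at
    return target, now


if __name__ == "__main__":
    print(desired_capacity(4, 90.0, now=1000, last_scaled_at=0))
    print(desired_capacity(6, 90.0, now=1100, last_scaled_at=1000))
```

The cool down prevents a second scaling action before the effect of the first one shows up in the metrics.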
  60. 60. Automated Actions - PID Controller • System Reaches Target State with Calculated Changes and Monitoring Feedback Loop • Proportional, Integral, Derivative
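A PID feedback loop fits in a few lines. The sketch below is a textbook discrete form driving a toy system whose value simply accumulates the controller output; the gains, the target of 50, and the toy system are assumptions chosen so that the loop settles, not tuned values for any real service.

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, since there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps=200, dt=1.0):
    """Drive a toy system (value += output * dt) toward the setpoint."""
    pid = PID(kp=0.5, ki=0.1, kd=0.1, setpoint=50.0)
    value = 0.0
    for _ in range(steps):
        value += pid.update(value, dt) * dt
    return value


if __name__ == "__main__":
    print(simulate())
```

Each step measures the current state, computes a correction from the error, and applies it, which is the monitoring feedback loop the slide describes.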
  61. 61. Useful Tools and Services
  62. 62. Thank you • Markku Lepistö - Technology Evangelist • @markkulepisto
  63. 63. Your Feedback is Important • Please complete the Survey! • What’s good, what’s not • What you want to see at these events • What you want AWS to deliver for you
  64. 64. Q&A
