Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring Tools
 


Monitorama opening keynote talk on the challenges of Monitoring in a world where we need to deal with continuous delivery, cloud, and automated control feedback loops.

    Presentation Transcript

    • Please, no More Minutes, Milliseconds, Monoliths... or Monitoring Tools! Adrian Cockcroft @adrianco #Monitorama May 2014
    • 2 | Battery Ventures
    • 3 | Battery Ventures Enterprise IT Adoption of Cloud By Simon Wardley http://enterpriseitadoption.com/ You Are Here
    • 4 | Battery Ventures Why am I at Monitorama?
    • 5 | Battery Ventures Twenty Years of Free and Open Source Monitoring ● 1994 The “SE Toolkit” and virtual_adrian.se ● 1998 Sun Performance Tuning, Java & The Internet Book ● 1999 Resource Management Sun Blueprint Book ● 2000 Capacity Planning for Web Services Sun Blueprint Book ● 2007 A. A. Michelson Award for Outstanding Contribution to Computer Metrics, by the Computer Measurement Group ● 2004-2008 Capacity Planning with Free Tools Workshop at CMG ● 2014 Monitorama!
    • 6 | Battery Ventures State of the Art for Free Tools in 2008 http://www.slideshare.net/adrianco/capacity-planning-with-free-tools
    • 7 | Battery Ventures History Lesson http://sourceforge.net/projects/setoolkit/ SE is a C interpreter with built-in access to all Solaris metric data sources
    • 8 | Battery Ventures Topics for Today: Minutes; Monoliths; Milliseconds; Monitoring tools; Challenges for monitoring; Continuous delivery & microservices; Analysis and closed loop control systems; Tools for developers who operate code in production; Challenges of dynamic, ephemeral, distributed cloud applications
    • 9 | Battery Ventures No more monitoring tools?
    • 10 | Battery Ventures We have too many of them already… What’s needed is more analysis tools.
    • 11 | Battery Ventures #Analysorama?
    • 12 | Battery Ventures Rule #1: Spend more time working on code that analyzes the meaning of metrics, than code that collects, moves, stores and displays metrics.
    • 13 | Battery Ventures What’s wrong with minutes?
    • 14 | Battery Ventures What’s wrong with minutes? Takes too long to see a problem. [Chart: metric vs. threshold, Minute 1 to Minute 7.] Something broke at 2m20; 40s of failure didn’t trigger; 1st high metric seen at agent on instance; 1st high metric arrives at monitoring system; 1st high metric processed (maybe); 1st high metric seen on graph; three datapoints on user graph, so it looks bad at 8m00.
    • 15 | Battery Ventures Whoops! I didn’t mean that! Reverting… Not cool if it takes 5 minutes to see it failed and 5 more to see a fix. No-one notices if it only takes 5 seconds to detect and 5 to see a fix.
    • 16 | Battery Ventures Try that again by the second: more confidence, more quickly. [Chart: metric vs. threshold, Minute 1 to Minute 7.] Something broke at 2m20; measurable in 1s; 1st high metric seen at agent on instance; 1st high metric arrives at monitoring system; 1st high metric processed; 1st high metric seen on graph; three datapoints on user graph, so it looks bad at 2m25.
    • 17 | Battery Ventures Continuous Delivery and DevOps Implications ● Changes are smaller but more frequent ● Individual changes more likely to be broken ● Changes likely to be deployed by developers ● Instant detection and rollback matters much more
    • 18 | Battery Ventures SaaS Based Products Show What Can Be Done: Seeing Problems In Seconds. www.vividcortex.com and www.boundary.com
    • 19 | Battery Ventures NetflixOSS Hystrix / Turbine Circuit Breaker Monitoring http://techblog.netflix.com/2012/12/hystrix-dashboard-and-turbine.html Streaming metrics directly from front end services to a web browser
    • 20 | Battery Ventures Rule #2: Metric to display latency needs to be less than human attention span (~10s)
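
The two timelines on slides 14 and 16 can be reproduced with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not something taken from the talk: it assumes a datapoint counts as "high" only when its whole sampling interval falls after the failure, that three high datapoints are needed before a graph looks convincingly bad, and that collection, processing and display together add two sampling intervals of delay. The function names and the two-interval pipeline delay are hypothetical; the delay was chosen because it matches the 8m00 and 2m25 figures on the slides.

```python
import math


def time_to_visible(fail_at_s, interval_s, datapoints_needed=3, pipeline_intervals=2):
    """Seconds after t=0 at which a failure looks convincingly bad on a graph."""
    # First sampling interval that lies entirely after the failure started.
    first_full_start = math.ceil(fail_at_s / interval_s) * interval_s
    # Each datapoint is published when its interval closes.
    last_needed_close = first_full_start + datapoints_needed * interval_s
    # Assumed extra delay for collection, processing and display.
    return last_needed_close + pipeline_intervals * interval_s


def fmt(seconds):
    """Format a number of seconds as e.g. 8m00."""
    return f"{seconds // 60}m{seconds % 60:02d}"


print(fmt(time_to_visible(140, 60)))  # failure at 2m20, one-minute samples: 8m00
print(fmt(time_to_visible(140, 1)))   # failure at 2m20, one-second samples: 2m25
```

Under these assumptions the sampling interval dominates everything else, which is the point of Rule #2: shrinking the interval from a minute to a second brings the total well under the ~10s attention span.
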
    • 21 | Battery Ventures What’s Wrong With Milliseconds?
    • 22 | Battery Ventures A Millisecond is a Very Long Time! ● Some JVM based tools measure response times in ms: network round trip within a datacenter/zone is less than 1ms; SSD access latency is usually less than 1ms; Cassandra (a Java app) response times can be less than 1ms ● Rounding errors: quantization loses too much information; automated threshold warning “One is infinitely larger than zero”!; JVM does have nanosecond resolution times available
    • 23 | Battery Ventures Rule #3: Validate that your measurement system has enough accuracy and precision. Gauge Repeatability and Reproducibility matters, see http://en.wikipedia.org/wiki/ANOVA_gauge_R%26R
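
A short illustration of the quantization problem from slide 22. The latency values below are made up for the example, and Python's `time.perf_counter_ns()` stands in for the nanosecond-resolution timer the slide says the JVM provides (`System.nanoTime()` on that platform). It is a sketch of the rounding effect only, not a measurement methodology.

```python
import time

# Hypothetical response times in nanoseconds, mostly below one millisecond.
latencies_ns = [180_000, 420_000, 650_000, 980_000, 1_300_000]

# What a millisecond-resolution timer would report: whole milliseconds only.
latencies_ms = [ns // 1_000_000 for ns in latencies_ns]
print(latencies_ms)  # [0, 0, 0, 0, 1]

# The true mean is about 0.7 ms, but the truncated data says 0.2 ms.
mean_true_ms = sum(latencies_ns) / len(latencies_ns) / 1e6
mean_truncated_ms = sum(latencies_ms) / len(latencies_ms)
print(mean_true_ms, mean_truncated_ms)

# Timing real work with a monotonic nanosecond clock keeps the detail.
start = time.perf_counter_ns()
sum(range(1000))
elapsed_ns = time.perf_counter_ns() - start
print(f"{elapsed_ns} ns")
```

Four of the five samples collapse to zero, so a threshold or ratio computed on the truncated values sees a jump from 0 to 1 where the real change was from 0.98 ms to 1.3 ms.
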
    • 24 | Battery Ventures Monolithic Monitoring Systems: simple to build and install, but problematic… [Diagram labels: Services Being Monitored, Monolithic Monitoring System; Services Being Monitored, Distributed Collection Systems, Analysis / Display, Aggregators]
    • 25 | Battery Ventures Monolithic Monitoring Issues ● Scalability: problems scaling data collection, analysis and reporting throughput; limitations on number of distinct metrics that can be collected; traffic storms can overload the system and take it down ● Availability: monitoring system needs to stay up when everything else dies!; downtime for upgrades is always inconvenient; gaps in the metric history can trigger alarms and lose confidence
    • 26 | Battery Ventures In-Band, Out-of-Band, or Both? In-band means deployed using same tools and infrastructure as your services. Dependencies lead to common mode failures that can leave you blind. Best option is both in-house in-band, and external SaaS. Very unlikely to have both fail at the same time. [Diagram labels: Services, Monitoring System (twice), SaaS Based Monitoring, In-Band Monitoring]
    • 27 | Battery Ventures Rule #4: Monitoring systems need to be more available and scalable than the systems being monitored.
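
The "very unlikely to have both fail at the same time" claim on slide 26 can be made concrete with a little arithmetic. The sketch below assumes the two monitoring systems fail independently, which is exactly what running one of them outside your own infrastructure is meant to achieve; the 99.9% availability figures are hypothetical placeholders, not numbers from the talk.

```python
def combined_availability(*availabilities):
    """Probability that at least one monitor is up, assuming independent failures."""
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1.0 - availability
    return 1.0 - all_down


in_band = 0.999  # hypothetical in-house, in-band monitoring system
saas = 0.999     # hypothetical external SaaS monitoring service

both = combined_availability(in_band, saas)
minutes_per_year = 365 * 24 * 60
print(f"blind for about {(1 - in_band) * minutes_per_year:.0f} min/year with one system")
print(f"blind for about {(1 - both) * minutes_per_year:.1f} min/year with both")
```

If the two systems share a dependency, the independence assumption breaks and the real figure sits much closer to the single-system case, which is the common mode failure the slide warns about.
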
    • 28 | Battery Ventures Continuous Delivery
    • 29 | Battery Ventures Issues with Continuous Delivery and Microservices ● High rate of change: code pushes can cause floods of new instances and metrics; short baseline for alert threshold analysis – everything looks unusual ● Ephemeral configurations: short lifetimes make it hard to aggregate historical views; hand tweaked monitoring tools take too much work to keep running ● Microservices with complex calling patterns: end-to-end request flow measurements are very important; request flow visualizations get overwhelmed
    • 30 | Battery Ventures Microservice Based Architectures See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture From a Gilt Groupe Presentation
    • 31 | Battery Ventures “Death Star” Architecture Diagrams, as visualized by Appdynamics, Boundary.com and Twitter internal tools. [Diagrams: Netflix; Gilt Groupe (12 of 450); Twitter]
    • 32 | Battery Ventures Closed Loop Control Systems
    • 33 | Battery Ventures Autoscaled Ephemeral Instances at Netflix (the old way) ● Largest services use autoscaled red/black code pushes ● Average lifetime of an instance is 36 hours. [Chart labels: Push, Autoscale Up, Autoscale Down]
    • 34 | Battery Ventures Scryer - Predictive Auto-scaling at Netflix. See http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html and http://techblog.netflix.com/2013/12/scryer-netflixs-predictive-auto-scaling.html [Chart: 24 hours of predicted traffic vs. actual; annotations: more morning load, Sat/Sun high traffic, lower load on Weds.] FFT based prediction driving AWS Autoscaler to plan minimum capacity
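
To give a feel for what "FFT based prediction" to "plan minimum capacity" could look like, here is a minimal sketch. It is not Scryer's implementation: it uses a plain discrete Fourier transform written out in pure Python so the example is self-contained, keeps only the strongest frequency components of a traffic history, and extends that periodic model into the future. The traffic series, the `keep` parameter and the per-instance capacity of 20 requests per second are all hypothetical.

```python
import cmath
import math


def dft(samples):
    """Plain O(n^2) discrete Fourier transform; fine for a few hundred points."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def predict(history, steps_ahead, keep=3):
    """Extrapolate a periodic series from its `keep` strongest frequencies."""
    n = len(history)
    spectrum = dft(history)
    # Rank the positive frequencies by magnitude; index 0 is the mean level.
    strongest = sorted(
        range(1, n // 2 + 1), key=lambda k: abs(spectrum[k]), reverse=True
    )[:keep]

    def model(t):
        value = spectrum[0].real / n
        for k in strongest:
            # A real signal has mirrored frequency pairs, so count each twice
            # (the Nyquist term of an even-length series has no mirror).
            weight = 1 if (n % 2 == 0 and k == n // 2) else 2
            value += weight * (spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)).real / n
        return value

    return [model(n + step) for step in range(steps_ahead)]


def minimum_instances(predicted_load, per_instance_capacity):
    """Instances needed to serve the predicted load (never fewer than one)."""
    return max(1, math.ceil(predicted_load / per_instance_capacity))


# Two days of hypothetical hourly request rates with a clean daily cycle.
history = [100 + 50 * math.sin(2 * math.pi * hour / 24) for hour in range(48)]
forecast = predict(history, steps_ahead=24)
plan = [minimum_instances(load, per_instance_capacity=20) for load in forecast]
print(plan)
```

Discarding the weaker frequencies acts as a filter: the forecast follows the daily and weekly rhythm rather than yesterday's noise. The resulting plan would serve as a floor for a reactive autoscaler, not a replacement for it.
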
    • 35 | Battery Ventures Netflix Automatic Code Deployment Canary - Bad Signature
    • 36 | Battery Ventures Happy Canary Signature
    • 37 | Battery Ventures Monitoring Tools for Developers ● Most monitoring tools are built to be used by operations people: focus on individual systems rather than applications; focus on utilization rather than throughput and response time; fiefdoms of sysadmin, network admin, storage admin, database admin…; hard to integrate and extend ● Developer oriented monitoring tools: Application Performance Measurement (APM) and analysis; business transactions, response time, JVM internal metrics; logging business metrics directly (NetflixOSS Servo, Yammer Metrics); APIs for integration, data extraction, deep linking and embedding. See http://techblog.netflix.com/2012/02/announcing-servo.html and http://metrics.codahale.com/
    • 38 | Battery Ventures Challenges of Dynamic, Ephemeral, Distributed Cloud Applications
    • 39 | Battery Ventures Dynamic and Ephemeral Challenges ● Datacenter assets: arrive infrequently, disappear infrequently; stick around for three years or so before they get retired; have unique IP and MAC addresses ● Cloud assets: arrive in bursts – a Netflix code push creates over a hundred per minute; stick around for a few hours before they get retired; often re-use the IP and MAC address that was just vacated! Use NetflixOSS Edda to record a full history of your configuration http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html
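
Because cloud instances re-use addresses, a metric or log line tagged only with an IP cannot be attributed without knowing when it was recorded. The sketch below shows the idea of keeping a time-bounded history of which instance held which address. It is an illustration only and is not Edda's API; the class, method names, instance IDs and addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """One instance holding one IP address for a span of time."""
    instance_id: str
    ip: str
    start: int                  # seconds since an arbitrary epoch
    end: Optional[int] = None   # None while the instance is still running


class ConfigHistory:
    """Append-only record of which instance held which address, and when."""

    def __init__(self):
        self._leases = []

    def launched(self, instance_id, ip, at):
        self._leases.append(Lease(instance_id, ip, at))

    def terminated(self, instance_id, at):
        for lease in self._leases:
            if lease.instance_id == instance_id and lease.end is None:
                lease.end = at

    def who_had(self, ip, at):
        """Return the instance that held `ip` at time `at`, or None."""
        for lease in self._leases:
            if lease.ip == ip and lease.start <= at and (lease.end is None or at < lease.end):
                return lease.instance_id
        return None


history = ConfigHistory()
history.launched("i-aaa", "10.0.0.5", at=0)
history.terminated("i-aaa", at=3600)
history.launched("i-bbb", "10.0.0.5", at=3660)  # same address, new instance

print(history.who_had("10.0.0.5", at=1800))  # i-aaa
print(history.who_had("10.0.0.5", at=4000))  # i-bbb
print(history.who_had("10.0.0.5", at=3630))  # None
```

Keying the history by instance ID and time, rather than by address alone, is what lets historical views be aggregated correctly after the instances themselves are long gone.
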
    • 40 | Battery Ventures Cloud Native Architectures
    • 41 | Battery Ventures Traditional vs. Cloud Native Storage Architectures. [Diagram labels, traditional: Business Logic; Database Master, Fabric, Storage Arrays; Database Slave, Fabric, Storage Arrays. Cloud native: Business Logic; Cassandra Zone A nodes, Cassandra Zone B nodes, Cassandra Zone C nodes; Cloud Object Store Backups]
    • 42 | Battery Ventures Distributed Cloud Applications Challenges ● Cloud provider data stores don’t have the usual monitoring hooks: e.g. no way to install an agent on AWS RDS MySQL, AWS DynamoDB ● Dependency on web services as well as code on instances: integration of data sources like CloudWatch, measure use of S3 etc. ● Cloud applications span zones and regions: monitoring tools need to span and aggregate zones and regions too! ● NoSQL data stores introduce new protocols and metrics: e.g. cross zone and cross region replication traffic for Cassandra
    • 43 | Battery Ventures Monitoring “New Rules” by @adrianco 1. Spend more time on analysis than data collection and display 2. Reduce key business metric latency to less than 10s 3. Validate your measurement system precision and accuracy 4. Be more available and scalable than the services being monitored 5. Optimize for distributed, ephemeral cloud native applications
    • 44 | Battery Ventures Any Questions? ● Battery Ventures http://www.battery.com ● Adrian’s Blog http://perfcap.blogspot.com ● Slideshare http://slideshare.com/adriancockcroft Appearances by @adrianco ● Migrating to Microservices – Qcon London - March 6th, 2014 ● Monitorama Opening Keynote Portland OR - May 7th, 2014 ● GOTO Chicago Opening Keynote May 20th, 2014 ● DevOps Summit at Cloud Expo New York – June 10th, 2014 ● Qcon New York – June 11th, 2014 ● GOTO Copenhagen/Aarhus – Denmark – Oct 25th, 2014 Find me on LinkedIn or Twitter @adrianco