Monitoring is easy;
                     why do we suck at it?


                            /   monitoring it all
Tuesday, November 8, 2011
Who is this guy?                                                        @postwait

                Author of “Scalable Internet Architectures”
                Pearson, ISBN: 067232699X

                Contributor to “Web Operations”
                O’Reilly, ISBN: 978-1-4493-7744-1



                Founder of OmniTI, Message Systems, Fontdeck, & Circonus
                I like to tackle problems that are “always on” and “always growing.”




                I am an Engineer
                A practitioner of academic computing.
                IEEE member and Senior ACM member.
                On the Editorial Board of ACM’s Queue magazine.



Monitoring: let’s start with a definition.




                       •    analytics

                       •    trending

                       •    fault-detection / alerting

                       •    capacity planning


                       •    it is the collection and use of telemetry data




What monitoring is not




                       •    controls


                       •    via monitoring you observe;
                            you do not influence




So why do we suck at it?



                            tl;dr
                            because we think about

                             •   networks,

                             •   systems, and

                             •   applications
                            instead of what matters: business.




Your purpose




                       •    Your purpose is to make
                            your company’s web business
                            operate.

                            (hence: “web operations”)




Your purpose




                       •    ensure business success




Understanding your purpose




                       •    who defines business success?

                            •   shareholders, ultimately

                            •   the board of directors, in their stead

                            •   the CEO on an operational, day-to-day basis




Understanding your purpose




                       •    Assuming your CEO is doing a good job

                            •   the executive team understands these metrics


                       •    Assuming the executive team is competent

                            •   their reports understand these metrics
                                (at least the pertinent ones)




Pertinent == Problematic




                       •    You enable all aspects of the business

                       •    All these metrics are pertinent




But why?




                       •    You could simply track stuff that is in your purview.

                       •    Why not?




Technology



                       •    As a technology operations group,
                            you have the technology.




                                           We can rebuild him.
                                           We have the technology.
                                           We can make him better than he was.
                                           Better...stronger...faster.
                                                                    - Oscar Goldman

Why is our technology better?




                       •    Simply put: MTTD (mean time to detect)
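
Reading MTTD as mean time to detect, the figure is just the average gap between when a fault began and when someone noticed it. A minimal sketch, assuming each incident is recorded as a pair of timestamps; the `Incident` type and the sample values are hypothetical and not from the talk:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    started: datetime   # when the fault actually began
    detected: datetime  # when monitoring (or a human) noticed it


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """Average gap between fault start and detection."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.detected - i.started for i in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data: one fault caught in 2 minutes, one in 4.
incidents = [
    Incident(datetime(2011, 6, 15, 14, 0, 0), datetime(2011, 6, 15, 14, 2, 0)),
    Incident(datetime(2011, 6, 15, 18, 30, 0), datetime(2011, 6, 15, 18, 34, 0)),
]
mttd = mean_time_to_detect(incidents)
```

Tracking this one number over time is a simple way to check whether a change to monitoring actually shortened detection.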




Now, what about your purview?




                       •    Obviously monitoring the business is useful.

                       •    However, you cannot directly affect business.

                       •    You indirectly affect it by operating the web portion.




What can you change?



                       •    You can control:

                            •   releases,

                            •   performance,

                            •   stability,

                            •   computing resources,

                            •   networking,

                            •   and availability.



Visualize!




                       •    All this information must be presented visually.




Text.




                       •    Text is incredibly useful.

                       •    Consider: deployment.




Code Deployment




                            r82394 (by corey)    1h 7m 9s    ago
                              previous deploy    1h 42m 18s ago
                                                11 deploys today




Code Deployment




                            r82394            15:03:14 2011/06/15
                              previous deploy      1h 42m 18s ago
                                                  11 deploys today
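
The two deployment slides show the same kind of status line with a relative age ("1h 7m 9s ago") and with an absolute timestamp. A rough sketch of producing the relative form, assuming the deploy time is available as a `datetime`; the `format_age` helper is hypothetical, and `now` is picked here only so the result matches the slide:

```python
from datetime import datetime


def format_age(then: datetime, now: datetime) -> str:
    """Render the gap between two timestamps as e.g. '1h 7m 9s ago'."""
    seconds = int((now - then).total_seconds())
    if seconds < 0:
        raise ValueError("timestamp is in the future")
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts) + " ago"


deployed = datetime(2011, 6, 15, 15, 3, 14)  # timestamp from the slide
now = datetime(2011, 6, 15, 16, 10, 23)      # assumed "current" time
line = f"r82394 (by corey)  {format_age(deployed, now)}"
```

The relative form answers "did something change just before this broke?" without the reader having to do clock arithmetic.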




Text.




                       •    Numbers are trickier.

                       •    So many representations from which to choose.




Beware




Gauges require understanding




                       •    Gauges imply a deep understanding of

                            •   bounds, and

                            •   tolerances




Gauges require understanding




                       •    General advice

                            •   If the range will ever change, don’t use gauges




Gauges require understanding




                       •    Great for:

                            •   percentages,

                            •   temperature,

                            •   power per rack,

                            •   bandwidth per uplink




Gauges require understanding




                       •    Bad for:

                            •   IOPS,

                            •   current visitor counts,

                            •   requests per second,

                            •   bandwidth overall
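
One way to see why bounds and tolerances matter: a gauge has to map every reading onto a fixed dial. The sketch below is illustrative only; the `Gauge` class, its field names, and all thresholds are assumptions rather than anything shown in the talk:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gauge:
    low: float   # fixed lower bound of the dial
    high: float  # fixed upper bound of the dial
    warn: float  # tolerance: readings at or above this are worrying

    def position(self, value: float) -> float:
        """Fraction of the dial (0.0 to 1.0) where the needle sits."""
        if not self.low <= value <= self.high:
            raise ValueError(f"{value} is outside [{self.low}, {self.high}]")
        return (value - self.low) / (self.high - self.low)

    def in_tolerance(self, value: float) -> bool:
        return value < self.warn


# A percentage has natural, permanent bounds, so a gauge works.
cpu = Gauge(low=0.0, high=100.0, warn=85.0)
needle = cpu.position(42.0)

# Requests per second has no natural ceiling: whatever limit is picked
# today can be exceeded tomorrow, and the dial no longer makes sense.
rps = Gauge(low=0.0, high=5000.0, warn=4000.0)
try:
    rps.position(7500.0)
    overflowed = False
except ValueError:
    overflowed = True
```

A time-series graph rescales its axis as the data grows, which is why it tends to suit the unbounded metrics better.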




Graphs are often better




Even little ones




Think relatively




Users live all around the world




                       •    Users live just about everywhere

                       •    “Where?” is a useful question




Geolocation




Geolocation is interesting




                       •    to marketing

                       •    to legal

                       •    (okay to everyone)


                       •    but, not so useful to operations




Geolocation is interesting




                       •    perhaps more interesting




Geolocation




                       •    Internet location != geo-political location




ASN location


                       •    The closest thing to geo-political boundaries is peering



       -bash-4.0$ /usr/sbin/bgpctl show rib 66.78.236.243
       flags: * = Valid, > = Selected, I = via IBGP, A = Announced
       origin: i = IGP, e = EGP, ? = Incomplete

       flags destination                  gateway         lpref   med aspath origin
             66.78.236.0/22               64.202.119.7      100     0 23352 4436 2914 3356 32778 i

       ### ASN 32778 is “Smart City Networks, L.P.”
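
The origin AS is the last number in the AS path (32778 in the output above). A small sketch of pulling it out of one RIB line, assuming the column layout shown and no leading flags column; `origin_asn` is a hypothetical helper, not part of bgpctl:

```python
def origin_asn(rib_line: str) -> int:
    """Return the origin AS (last hop of the AS path) from one RIB line.

    Assumed columns: destination, gateway, lpref, med, AS path..., origin code.
    """
    fields = rib_line.split()
    if len(fields) < 6:
        raise ValueError("line too short to contain an AS path")
    as_path = fields[4:-1]  # everything between med and the origin code
    return int(as_path[-1])


line = "66.78.236.0/22  64.202.119.7  100  0 23352 4436 2914 3356 32778 i"
asn = origin_asn(line)
```

Grouping client addresses by origin AS rather than by country is one way to get at "where on the Internet" a user sits.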




What about the business?




                            Authorizations : Hard Failed : Soft Failed : Releases


Is that all?




                       •    Hells no.




It’s all about real-time




                       •    Everything so far is old hat (maybe)

                       •    Every business unit has visualizations like this


                       •    You need to combine the data

                       •    You need to make it real-time




Thanks




                       •    web demo ensues....




Tuesday, November 8, 2011

More Related Content

Viewers also liked

Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observabilityTheo Schlossnagle
 
Applying operations culture to everything
Applying operations culture to everythingApplying operations culture to everything
Applying operations culture to everythingTheo Schlossnagle
 
Velocity 2010: Scalable Internet Architectures
Velocity 2010: Scalable Internet ArchitecturesVelocity 2010: Scalable Internet Architectures
Velocity 2010: Scalable Internet ArchitecturesTheo Schlossnagle
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.Theo Schlossnagle
 
Big Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ PerconaBig Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ PerconaTheo Schlossnagle
 
Scalable Internet Architecture
Scalable Internet ArchitectureScalable Internet Architecture
Scalable Internet ArchitectureTheo Schlossnagle
 
A Coherent Discussion About Performance
A Coherent Discussion About PerformanceA Coherent Discussion About Performance
A Coherent Discussion About PerformanceTheo Schlossnagle
 
Wireless telemetry systems
Wireless telemetry systemsWireless telemetry systems
Wireless telemetry systemsSneha Suluru
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observabilityTheo Schlossnagle
 
Telemetry types, frequency,position and multiplexing in telemetry
Telemetry types, frequency,position and multiplexing in telemetryTelemetry types, frequency,position and multiplexing in telemetry
Telemetry types, frequency,position and multiplexing in telemetrysagheer ahmed
 

Viewers also liked (20)

Craftsmanship
CraftsmanshipCraftsmanship
Craftsmanship
 
It's all about telemetry
It's all about telemetryIt's all about telemetry
It's all about telemetry
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Applying operations culture to everything
Applying operations culture to everythingApplying operations culture to everything
Applying operations culture to everything
 
PostgreSQL on Solaris
PostgreSQL on SolarisPostgreSQL on Solaris
PostgreSQL on Solaris
 
Velocity 2010: Scalable Internet Architectures
Velocity 2010: Scalable Internet ArchitecturesVelocity 2010: Scalable Internet Architectures
Velocity 2010: Scalable Internet Architectures
 
What's in a number?
What's in a number?What's in a number?
What's in a number?
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.
 
Atldevops
AtldevopsAtldevops
Atldevops
 
Understanding Slowness
Understanding SlownessUnderstanding Slowness
Understanding Slowness
 
Xtreme Deployment
Xtreme DeploymentXtreme Deployment
Xtreme Deployment
 
Big Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ PerconaBig Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ Percona
 
SRECon Coherent Performance
SRECon Coherent PerformanceSRECon Coherent Performance
SRECon Coherent Performance
 
Adaptive availability
Adaptive availabilityAdaptive availability
Adaptive availability
 
Scalable Internet Architecture
Scalable Internet ArchitectureScalable Internet Architecture
Scalable Internet Architecture
 
A Coherent Discussion About Performance
A Coherent Discussion About PerformanceA Coherent Discussion About Performance
A Coherent Discussion About Performance
 
Telrmetry1
Telrmetry1Telrmetry1
Telrmetry1
 
Wireless telemetry systems
Wireless telemetry systemsWireless telemetry systems
Wireless telemetry systems
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Telemetry types, frequency,position and multiplexing in telemetry
Telemetry types, frequency,position and multiplexing in telemetryTelemetry types, frequency,position and multiplexing in telemetry
Telemetry types, frequency,position and multiplexing in telemetry
 

Similar to Monitoring is easy, why are we so bad at it presentation

Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Leonardo Borges
 
Practical Cloud Security
Practical Cloud SecurityPractical Cloud Security
Practical Cloud SecurityJason Chan
 
Atlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide DeckAtlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide DeckAtlassian
 
JavaSE - The road forward
JavaSE - The road forwardJavaSE - The road forward
JavaSE - The road forwardeug3n_cojocaru
 
LISA 2011 Keynote: The DevOps Transformation
LISA 2011 Keynote: The DevOps TransformationLISA 2011 Keynote: The DevOps Transformation
LISA 2011 Keynote: The DevOps Transformationbenrockwood
 
SplunkLive New York 2011: DealerTrack
SplunkLive New York 2011: DealerTrackSplunkLive New York 2011: DealerTrack
SplunkLive New York 2011: DealerTrackSplunk
 
Puppet camp europe 2011 hackability
Puppet camp europe 2011   hackabilityPuppet camp europe 2011   hackability
Puppet camp europe 2011 hackabilityPuppet
 
Software on the High Seas
Software on the High SeasSoftware on the High Seas
Software on the High SeasSoren Harner
 
Migration from Fast ESP to Lucene Solr - Michael McIntosh
Migration from Fast ESP to Lucene Solr - Michael McIntoshMigration from Fast ESP to Lucene Solr - Michael McIntosh
Migration from Fast ESP to Lucene Solr - Michael McIntoshlucenerevolution
 
Devopsdays Goteborg 2011 - State of the Union
Devopsdays Goteborg 2011 - State of the UnionDevopsdays Goteborg 2011 - State of the Union
Devopsdays Goteborg 2011 - State of the UnionJohn Willis
 
A Look at the Future of HTML5
A Look at the Future of HTML5A Look at the Future of HTML5
A Look at the Future of HTML5Tim Wright
 
20110903 candycane
20110903 candycane20110903 candycane
20110903 candycaneYusuke Ando
 
Devops workshop unit2
Devops workshop unit2Devops workshop unit2
Devops workshop unit2John Willis
 
Community Code: Xero
Community Code: XeroCommunity Code: Xero
Community Code: XeroSencha
 
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011Michael McIntosh
 
Esp2solr eurocon-2011-presentation-111021215049-phpapp02
Esp2solr eurocon-2011-presentation-111021215049-phpapp02Esp2solr eurocon-2011-presentation-111021215049-phpapp02
Esp2solr eurocon-2011-presentation-111021215049-phpapp02TNR Global
 
GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011Stefane Fermigier
 
Performance Optimization for Ext GWT 3.0
Performance Optimization for Ext GWT 3.0Performance Optimization for Ext GWT 3.0
Performance Optimization for Ext GWT 3.0Sencha
 
PyCon 2011 Scaling Disqus
PyCon 2011 Scaling DisqusPyCon 2011 Scaling Disqus
PyCon 2011 Scaling Disquszeeg
 
Infusion for the birds
Infusion for the birdsInfusion for the birds
Infusion for the birdscolinbdclark
 

Similar to Monitoring is easy, why are we so bad at it presentation (20)

Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011)
 
Practical Cloud Security
Practical Cloud SecurityPractical Cloud Security
Practical Cloud Security
 
Atlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide DeckAtlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide Deck
 
JavaSE - The road forward
JavaSE - The road forwardJavaSE - The road forward
JavaSE - The road forward
 
LISA 2011 Keynote: The DevOps Transformation
LISA 2011 Keynote: The DevOps TransformationLISA 2011 Keynote: The DevOps Transformation
LISA 2011 Keynote: The DevOps Transformation
 
SplunkLive New York 2011: DealerTrack
SplunkLive New York 2011: DealerTrackSplunkLive New York 2011: DealerTrack
SplunkLive New York 2011: DealerTrack
 
Puppet camp europe 2011 hackability
Puppet camp europe 2011   hackabilityPuppet camp europe 2011   hackability
Puppet camp europe 2011 hackability
 
Software on the High Seas
Software on the High SeasSoftware on the High Seas
Software on the High Seas
 
Migration from Fast ESP to Lucene Solr - Michael McIntosh
Migration from Fast ESP to Lucene Solr - Michael McIntoshMigration from Fast ESP to Lucene Solr - Michael McIntosh
Migration from Fast ESP to Lucene Solr - Michael McIntosh
 
Devopsdays Goteborg 2011 - State of the Union
Devopsdays Goteborg 2011 - State of the UnionDevopsdays Goteborg 2011 - State of the Union
Devopsdays Goteborg 2011 - State of the Union
 
A Look at the Future of HTML5
A Look at the Future of HTML5A Look at the Future of HTML5
A Look at the Future of HTML5
 
20110903 candycane
20110903 candycane20110903 candycane
20110903 candycane
 
Devops workshop unit2
Devops workshop unit2Devops workshop unit2
Devops workshop unit2
 
Community Code: Xero
Community Code: XeroCommunity Code: Xero
Community Code: Xero
 
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
 
Esp2solr eurocon-2011-presentation-111021215049-phpapp02
Esp2solr eurocon-2011-presentation-111021215049-phpapp02Esp2solr eurocon-2011-presentation-111021215049-phpapp02
Esp2solr eurocon-2011-presentation-111021215049-phpapp02
 
GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011
 
Performance Optimization for Ext GWT 3.0
Performance Optimization for Ext GWT 3.0Performance Optimization for Ext GWT 3.0
Performance Optimization for Ext GWT 3.0
 
PyCon 2011 Scaling Disqus
PyCon 2011 Scaling DisqusPyCon 2011 Scaling Disqus
PyCon 2011 Scaling Disqus
 
Infusion for the birds
Infusion for the birdsInfusion for the birds
Infusion for the birds
 

More from Theo Schlossnagle

Adding Simplicity to Complexity
Adding Simplicity to ComplexityAdding Simplicity to Complexity
Adding Simplicity to ComplexityTheo Schlossnagle
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwareTheo Schlossnagle
 
Distributed Systems - Like It Or Not
Distributed Systems - Like It Or NotDistributed Systems - Like It Or Not
Distributed Systems - Like It Or NotTheo Schlossnagle
 
Applying SRE techniques to micro service design
Applying SRE techniques to micro service designApplying SRE techniques to micro service design
Applying SRE techniques to micro service designTheo Schlossnagle
 
Social improvements in monitoring
Social improvements in monitoringSocial improvements in monitoring
Social improvements in monitoringTheo Schlossnagle
 
Building Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approachBuilding Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approachTheo Schlossnagle
 

More from Theo Schlossnagle (12)

Adding Simplicity to Complexity
Adding Simplicity to ComplexityAdding Simplicity to Complexity
Adding Simplicity to Complexity
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped Software
 
Monitoring 101
Monitoring 101Monitoring 101
Monitoring 101
 
Distributed Systems - Like It Or Not
Distributed Systems - Like It Or NotDistributed Systems - Like It Or Not
Distributed Systems - Like It Or Not
 
Applying SRE techniques to micro service design
Applying SRE techniques to micro service designApplying SRE techniques to micro service design
Applying SRE techniques to micro service design
 
Commandments of scale
Commandments of scaleCommandments of scale
Commandments of scale
 
Monitoring the #DevOps way
Monitoring the #DevOps wayMonitoring the #DevOps way
Monitoring the #DevOps way
 
Operational Software Design
Operational Software DesignOperational Software Design
Operational Software Design
 
Is this normal?
Is this normal?Is this normal?
Is this normal?
 
Social improvements in monitoring
Social improvements in monitoringSocial improvements in monitoring
Social improvements in monitoring
 
Building Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approachBuilding Scalable Systems: an asynchronous approach
Building Scalable Systems: an asynchronous approach
 
Http front-ends
Http front-endsHttp front-ends
Http front-ends
 

Recently uploaded

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...

Monitoring is easy, why are we so bad at it presentation

  • 1. Monitoring is easy; why do we suck at it? / monitoring it all Tuesday, November 8, 2011
  • 2. Who is this guy? @postwait Author of “Scalable Internet Architectures” Pearson, ISBN: 067232699X Contributor to “Web Operations” O’Reilly, ISBN: 978-1-4493-7744-1 Founder of OmniTI, Message Systems, Fontdeck, & Circonus I like to tackle problems that are “always on” and “always growing.” I am an Engineer A practitioner of academic computing. IEEE member and Senior ACM member. On the Editorial Board of ACM’s Queue magazine.
  • 3. Monitoring: let’s start with a definition. • analytics • trending • fault-detection / alerting • capacity planning • it is the collection and use of telemetry data
  • 4. What monitoring is not • controls • via monitoring you observe, you do not influence
  • 5. So why do we suck at it? tl;dr because we think about • networks, • systems, and • applications instead of what matters: business.
  • 6. Your purpose • Your purpose is to make your company’s web business operate. (hence: “web operations”)
  • 8. Your purpose • ensure business success
  • 9. Understanding your purpose • who defines business success? • shareholders, ultimately • the board of directors, in their stead • the CEO on an operational, day-to-day basis
  • 10. Understanding your purpose • Assuming your CEO is doing a good job • the executive team understands these metrics • Assuming the executive team is competent • their reports understand these metrics (at least the pertinent ones)
  • 11. Pertinent == Problematic • You enable all aspects of the business • All these metrics are pertinent
  • 12. But why? • You could simply track stuff that is in your purview. • Why not?
  • 13. Technology • As a technology operations group, you have the technology. We can rebuild him. We have the technology. We can make him better than he was. Better...stronger...faster. - Oscar Goldman
  • 14. Why is our technology better? • Simply put: MTTD
  • 15. Now, what about your purview? • Obviously monitoring the business is useful. • However, you cannot directly affect business. • You indirectly affect it by operating the web portion.
  • 16. What can you change? • You can control: • releases, • performance, • stability, • computing resources, • networking, • and availability.
  • 17. Visualize! • All this information must be presented visually.
  • 18. Text. • Text is incredibly useful. • Consider: deployment.
  • 19. Code Deployment r82394 (by corey) 1h 7m 9s ago previous deploy 1h 42m 18s ago 11 deploys today
  • 20. Code Deployment r82394 15:03:14 2011/06/15 previous deploy 1h 42m 18s ago 11 deploys today
  • 25. Text. • Numbers are trickier. • So many representations from which to choose.
  • 30. Gauges require understanding • Gauges imply a deep understanding of • bounds, and • tolerances
  • 31. Gauges require understanding • General advice • If the range will ever change, don’t use gauges
  • 32. Gauges require understanding • Great for: • percentages, • temperature, • power per rack, • bandwidth per uplink
  • 33. Gauges require understanding • Bad for: • IOPS, • current visitor counts, • requests per second, • bandwidth overall
  • 34. Graphs are often better
  • 35. Even little ones
  • 37. Think relatively
  • 38. Users live all around the world • Users live just about everywhere • “Where?” is a useful question
  • 40. Geolocation is interesting • to marketing • to legal • (okay, to everyone) • but not so useful to operations
  • 41. Geolocation is interesting • perhaps more interesting
  • 43. Geolocation • Internet location != geo-political location
  • 44. ASN location • The closest thing to geo-political boundaries is peering
        -bash-4.0$ /usr/sbin/bgpctl show rib 66.78.236.243
        flags: * = Valid, > = Selected, I = via IBGP, A = Announced
        origin: i = IGP, e = EGP, ? = Incomplete
        flags destination       gateway        lpref  med  aspath origin
              66.78.236.0/22    64.202.119.7     100    0  23352 4436 2914 3356 32778 i
        ### ASN 32778 is “Smart City Networks, L.P.”
  • 46. What about the business?
  • 47. What about the business? Authorizations : Hard Failed : Soft Failed : Releases
  • 48. Is that all? • Hells no.
  • 49. It’s all about real-time • Everything so far is old hat (maybe) • Every business unit has visualizations like this • You need to combine the data • You need to make it real-time
  • 50. Thanks • web demo ensues....
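
A note on slide 44: the ASN named in the comment is the last number in the `aspath` column, because the rightmost entry of a BGP AS path is the network that originated the route. A minimal Python sketch of that reading follows. It assumes the path is a plain space-separated sequence of AS numbers, as in the slide's output (no AS sets or other decorations). The function names `origin_asn` and `visitors_by_origin` are hypothetical, invented for illustration, and are not part of the talk or of any tool.

```python
from collections import Counter

def origin_asn(as_path: str) -> int:
    """Return the origin AS: the rightmost ASN in a space-separated AS path."""
    hops = as_path.split()
    if not hops:
        raise ValueError("empty AS path")
    return int(hops[-1])

def visitors_by_origin(as_paths):
    """Count visitors per origin AS, given one AS path per visitor (hypothetical helper)."""
    return Counter(origin_asn(path) for path in as_paths)

# AS path copied from the bgpctl output on slide 44.
SLIDE_44_PATH = "23352 4436 2914 3356 32778"

print(origin_asn(SLIDE_44_PATH))
```

Grouping visitors by origin AS rather than by country is one way to "think in peering" as the slide suggests; how the per-visitor AS paths are collected is left open here.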