Monitorama: How monitoring can improve the rest of the company
 

Presentation Transcript

    • How monitoring can improve the rest of the company (Monitorama EU 2013, @jeff_weinstein)
    • I real-time and batch data analytics
    • Monitoring can wildly improve the whole company by sharing data and sharing techniques.
    • [Diagram: Data, shared by Monitoring folks, Developers, Business Analysts, Executives & Product, and Data Scientists]
    • [Diagram: Users; Apps & Services & Systems; Code & Config; Data; Monitoring]
    • Some problems…
    • [Diagram: Data Processing. Monitoring (streaming): Apps, Systems, Logs / Events, Metrics, Graphs & Alerts. BI (batch): Apps, 3rd Party, ETL, Analytic Systems, Reports & Queries]
    • [Diagram: Data Needs. Monitoring: logs and metrics, streaming. BI: logs and metrics, batch]
    • Data Tools Stack
        Monitoring
        • Ad hoc
          – sed, grep, awk
          – ES, LogStash, Splunk, …
        • Storage
          – Hosts, Ganglia, OTSDB
          – Central syslog server
        • Visualization/Reporting
          – Graphite, RRDTool, 3rd party
          – Homegrown
        • Alerting/Escalation
          – Nagios, Sensu, PagerDuty, …
        Rest of company
        • Ad hoc
          – Excel, SQL, Hive
          – MapReduce, …
        • Storage
          – Lots o’ databases, Excel
          – Hadoop, RDBMS…
        • Visualization/Reporting
          – Excel, R, Tableau ...
          – Dinosaur apps, …
        • Alerting/Escalation
          – nada
    • Metrics  
    • Views
        • Unintelligible generated views
        • Too granular for long-term trends
        • Lack of historical
        • Intolerant to anomalies
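The "too granular for long term trends" problem is commonly addressed with rollups: raw, high-resolution samples are aggregated into coarser time buckets for long-term storage. A minimal sketch in Python, assuming points arrive as hypothetical `(epoch_seconds, value)` tuples and that a per-bucket mean is an acceptable summary; the function name `rollup` and the sample data are illustrative, not from the talk.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, bucket_seconds):
    """Downsample (timestamp, value) points into fixed-width time buckets.

    Each bucket keeps the mean of the raw values that fall inside it, keyed by
    the bucket's start time (in the same units as the timestamps).
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return sorted((start, mean(values)) for start, values in buckets.items())


# Hypothetical per-minute latency samples: (epoch seconds, milliseconds).
raw = [(0, 120), (60, 80), (120, 100), (3600, 300), (3660, 100)]
hourly = rollup(raw, 3600)  # one point per hour
```

Graphite's Whisper storage applies a similar idea through retention schemas and a per-metric aggregation method (for example average, sum, or max). For latency, a mean alone can hide spikes, so keeping a max or percentile rollup alongside it is often worth the extra storage.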
    • Team and incentives
        • What team?
        • Change vs. reliability
        • Planning
        • Budget
        • Churn
    • Good or bad?
        • Specific tools
        • Decentralized
        • Focus
        • Ownership
        • Lost context
        • Siloed work
        • Data dark
        • Misunderstanding
    • Some  fixes  
    • End-to-End Data Pipeline
        ✓ Structured logs
        ✓ (Config)
        ✓ Measure once
        ✓ Automatic metrics
        ✓ API
        ✓ Graph tools
        ✓ Glossary
        ✓ Annotations and tags
        ✓ Pipeline
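"Measure once" suggests a metric should have a single definition shared by the streaming (monitoring) and batch (BI) paths, instead of two teams reimplementing it. The sketch below assumes hypothetical JSON log lines and a hypothetical `count_by_key` definition; it only illustrates one function feeding both paths, not the speaker's actual pipeline.

```python
import json

# Hypothetical structured log lines, one JSON object per line.
LOG_LINES = [
    '{"event": "signup", "plan": "free"}',
    '{"event": "signup", "plan": "pro"}',
    '{"event": "signup", "plan": "free"}',
]


def parse(lines):
    """Turn raw log lines into event dicts."""
    for line in lines:
        yield json.loads(line)


def count_by_key(events, key):
    """The single, shared metric definition: count events per value of `key`."""
    counts = {}
    for event in events:
        value = event.get(key, "unknown")
        counts[value] = counts.get(value, 0) + 1
    return counts


# Batch path (BI): apply the definition to the whole history at once.
batch_counts = count_by_key(parse(LOG_LINES), "plan")

# Streaming path (monitoring): apply the same definition to each event as it
# arrives and merge the partial results into a running total.
running_counts = {}
for event in parse(LOG_LINES):
    for plan, n in count_by_key([event], "plan").items():
        running_counts[plan] = running_counts.get(plan, 0) + n
```

Because both paths call the same function, a dashboard number and a BI report number cannot silently drift apart when someone changes the definition.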
    • Structured events
        • JSON (or whatever)
        • (optional) config
        • Tags per key
          – Type
          – Tag: latency, funnel, …
          – Description
          – Storage
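One way to read the "Structured events" slide: each event is emitted as JSON, and an optional config records, for every key, its type, a tag, a description, and where it should be stored. The config layout, the `checkout` event, its keys, and the `revenue` tag below are all hypothetical; the slide names only the four attributes and the `latency` and `funnel` tags.

```python
import json
import time

# Hypothetical per-key config: for each event, every key declares a type,
# a tag, a human-readable description, and a storage destination.
EVENT_CONFIG = {
    "checkout": {
        "user_id": {"type": str, "tag": "funnel",
                    "description": "Account making the purchase", "storage": "long_term"},
        "latency_ms": {"type": float, "tag": "latency",
                       "description": "Time to render the checkout page", "storage": "metrics"},
        "amount_usd": {"type": float, "tag": "revenue",
                       "description": "Order total in US dollars", "storage": "long_term"},
    },
}


def emit(event_name, **fields):
    """Validate fields against the config and return one JSON log line."""
    spec = EVENT_CONFIG[event_name]
    unknown = set(fields) - set(spec)
    if unknown:
        raise ValueError(f"unconfigured keys for {event_name}: {sorted(unknown)}")
    for key, value in fields.items():
        expected = spec[key]["type"]
        # Strict check: an int passed for a float key is rejected on purpose.
        if not isinstance(value, expected):
            raise TypeError(f"{event_name}.{key} should be {expected.__name__}")
    record = {"event": event_name, "ts": time.time(), **fields}
    return json.dumps(record, sort_keys=True)


line = emit("checkout", user_id="u42", latency_ms=183.0, amount_usd=19.99)
```

Rejecting unconfigured keys at emit time is a design choice: it keeps the config complete enough that downstream tools can trust it, at the cost of a small amount of friction for developers adding fields.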
    • Auto: Graphs, Glossary, & Storage
        • Graphs and dashboards
        • * templates
        • Views and stats
        • Glossary
        • Batch analytics
        • Long-term storage
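Because a per-key config already carries tags and descriptions, graphs and a glossary can be generated from it instead of written by hand. A rough Python sketch, assuming a hypothetical config shape of event → key → {tag, description}; grouping one templated dashboard per tag is an illustrative choice, not something the slide specifies.

```python
# Hypothetical per-key config: event -> key -> {"tag", "description"}.
CONFIG = {
    "checkout": {
        "latency_ms": {"tag": "latency", "description": "Time to render the checkout page"},
        "amount_usd": {"tag": "revenue", "description": "Order total in US dollars"},
    },
    "signup": {
        "latency_ms": {"tag": "latency", "description": "Time to submit the signup form"},
        "plan": {"tag": "funnel", "description": "Plan chosen at signup"},
    },
}


def glossary(config):
    """One 'event.key: description' line per metric, sorted for stable output."""
    return [f"{event}.{key}: {spec['description']}"
            for event, keys in sorted(config.items())
            for key, spec in sorted(keys.items())]


def dashboards(config):
    """Group metric names by tag so each tag can get one templated dashboard."""
    by_tag = {}
    for event, keys in config.items():
        for key, spec in keys.items():
            by_tag.setdefault(spec["tag"], []).append(f"{event}.{key}")
    return {tag: sorted(names) for tag, names in by_tag.items()}


for entry in glossary(CONFIG):
    print(entry)
```

Adding a key to the config is then enough to make it appear in both the glossary and the matching dashboard, which is the point of generating rather than curating them.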
    • build, learn, communicate, inspire
    • Developers
        • Logging toolkit
        • Data pipeline
        • Pain points
        • Outage causes
        • Deployment practices
        • Escalation playbook
        • Measurement as TDD
        • Monitor staging env
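"Measurement as TDD" can be read as writing a test that pins down which metrics a code path must emit before the feature counts as done. The sketch below uses Python's standard `unittest` with a hypothetical in-memory `RecordingMetrics` client and a hypothetical `process_order` function; a real codebase would put its own metrics library behind the same kind of test double.

```python
import unittest


class RecordingMetrics:
    """Hypothetical in-memory metrics client, used only in tests."""

    def __init__(self):
        self.counters = {}

    def increment(self, name, value=1):
        self.counters[name] = self.counters.get(name, 0) + value


def process_order(order, metrics):
    """Code under test: must count both accepted and rejected orders."""
    if order.get("amount_usd", 0) <= 0:
        metrics.increment("orders.rejected")
        return False
    metrics.increment("orders.accepted")
    return True


class OrderMetricsTest(unittest.TestCase):
    # Written first, TDD-style: the feature is not done until it is measured.
    def test_accepted_order_is_counted(self):
        metrics = RecordingMetrics()
        self.assertTrue(process_order({"amount_usd": 10.0}, metrics))
        self.assertEqual(metrics.counters, {"orders.accepted": 1})

    def test_rejected_order_is_counted(self):
        metrics = RecordingMetrics()
        self.assertFalse(process_order({"amount_usd": 0}, metrics))
        self.assertEqual(metrics.counters, {"orders.rejected": 1})
```

Run it with `python -m unittest` pointed at the file. The test double keeps the test independent of any particular metrics backend.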
    • Business Analysts
        • Structured logs
        • Config for ETL
        • Metrics definitions
        • Slices and visualizations
        • Data size and cardinality
        • Outages and delays
        • Flexibility
        • Visualization and tools
    • Data Scientists
        • Access to (meta)data
        • Query monitoring
        • Statistics and models
        • New data streams
        • Context of data issues
        • What’s in the logs
        • Validate algorithms
        • Teach stats and models!
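As one example of the statistics and models data scientists could bring to monitoring, the sketch below replaces a mean/standard-deviation threshold with the median and the median absolute deviation (MAD), i.e. the modified z-score of Iglewicz and Hoaglin. A single large spike cannot drag this baseline upward and hide itself. The function name `robust_anomalies` and the 3.5 cut-off (a commonly cited default) are illustrative assumptions, not taken from the talk.

```python
from statistics import median


def robust_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where 0.6745 scales
    the MAD to be comparable to a standard deviation for normal data.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the points equal the median: flag anything else.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]
```

For the latency series `[100, 102, 98, 101, 99, 500, 100]`, the median is 100 and the MAD is 1, so only index 5 (the 500 ms sample) is flagged.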
    • Product & Executives
        • Curated dashboards
        • Graph/alert tools
        • Learn the business
        • Prioritize alerts by $
        • Incident post-mortems
        • Metrics granularity
        • Data-driven decisions
        • Recognize and celebrate
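"Prioritize alerts by $" could be implemented by attaching an estimated cost to each alert and sorting on it. Everything here is hypothetical: the `Alert` fields, the dollar figures, and the cost model (revenue flowing through the affected path multiplied by the fraction of that traffic currently failing).

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    revenue_per_minute_usd: float  # hypothetical: revenue through the affected path
    fraction_affected: float       # hypothetical: share of that traffic failing, 0..1

    @property
    def dollars_at_risk_per_minute(self) -> float:
        return self.revenue_per_minute_usd * self.fraction_affected


def prioritize(alerts):
    """Order open alerts so the most expensive problem is handled first."""
    return sorted(alerts, key=lambda a: a.dollars_at_risk_per_minute, reverse=True)


# Made-up example alerts, for illustration only.
open_alerts = [
    Alert("disk_80pct_batch_host", revenue_per_minute_usd=0.0, fraction_affected=1.0),
    Alert("checkout_5xx", revenue_per_minute_usd=1200.0, fraction_affected=0.05),
    Alert("search_slow", revenue_per_minute_usd=300.0, fraction_affected=0.5),
]

for alert in prioritize(open_alerts):
    print(f"{alert.name}: ${alert.dollars_at_risk_per_minute:.2f}/min at risk")
```

With these made-up numbers the slow-search alert ($150/min) outranks the checkout errors ($60/min), and the batch-host disk alert sorts last. A zero-dollar alert still needs an owner; it just should not page ahead of customer-facing problems.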
    • Monitoring can become the data platform and improve all teams with its techniques.
    • Icons from The Noun Project: Dmitry Baranovskiy, Benjamin Orlovski, Luis Prado, MikaDo Nguyen, Yarden Gilboa, Javier Cabezas, Icons Pusher, Jeremy Bristol, Blake Thomas, Ritika Khasgiwale, Mayene de Leon, Yorlmar Campos, Sergey Shmid
      @jeff_weinstein Thanks! (hiring ;))