Monitorama: How monitoring can improve the rest of the company
Transcript

  • 1. How monitoring can improve the rest of the company. Monitorama EU 2013, @jeff_weinstein
  • 2. I real-time and batch data analytics
  • 3. Monitoring can wildly improve the whole company by sharing data and sharing techniques.
  • 4. (Diagram) Monitoring Folks, Developers, Business Analysts, Executives & Product, Data Scientists, Data
  • 5. (Diagram) Apps & Services & Systems, Users, Data, Code & Config, Monitoring
  • 6. Some problems…
  • 7. Data Processing (diagram)
        Monitoring (streaming): Apps, Systems, Logs / Events, Metrics, Graphs & Alerts
        BI (batch): Apps, 3rd Party, ETL, Analytic Systems, Reports & Queries
  • 8. Data Needs (diagram)
        Monitoring: logs, metrics (streaming)
        BI: logs, metrics (batch)
  • 9. Data Tools Stack
      Monitoring
        • Ad hoc
          – sed, grep, awk
          – ES, LogStash, Splunk, …
        • Storage
          – Hosts, Ganglia, OTSDB
          – Central syslog server
        • Visualization/Reporting
          – Graphite, RRDTool, 3rd party
          – Homegrown
        • Alerting/Escalation
          – Nagios, Sensu, PagerDuty, …
      Rest of company
        • Ad hoc
          – Excel, SQL, Hive
          – MapReduce, …
        • Storage
          – Lots o' databases, Excel
          – Hadoop, RDBMS…
        • Visualization/Reporting
          – Excel, R, Tableau ...
          – Dinosaur apps, …
        • Alerting/Escalation
          – nada
  • 10. Metrics  
  • 11. Views
        • Unintelligible generated views
        • Too granular for long-term trends
        • Lack of historical data
        • Intolerant to anomalies
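Slide 11's complaints about views that are too granular for long-term trends and lack history are often addressed with rollups. A minimal sketch, assuming raw metrics arrive as (unix timestamp, value) pairs; `rollup` is a hypothetical helper, not part of any tool named in the talk. Keeping max alongside mean means a short spike still shows up after downsampling.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, bucket_seconds=3600):
    """Downsample (timestamp, value) pairs into fixed-width buckets.

    Keeps count, min, max and mean per bucket so long-term history is cheap
    to store while spikes remain visible through max.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "mean": mean(vals),
        }
        for start, vals in sorted(buckets.items())
    }


if __name__ == "__main__":
    # One reading per minute for two hours, with a spike at minute 90.
    raw = [(60 * i, 100.0) for i in range(120)]
    raw[90] = (60 * 90, 900.0)
    for start, stats in rollup(raw).items():
        print(start, stats)
```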
  • 12. Team and incentives
        • What team?
        • Change vs. reliability
        • Planning
        • Budget
        • Churn
  • 13. Good or bad?
        • Specific tools
        • Decentralized
        • Focus
        • Ownership
        • Lost context
        • Siloed work
        • Data dark
        • Misunderstanding
  • 14. Some fixes
  • 15. End to End Data Pipeline
        ✓ Structured logs
        ✓ (Config)
        ✓ Measure once
        ✓ Automatic metrics
        ✓ API
        ✓ Graph tools
        ✓ Glossary
        ✓ Annotations and tags
        ✓ Pipeline
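Slide 15's "measure once" idea can be sketched as a single record call that fans out to both sides of slide 7: streaming aggregates for graphs and alerts, and raw JSON lines for batch ETL. The `Pipeline` class and its field names are hypothetical; a real setup would ship to a metrics store and a log store rather than keep everything in memory.

```python
import json
from collections import defaultdict


class Pipeline:
    """Hypothetical fan-out: every event is recorded exactly once, then feeds
    a streaming view (running aggregates) and a batch view (raw JSON lines)."""

    def __init__(self):
        self.counts = defaultdict(int)            # streaming: events per name
        self.latency_total = defaultdict(float)   # streaming: summed latency
        self.latency_samples = defaultdict(int)   # events that carried a latency
        self.batch_lines = []                     # batch: raw records for ETL

    def record(self, event):
        # Batch side keeps the full structured record untouched.
        self.batch_lines.append(json.dumps(event, sort_keys=True))
        # Streaming side derives metrics from the same record.
        name = event["event"]
        self.counts[name] += 1
        if "duration_ms" in event:
            self.latency_total[name] += event["duration_ms"]
            self.latency_samples[name] += 1

    def mean_latency(self, name):
        samples = self.latency_samples[name]
        return self.latency_total[name] / samples if samples else 0.0


if __name__ == "__main__":
    pipe = Pipeline()
    pipe.record({"event": "checkout", "duration_ms": 120})
    pipe.record({"event": "checkout", "duration_ms": 80})
    print(pipe.counts["checkout"], pipe.mean_latency("checkout"))  # prints: 2 100.0
```

The design choice here is that neither side re-instruments the code: the batch record and the streaming metric come from one call, so they cannot disagree about what was measured.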
  • 16. Structured events
        • JSON (or whatever)
        • (optional) config
        • Tags per key
          – Type
          – Tag: latency, funnel, …
          – Description
          – Storage
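Slide 16 attaches metadata (type, tag, description, storage) to each key of a structured event. A minimal sketch, assuming JSON output and a Python dict as the "(optional) config"; the `checkout` event, its keys, and the `emit` helper are hypothetical. Validating against the config at emit time is one way to keep logs, ETL, and documentation describing the same fields.

```python
import json
import time

# Hypothetical per-key config; the metadata fields follow the slide,
# the event and its values are made up for illustration.
EVENT_CONFIG = {
    "checkout": {
        "duration_ms": {"type": int, "tag": "latency",
                        "description": "Time to render the checkout page",
                        "storage": "metrics"},
        "step": {"type": str, "tag": "funnel",
                 "description": "Checkout funnel step reached",
                 "storage": "batch"},
    }
}


def emit(event_name, **fields):
    """Validate fields against the config and return one JSON log line."""
    schema = EVENT_CONFIG[event_name]  # unknown events raise KeyError
    unknown = set(fields) - set(schema)
    if unknown:
        raise KeyError(f"unconfigured keys for {event_name}: {sorted(unknown)}")
    for key, value in fields.items():
        expected = schema[key]["type"]
        if not isinstance(value, expected):
            raise TypeError(f"{key} should be {expected.__name__}")
    record = {"event": event_name, "ts": time.time(), **fields}
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(emit("checkout", duration_ms=182, step="payment"))
```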
  • 17. Auto: Graphs, Glossary, & Storage
        • Graphs and dashboards
        • * templates
        • Views and stats
        • Glossary
        • Batch analytics
        • Long term storage
  • 18. build, learn, communicate, inspire
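Once every key carries a tag and a description, slide 17's automatic graphs and glossary can be generated rather than hand-built. A minimal sketch, assuming the per-key config is a plain dict; the events, the `TEMPLATES` mapping from tag to graph type, and both helper functions are hypothetical.

```python
# Hypothetical per-key config: each key has a tag and a description.
EVENT_CONFIG = {
    "checkout": {
        "duration_ms": {"tag": "latency", "description": "Time to render the checkout page"},
        "step": {"tag": "funnel", "description": "Checkout funnel step reached"},
    },
    "search": {
        "results": {"tag": "count", "description": "Number of results returned"},
    },
}

# Hypothetical mapping from tag to a default graph template.
TEMPLATES = {
    "latency": "percentiles (p50/p95/p99) over time",
    "funnel": "conversion by step",
}


def glossary(config):
    """One sorted 'event.key: description' entry per configured key."""
    return sorted(f"{event}.{key}: {meta['description']}"
                  for event, keys in config.items()
                  for key, meta in keys.items())


def dashboard(config, templates):
    """Pick a graph template per key; untemplated tags get a plain time series."""
    return {f"{event}.{key}": templates.get(meta["tag"], "time series")
            for event, keys in config.items()
            for key, meta in keys.items()}


if __name__ == "__main__":
    print("\n".join(glossary(EVENT_CONFIG)))
    for panel, graph in dashboard(EVENT_CONFIG, TEMPLATES).items():
        print(panel, "->", graph)
```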
  • 19. Developers
        • Logging toolkit
        • Data pipeline
        • Pain points
        • Outage causes
        • Deployment practices
        • Escalation playbook
        • Measurement as TDD
        • Monitor staging env
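One reading of "Measurement as TDD" on slide 19 is to assert on emitted metrics in the same tests that check behavior, so an unmeasured code path fails the build. A minimal sketch written as pytest-style test functions (plain `assert`, collected by the `test_` prefix); the `METRICS` counter, `apply_coupon`, and the metric names are hypothetical stand-ins for a real metrics client.

```python
from collections import Counter

# Hypothetical in-process metrics sink; a real service would send these
# to its metrics backend instead.
METRICS = Counter()


def apply_coupon(total, code):
    """Toy business logic that records a metric for every outcome."""
    if code == "SAVE10":
        METRICS["coupon.applied"] += 1
        return round(total * 0.9, 2)
    METRICS["coupon.rejected"] += 1
    return total


def test_valid_coupon_is_measured():
    METRICS.clear()
    assert apply_coupon(50.0, "SAVE10") == 45.0
    assert METRICS["coupon.applied"] == 1


def test_rejected_coupon_is_measured():
    METRICS.clear()
    assert apply_coupon(50.0, "BOGUS") == 50.0
    assert METRICS["coupon.rejected"] == 1
    assert METRICS["coupon.applied"] == 0
```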
  • 20. Business Analysts
        • Structured logs
        • Config for ETL
        • Metrics definitions
        • Slices and visualizations
        • Data size and cardinality
        • Outages and delays
        • Flexibility
        • Visualization and tools
  • 21. Data Scientists
        • Access to (meta)data
        • Query monitoring
        • Statistics and models
        • New data streams
        • Context of data issues
        • What's in the logs
        • Validate algorithms
        • Teach stats and models!
  • 22. Product & Executives
        • Curated dashboards
        • Graph/alert tools
        • Learn the business
        • Prioritize alerts by $
        • Incident post mortems
        • Metrics granularity
        • Data driven decisions
        • Recognize and celebrate
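Slide 22's "Prioritize alerts by $" could be approximated by ranking alerts on estimated revenue at risk. A minimal sketch; the `Alert` fields and the formula (affected traffic share times revenue rate times expected duration) are assumptions for illustration, not a model from the talk.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    affected_share: float      # assumed: fraction of traffic affected, 0..1
    revenue_per_minute: float  # assumed: revenue flowing through the component
    expected_minutes: float    # assumed: typical time to resolve

    @property
    def dollars_at_risk(self) -> float:
        # Crude estimate: affected revenue rate times expected outage length.
        return self.affected_share * self.revenue_per_minute * self.expected_minutes


def prioritize(alerts):
    """Return alerts ordered by estimated dollars at risk, highest first."""
    return sorted(alerts, key=lambda a: a.dollars_at_risk, reverse=True)


if __name__ == "__main__":
    queue = prioritize([
        Alert("disk 85% on batch host", 0.0, 500.0, 120.0),
        Alert("checkout p99 latency", 0.2, 1000.0, 30.0),
        Alert("search error rate", 0.05, 400.0, 60.0),
    ])
    for alert in queue:
        print(f"{alert.name}: ${alert.dollars_at_risk:,.0f}")
```

Even a rough dollar figure makes the trade-off explicit: a noisy disk alert on a batch host drops below a small latency regression on checkout.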
  • 23. Monitoring can become the data platform and improve all teams with its techniques.
  • 24. Icons from The Noun Project: Dmitry Baranovskiy, Benjamin Orlovski, Luis Prado, MikaDo Nguyen, Yarden Gilboa, Javier Cabezas, Icons Pusher, Jeremy Bristol, Blake Thomas, Ritika Khasgiwale, Mayene de Leon, Yorlmar Campos, Sergey Shmid
        @jeff_weinstein
        Thanks! hiring ;)