Monitorama: How monitoring can improve the rest of the company

Transcript of "Monitorama: How monitoring can improve the rest of the company"

  1. How monitoring can improve the rest of the company
     Monitorama EU 2013
     @jeff_weinstein
  2. I real-time and batch data analytics
  3. Monitoring can wildly improve the whole company by sharing data and sharing techniques.
  4. Monitoring Folks; Developers; Business Analysts; Executives & Product; Data Scientists; Data
  5. Apps & Services & Systems; Users; Data; Code & Config; Monitoring
  6. Some problems…
  7. Data Processing
     Apps; Systems; Logs / Events; Metrics; Graphs & Alerts; Apps; 3rd Party; Reports & Queries; ETL; Analytic Systems
     Monitoring: Streaming
     BI: Batch
  8. Data Needs
     Logs; Metrics; Logs; Metrics; Streaming; Batch; Data; Monitoring; BI
  9. Data Tools Stack
     Monitoring
     •  Ad hoc
        –  sed, grep, awk
        –  ES, LogStash, Splunk, …
     •  Storage
        –  Hosts, Ganglia, OTSDB
        –  Central syslog server
     •  Visualization/Reporting
        –  Graphite, RRDTool, 3rd party
        –  Homegrown
     •  Alerting/Escalation
        –  Nagios, Sensu, PagerDuty, …
     Rest of company
     •  Ad hoc
        –  Excel, SQL, Hive
        –  MapReduce, …
     •  Storage
        –  Lots o’ databases, Excel
        –  Hadoop, RDBMS…
     •  Visualization/Reporting
        –  Excel, R, Tableau ...
        –  Dinosaur apps, …
     •  Alerting/Escalation
        –  nada
  10. Metrics
  11. Views
     •  Unintelligible generated views
     •  Too granular for long term trends
     •  Lack of historical
     •  Intolerant to anomalies
  12. Team and incentives
     •  What team?
     •  Change vs. reliability
     •  Planning
     •  Budget
     •  Churn
  13. Good or bad?
     •  Specific Tools
     •  Decentralized
     •  Focus
     •  Ownership
     •  Lost context
     •  Siloed work
     •  Data dark
     •  Misunderstanding
  14. Some fixes
  15. End to End Data Pipeline
     ✓ Structured logs
     ✓ (Config)
     ✓ Measure once
     ✓ Automatic metrics
     ✓ API
     ✓ Graph tools
     ✓ Glossary
     ✓ Annotations and tags
     ✓ Pipeline
  16. Structured events
     •  JSON (or whatever)
     •  (optional) config
     •  Tags per key
        –  Type
        –  Tag: latency, funnel,…
        –  Description
        –  Storage
  17. Auto: Graphs, Glossary, & Storage
     •  Graphs and dashboards
     •  * templates
     •  Views and stats
     •  Glossary
     •  Batch analytics
     •  Long term storage
  18. build; learn; communicate; inspire
  19. Developers
     •  Logging toolkit
     •  Data pipeline
     •  Pain points
     •  Outage causes
     •  Deployment practices
     •  Escalation playbook
     •  Measurement as TDD
     •  Monitor staging env
  20. Business Analysts
     •  Structured logs
     •  Config for ETL
     •  Metrics definitions
     •  Slices and visualizations
     •  Data size and cardinality
     •  Outages and delays
     •  Flexibility
     •  Visualization and tools
  21. Data Scientists
     •  Access to (meta)data
     •  Query monitoring
     •  Statistics and models
     •  New data streams
     •  Context of data issues
     •  What’s in the logs
     •  Validate algorithms
     •  Teach stats and models!
  22. Product & Executives
     •  Curated dashboards
     •  Graph/alert tools
     •  Learn the business
     •  Prioritize alerts by $
     •  Incident post mortems
     •  Metrics granularity
     •  Data driven decisions
     •  Recognize and celebrate
  23. Monitoring can become the data platform and improve all teams with its techniques.
  24. Icons from The Noun Project: Dmitry Baranovskiy, Benjamin Orlovski, Luis Prado, MikaDo Nguyen, Yarden Gilboa, Javier Cabezas, Icons Pusher, Jeremy Bristol, Blake Thomas, Ritika Khasgiwale, Mayene de Leon, Yorlmar Campos, Sergey Shmid
     @jeff_weinstein
     Thanks! hiring ;)
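
Slides 15 to 17 outline structured events, an optional per-key config (type, tag, description, storage), and glossary and metrics generated automatically from that config. The slides give no implementation, so the following is only a minimal Python sketch of how those pieces might fit together. Every name in it (`KEY_CONFIG`, `build_glossary`, `derive_metrics`, the event keys, and the sample log lines) is a hypothetical chosen for illustration, not something taken from the talk.

```python
import json
from collections import defaultdict

# Hypothetical per-key config, following the fields listed on slide 16:
# type, tag, description, storage. The keys themselves are made up.
KEY_CONFIG = {
    "checkout_latency_ms": {
        "type": "float",
        "tag": "latency",
        "description": "Time to complete checkout, in milliseconds",
        "storage": "long_term",
    },
    "signup_step": {
        "type": "str",
        "tag": "funnel",
        "description": "Signup funnel step the user reached",
        "storage": "short_term",
    },
}


def parse_event(line):
    """Parse one JSON log line into a dict (the structured event)."""
    return json.loads(line)


def build_glossary(config):
    """Generate glossary entries from the config, so each definition is written once."""
    return {
        key: f"{meta['description']} [{meta['tag']}, {meta['type']}]"
        for key, meta in sorted(config.items())
    }


def derive_metrics(events, config):
    """Derive metrics from tags: averages for latency keys, counts per value for funnel keys."""
    latency = defaultdict(list)
    funnel = defaultdict(lambda: defaultdict(int))
    for event in events:
        for key, value in event.items():
            tag = config.get(key, {}).get("tag")  # keys without config are ignored
            if tag == "latency":
                latency[key].append(float(value))
            elif tag == "funnel":
                funnel[key][value] += 1
    metrics = {f"{key}.avg": sum(vals) / len(vals) for key, vals in latency.items()}
    for key, counts in funnel.items():
        for step, count in counts.items():
            metrics[f"{key}.{step}.count"] = count
    return metrics


# Made-up sample log lines, one JSON event per line.
log_lines = [
    '{"ts": 1379500000, "checkout_latency_ms": 120.0, "signup_step": "email"}',
    '{"ts": 1379500001, "checkout_latency_ms": 80.0, "signup_step": "email"}',
    '{"ts": 1379500002, "signup_step": "confirm"}',
]
events = [parse_event(line) for line in log_lines]
glossary = build_glossary(KEY_CONFIG)
metrics = derive_metrics(events, KEY_CONFIG)
```

In this sketch the event is measured once, at the point where it is logged, and the same config drives both the glossary text and the derived metrics. Graph templates and storage routing could in principle read the same `tag` and `storage` fields, though that part is not shown here.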