Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Upcoming SlideShare
Presentation devoxx4kids à iut-agile
Next
Download to read offline and view in fullscreen.

0

Share

Download to read offline

Devoxx 2014 Monitoring

Download to read offline

Slidedeck of the talk
Devoxx 2014

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all
  • Be the first to like this

Devoxx 2014 Monitoring

  1. 1. Monitoring ! Claude Falguière Valtech Paris #DV14 #Monitoring @cfalguiere
  2. 2. Content • DevOps is more than tooling! ! ! • Make you love Data! Individuals and interactions over processes and tools ! • Motivations for providing and collecting data! ! • Monitoring user stories and practices! ! • Getting started and open source tooling #DV14 #Monitoring @cfalguiere
  3. 3. Claude Falguiere • Devoxx4Kids! • Paris JUG, Devoxx France, Duchess! ! http://cfalguiere.wordpress.com ! • DevOps Coach ! • Java, Performance! #DV14 #Monitoring @cfalguiere
  4. 4. Monitoring What would you do if you knew that database is broken that number of hits doubles every 2 month that users struggle to find the order form why the app is slow what users want to buy #DV14 #Monitoring @cfalguiere
  5. 5. Model 138 ms 742 orders 42 users Questions Model Hypothesis Facts Data Sales increased by 14% Estimated orders next month 934 Average number of requests is 5 times the number of users #DV14 #Monitoring @cfalguiere
  6. 6. Galaxy Rotation Problem Spiral galaxies spin too fast ! ! Expected mass should be ten times the observed mass - calculated from the visible objets - to prevent galaxies from flying apart #DV14 #Monitoring @cfalguiere
  7. 7. Discovery of Dark Matter Assumes readings are wrong 1932 - 1933 1960 - 1970 2010 - 2013 Jan Oort Fritz Zwicky Hypothesis of a missing mass ?? If readings are true, is model wrong ? Mass calculated from gravitational effects and evidence of Dark Matter Plank Satellite Dark matter estimated to 84.5% of the total matter in the universe Vera Rubin #DV14 #Monitoring @cfalguiere
  8. 8. Jan Oort Fritz Zwicky • Hypothesis of a missing mass 1932 - 1933 • Data are validated • Is Universal Gravitation wrong ? #DV14 #Monitoring @cfalguiere
  9. 9. • Still speculative, but generally accepted by the mainstream scientific community Vera Rubin • Mass calculated from gravitational effects and evidence of Dark Matter 1960 - 1970 Plank Satellite • Dark matter estimated to 84.5% of the total matter in the universe 2010 - 2013 #DV14 #Monitoring @cfalguiere
  10. 10. Measure everything Lean Startup DevOps Make decisions based on facts! Big Data #DV14 #Monitoring @cfalguiere
  11. 11. Measure everything Lean Startup DevOps Make decisions based on facts! Big Data #DV14 #Monitoring @cfalguiere
  12. 12. Measure everything Lean Startup DevOps Make decisions based on facts! Big Data #DV14 #Monitoring @cfalguiere
  13. 13. Measure everything Lean Startup DevOps Make decisions based on facts! Big Data #DV14 #Monitoring @cfalguiere
  14. 14. What would you do if you knew that database is broken that number of hits doubles every 2 month that users struggle to find the order form why the app is slow what users want to buy #DV14 #Monitoring @cfalguiere
  15. 15. Motivations and user stories SLA observance! Alerting Alerting Diagnosis / Post-Mortem! Capacity Planning! Improvement Storage, Visualization #DV14 #Monitoring @cfalguiere
  16. 16. Motivations and user stories SLA observance! Alerting Diagnosis / Post-Mortem! Capacity Planning! Improvement Alerting Storage, Visualization #DV14 #Monitoring @cfalguiere
  17. 17. Architecture Collector Probe App Alerting Storage, Aggregation Dev Log Parser Support Log Network System DBA Visualization #DV14 #Monitoring @cfalguiere
  18. 18. Architecture Collector Probe App Alerting Storage, Aggregation Dev Log Parser Support Log Network System DBA Visualization #DV14 #Monitoring @cfalguiere
  19. 19. Architecture Collector Probe App Alerting Storage, Aggregation Dev Log Parser Support Log Network System DBA Visualization #DV14 #Monitoring @cfalguiere
  20. 20. Collector System Collector App Log Storage MQ Storage Alerting filters rules MQ #DV14 #Monitoring @cfalguiere
  21. 21. Collector System Collector App Log Storage MQ Storage Alerting filters rules MQ #DV14 #Monitoring @cfalguiere
  22. 22. Topology App Platform App Monitoring Platform Alerting Collector! Visualization Log Parser Log Storage, Aggregation #DV14 #Monitoring @cfalguiere
  23. 23. Resilience App Platform App Monitoring Platform Alerting Collector Log Parser Log MQ Collector Storage, Aggregation Visualization MQ #DV14 #Monitoring @cfalguiere
  24. 24. What would you do if you knew that database is broken #DV14 #Monitoring @cfalguiere
  25. 25. Error detection and alerting • Log filtering ! • Event firing! ! • Context! • is it critical ?! • which feature does it impact ?! • how deep is the impact ? #DV14 #Monitoring @cfalguiere
  26. 26. Is this a log ? Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: ! Access denied for user 'shopapp'@'shprdb1' to database 'shop'! at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)! at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)! at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)! at java.lang.reflect.Constructor.newInstance(Unknown Source)! at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)! at com.mysql.jdbc.Util.getInstance(Util.java:386)! at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1054)! at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4237)! at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4169)! at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:928)! at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1750)! at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1290)! at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2493)! at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2526)! at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2311)! at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:834)! at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)! at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)! at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)! at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)! at java.lang.reflect.Constructor.newInstance(Unknown Source)! at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)! at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:416)! at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:347)! at java.sql.DriverManager.getConnection(Unknown Source)! #DV14 #Monitoring @cfalguiere
  27. 27. Log example 2013-12-17 05:53:16,208 ERROR [Order Creation Service](456713) [shpras2](web1234) Could not create order id=456713 - Cause: Can’t connect to database ‘shop” - MySqlMessage: Access denied for user 'shopapp'@'shprdb1' to database 'shop'! 2013-12-17 05:53:16,208 ! ERROR ! [Order Creation Service]! (456713) ! [shpras2]! (web1234) ! Could not create order id=456713 ! Cause: Can’t connect to database ‘shop” ! MySqlMessage: Access denied for user 'shopapp'@'shprdb1' to database 'shop'! #DV14 #Monitoring @cfalguiere
  28. 28. Timestamp 2013-12-17 05:53:16,208 ! ERROR ! [Order Creation Service]! (456713) ! [shpras2]! (web1234) ! Could not create order id=456713 ! Cause: Can’t connect to database ‘shop” ! MySqlMessage: Access denied for user 'shopapp'@'shprdb1' to database 'shop'! Severity }Context (technical and business) { Meaningful information #DV14 #Monitoring @cfalguiere
  29. 29. Log Collectors Collector Collectd Logstash storage Log Alerting! System Flume Splunk (Commercial) #DV14 #Monitoring @cfalguiere
  30. 30. Logstash input {! file {! path => “/app/logs/apache/*.log”! type => "apachelog"! }! }! ! filter {! if [type] == "apachelog" {! grok {! pattern => “%{COMBINEDAPACHELOG}" ! }! }! }! ! output {! elasticsearch { host => localhost } ! stdout { }! } #DV14 #Monitoring @cfalguiere
  31. 31. Logstash input {! file {! path => “/app/logs/appserver/monitor*.log"! type => "applog"! }! }! ! filter {! if [type] == "applog" {! grok {! pattern => “%{TIMESTAMP_ISO8601:ts}” %{WORD}:severity …! }! }! }! ! output {! elasticsearch { host => localhost } ! stdout { }! } #DV14 #Monitoring @cfalguiere
  32. 32. Rate check • Frequency of an error increases! • Activity falls (e.g. Frequency of orders)! ! • Alerting based on threshold #DV14 #Monitoring @cfalguiere
  33. 33. Baselining 120 90 60 30 0 A B 10:00 10:10 10:20 10:30 10:40 10:50 11:00 11:10 200 150 100 50 0 D C 10:00 10:10 10:20 10:30 10:40 10:50 11:00 11:10 #DV14 #Monitoring @cfalguiere
  34. 34. What would you do if you knew that number of hits doubles every 2 month 120 90 60 30 0 Jan Feb Mar Apr May Jun Jul Aug #DV14 #Monitoring @cfalguiere
  35. 35. Graphers 70 52,5 35 17,5 0 30 22,5 15 7,5 0 • Foresight Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec • Cycles Sun Tue Thu Sat Mon Wed Wed 40 30 20 10 0 • Correlation Sun Tue Thu Sat Mon WedWed • Distribution 100 75 50 25 0 April May June July #DV14 #Monitoring @cfalguiere
  36. 36. Storage / Visualization Collectors (Collectd / Statd / Logstash / Flume) Plain REST Graphite docker: lopter/collectd-graphite #DV14 #Monitoring @cfalguiere
  37. 37. Collect and Share Collect Once and Share! • Support, ! • Ops, Dev! • Business! ! UpToDate! Flexible! ! #DV14 #Monitoring @cfalguiere
  38. 38. Storage / Visualization Collectors (Collectd / Statd / Logstash / Flume) Plain REST REST REST Graphite InfluxDB Grafana docker: gsogol/docker-elk Logstash ElasticSearch Kibana #DV14 #Monitoring @cfalguiere
  39. 39. JMX source: wikipedia • MBeans! • Registration! • Servo! • RMI and firewalls! • -Dcom.sun.management.jmxremote.rmi.port=p! • -Djava.rmi.server.hostname=n.n.n.n! • Jolokia! • jmxtrans! ! #DV14 #Monitoring @cfalguiere
  40. 40. JMX Collectors storage Collector logstash collectd JMX beans VisualVM! JConsole JMX Enabled! ! App Performance Monitoring tools #DV14 #Monitoring @cfalguiere
  41. 41. JSON Event over REST curl -X POST “…” ! Timestamp -d '{"ts": "2013-12-17 05:53:16,208", ! ! "type": “metric”, ! ! “module”: “Order Creation Service”, ! ! “module-id”: “456713”, ! ! “instance”: “shpras2”, ! ! “thread”: “web1234”, ! “name”: “order-creation”,! ! “duration”: “12”, ! ! “unit”: “ms”} } Context (technical and business) } Metric) #DV14 #Monitoring @cfalguiere
  42. 42. What would you do if you knew why app is slow #DV14 #Monitoring @cfalguiere
  43. 43. Tuning • Collectd/Statd plugins! • Metrics ! • Commercial : Plumbr, AppDynamics, New Relics! ! ! Where does it spend time ?! Why ? cross-check metrics from various sub-systems Front-End Back-End System DB System System #DV14 #Monitoring @cfalguiere
  44. 44. What would you do if you knew that users struggle to find the order form #DV14 #Monitoring @cfalguiere
  45. 45. Web Analytics / User tracking • Web analytics! • Page counters! • Tagging! • Log parser! ! • Google Analytics! • Piwik (docker: cfalguiere/docker-piwik) • Reporting APIs #DV14 #Monitoring @cfalguiere
  46. 46. What would you do if you knew what users want to buy #DV14 #Monitoring @cfalguiere
  47. 47. Model vs Big Data • Expected information! • Explicit Model! • List of metrics • Classification! • Machine Learning! • Patterns detection! Highlights valuable metrics and relationships #DV14 #Monitoring @cfalguiere
  48. 48. Getting started List user stories and metrics setup monitoring get facts get add metrics hypothesis validate hypothesis get facts #DV14 #Monitoring @cfalguiere
  49. 49. What should I monitor ? Alerting & Post-Mortem :! Presence check Activity (how many users, requests, orders …) Ressources that are limited in size Physical : CPU, memory, free disk space, network bandwidth ... Logical : pools, queues, caches, … Errors Others #DV14 #Monitoring @cfalguiere
  50. 50. What should I monitor ? Plan & Improve :! Any information which is useful to understand the process time spent for each major step things that are done often or requires large datasets user navigation context Listen to users and ops #DV14 #Monitoring @cfalguiere
  51. 51. Learn from data Continuous Improvement Design for Failure #DV14 #Monitoring @cfalguiere
  52. 52. Thank You #DV14 #Monitoring @cfalguiere

Slidedeck of the talk Devoxx 2014

Views

Total views

833

On Slideshare

0

From embeds

0

Number of embeds

28

Actions

Downloads

19

Shares

0

Comments

0

Likes

0

×