
TICK Stack - Listen to your infrastructure and please sleep

Our applications and our infrastructure speak, and time series are one of their languages. In this talk I will share my experience with InfluxDB and time series for monitoring our cloud infrastructure and knowing its status. We will look at best practices and tricks for grabbing information from an application, in order to understand the main differences between logs and time series.

Published in: Technology

  1. 1. Listen to your infrastructure and please sleep
  2. 2. Hello! I am Gianluca Arbezzano, Software Engineer at CurrencyFair. @gianarb on Twitter and GitHub
  3. 3. Open source maintainer and contributor: penny.gianarb.it, vim-php.org
  4. 4. scaledocker.com: some resources about Docker
  5. 5. “Try again. Fail again. Fail better.” (Samuel Beckett)
  6. 6. Trust your system. To be familiar with your applications you need to know what they are doing.
  7. 7. To predict the future
  8. 8. Because we are not John
  9. 9. 1 Widespread monitoring tools: tail -f /var/log/app.log
  10. 10. 10
  11. 11.
          2016/04/15 15:42:46 [warn] 2330#0: *167 using uninitialized variable, client: 10.0.1.1, server: localhost.dev, request: "POST /auth HTTP/1.1", host: "localhost"
          2016/04/15 15:44:44 [error] 2330#0: *171 FastCGI sent in stderr: " PHP message: PHP Fatal error: Uncaught exception 'RuntimeException' with message 'All broken)[500]' in /var/www/my/project.php:237
          Stack trace:
          #0 /var/www/index.php:45 ObjectService->flush()
          #1 [internal function] ->save()
  12. 12. 2 Expensive to store
  13. 13. 2 Difficult to index
  14. 14. Difficult, not impossible. They do amazing work. And there are other tools!
  15. 15. They are awesome for some use cases:
      ▪ Extracting information
      ▪ They can be “human readable”
      ▪ … and others, and others
  16. 16. Keep your life amazing. Reduce your time series to a timestamp and a value (int or float). This is cheap and useful.
  17. 17. We are here to speak about time series
          [
            {
              "name": "log_lines",
              "columns": ["time", "line"],
              "point": [1400425947368, "here's some useful log info"]
            }
          ]
  18. 18. EASY! EASY! EASY!
          {
            "name": "cpu_percent_use",
            "columns": ["value"],
            "point": 40
          }
  19. 19. “Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius, and a lot of courage, to move in the opposite direction.” (Ernst F. Schumacher)
  20. 20. Time is a perfect sharding key. It means that time series scale really well.
  21. 21. We have a set of tools to use
  22. 22. InfluxDB
      ● Optimized to store time series data
      ● Open source and easy to install (Go binary)
      ● Big community and ecosystem to manage alerts and collect metrics
  23. 23. Easy: install and start a Go binary
          wget https://dl.influxdata.com/influxdb/releases/influxdb_1.2.2_amd64.deb
          sudo dpkg -i influxdb_1.2.2_amd64.deb
          influxd -config /usr/local/etc/influxdb.conf
  24. 24. Easy
      ● HTTP API on port 8086
      ● Support for UDP connections
      ● Powerful CLI to communicate with the db
  25. 25. Easy
          SELECT value FROM cpu_load_short WHERE region='us-west'
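
As a rough illustration of how the query on this slide could travel over the HTTP API on port 8086, here is a minimal Python sketch that only builds the request URL for the InfluxDB 1.x `/query` endpoint. The helper name `build_query_url`, the host `localhost` and the database name `mydb` are assumptions made for the example; they do not come from the talk.

```python
from urllib.parse import urlencode


def build_query_url(host, database, query, port=8086):
    """Build a GET URL for the InfluxDB 1.x /query endpoint.

    `db` selects the database and `q` carries the InfluxQL statement.
    Nothing is sent over the network here; the URL is only assembled.
    """
    params = urlencode({"db": database, "q": query})
    return f"http://{host}:{port}/query?{params}"


# Hypothetical host and database, used only for illustration.
url = build_query_url(
    "localhost",
    "mydb",
    "SELECT value FROM cpu_load_short WHERE region='us-west'",
)
print(url)
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, against a running InfluxDB 1.x instance.
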
  26. 26. 26
  27. 27. 27
  28. 28. Line protocol, designed to be smart and slim
          [key] [fields] [timestamp]
          temperature,machine=unit internal=3,external=10 1434055562000000035
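
To make the `[key] [fields] [timestamp]` layout concrete, the following Python sketch formats one point as a line protocol string. It is a simplified illustration under assumptions, not an official client: the function names are hypothetical, tags are sorted by key, Python integers are written with the `i` suffix, and only finite floats are expected.

```python
def _escape(text, chars):
    """Backslash-escape each character of `chars` found in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Return 'measurement,tags fields timestamp' for a single point."""
    if not fields:
        raise ValueError("at least one field is required")
    key = _escape(measurement, ", ")
    for name in sorted(tags):
        key += "," + _escape(name, ", =") + "=" + _escape(str(tags[name]), ", =")
    field_part = ",".join(
        _escape(name, ", =") + "=" + _format_field(value)
        for name, value in fields.items()
    )
    line = f"{key} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


line = to_line(
    "temperature",
    {"machine": "unit"},
    {"internal": 3.0, "external": 10.0},
    1434055562000000035,
)
print(line)
```

Omitting `timestamp_ns` leaves the timestamp off the line, in which case the server assigns its own time when the point is written.
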
  29. 29. UDP vs TCP protocol
          CorleyBenchmarksInfluxDBAdapterEvent
          Method Name                Iterations    Average Time         Ops/second
          ------------------------   ------------  -------------------  ----------------
          sendDataUsingHttpAdapter:  [1,000     ]  [0.0026700308323]    [374.52751]
          sendDataUsingUdpAdapter :  [1,000     ]  [0.0000436344147]    [22,917.69026]
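
The gap in the benchmark is plausible because a UDP write is a single datagram with no connection setup and no acknowledgment. A minimal Python sketch of such a sender follows. The function name `send_udp` is hypothetical, and the default port 8089 is an assumption: the UDP listener has to be enabled in the InfluxDB configuration, and its port is whatever `bind-address` is set to there.

```python
import socket


def send_udp(line, host="127.0.0.1", port=8089):
    """Send one line-protocol string as a single UDP datagram.

    Returns the number of bytes handed to the OS. There is no reply,
    so a lost or rejected point goes unnoticed by the sender.
    """
    payload = line.encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

A call such as `send_udp("cpu_percent_use value=40")` returns immediately whether or not a server is listening, which is the trade-off behind the higher throughput.
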
  30. 30. Continuous query
          CREATE CONTINUOUS QUERY minnie ON world
          BEGIN
            SELECT min(mouse) INTO min_mouse FROM zoo GROUP BY time(30m)
          END
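
What the continuous query computes can be sketched in plain Python: group points into 30-minute windows and keep the minimum of each window. This only illustrates the idea of `min(...) GROUP BY time(30m)`; it is not how InfluxDB implements it, and the function name and the `(unix_seconds, value)` input shape are assumptions.

```python
def min_per_window(points, window_seconds=30 * 60):
    """Return {window_start: minimum value} for (unix_seconds, value) pairs."""
    result = {}
    for timestamp, value in points:
        # Align the timestamp to the start of its window.
        start = timestamp - (timestamp % window_seconds)
        if start not in result or value < result[start]:
            result[start] = value
    return result


samples = [(0, 5), (600, 3), (1800, 9), (3599, 7)]
print(min_per_window(samples))
```

With the sample data, the first two points fall in the window starting at 0 and the last two in the window starting at 1800, so each window is reduced to a single stored value.
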
  31. 31. Telegraf https://github.com/influxdata/telegraf A collector that grabs data from different sources and sends it to InfluxDB and other databases. Based on an input and output plugin system.
  32. 32. Telegraf plugins
  33. 33. Kapacitor https://github.com/influxdata/kapacitor A framework for processing, monitoring, and alerting on time series data. Trigger notifications and take action in case of specific behaviors.
  34. 34. Kapacitor high CPU alert
          stream
              |from()
                  .measurement('cpu_usage_idle')
                  .groupBy('host')
              |window()
                  .period(1m)
                  .every(1m)
              |mean('value')
              |eval(lambda: 100.0 - "mean")
                  .as('used')
              |alert()
                  .message('{{ .Level}}: {{ .Name }}/{{ index .Tags "host" }} has high cpu: {{ index .Fields "used" }}')
                  .warn(lambda: "used" > 70.0)
                  .crit(lambda: "used" > 85.0)
                  // Send alert to handler of choice.
                  // Slack
                  .slack()
                  .channel('#alerts')
                  // PagerDuty
                  .pagerDuty()
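
The decision logic of this alert can be read as plain arithmetic: average the idle CPU over the window, turn it into a usage percentage, then compare against the two thresholds. The Python sketch below mirrors that logic for one host and one window. It is an illustration only, not Kapacitor code; the function name is hypothetical and the thresholds are the ones from the slide.

```python
from statistics import mean


def cpu_alert_level(idle_samples, warn=70.0, crit=85.0):
    """Return (level, used_percent) for one window of cpu_usage_idle samples."""
    if not idle_samples:
        raise ValueError("need at least one sample")
    # Same computation as eval(lambda: 100.0 - "mean") in the TICKscript.
    used = 100.0 - mean(idle_samples)
    if used > crit:
        return "CRITICAL", used
    if used > warn:
        return "WARNING", used
    return "OK", used


print(cpu_alert_level([10.0, 12.0]))
```

The critical threshold is checked first so that a window above 85% is not reported as a mere warning.
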
  35. 35. When you start to work with "micro"services, understanding the topology of your connections is really important. Time series can help you.
  36. 36. Demo
  37. 37. Use a dashboard to put different metrics together and reveal non-obvious relations
  38. 38. Badoo migrated to PHP 7
  39. 39. @dgryski tested a service in Golang for 10 minutes (previously it was in Perl)
  40. 40. Sometimes you just need to compare
  41. 41. Why InfluxDB and not something else? https://www.influxdata.com/influxdb-is-27x-faster-vs-mongodb-for-time-series-workloads/
      ● 27x greater write throughput
      ● 84x less disk space
  42. 42. That’s it! A series of great tools to monitor your applications and your infrastructure
  43. 43. A monitoring system isn’t for everyone
  44. 44. “Anybody” who?!?
      ● Think about your monitoring system as an “as a service” tool.
      ● Different location (VPC or network)
      ● Proper team, contingency plan, deploy, documentation, everything you can!
      ● HA
  45. 45. Thanks! See you around: twitter.com/gianarb
