
Time series databases

Slides from a talk delivered during 4Developers conference in Warsaw covering concepts of time-series databases.

Published in: Software

  1. @SrcMinistry @MariuszGil Time series databases. Data processing
  2. We are developers
  3. Code
  4. Write it
  5. Test it
  6. Release it
  7. Where does our responsibility end?
  8. Devs vs SysAdmins
  9. No man’s land
  10. Context…
  11. The story
  12. One app, 10+ servers, millions of users
  13. Infra monitoring
  14. Problems
  15. What’s going on?
  16. "You can't manage what you can't measure" (W. Edwards Deming)
  17. Gather
  18. 2016-04-11 02:00:00   45
      2016-04-11 03:00:00   42
      2016-04-11 04:00:00   30
      2016-04-11 05:00:00   46
      2016-04-11 06:00:00   70
      2016-04-11 07:00:00  120
  19. Store
  20. Present
  21. React
  22. Low resolution. No context
  23. Metrics
  24. Metrics: values that change over time. Matches everything
  25. What should be measured?
  26. Everything that is important to you
  27. Tech: CPU / PV / reqs / resp_time
  28. Business: KPIs / PV / UU
  29. Why should it be measured?
  30. Insights
  31. Time series databases
  32. "Software system that’s optimized for handling arrays of numbers indexed by time" (Wikipedia definition)
  33. Many solutions…
  34. Hello world
  35. CREATE DATABASE test
  36. SHOW DATABASES
      name: databases
      ---------------
      name
      _internal
      test
  37. USE test
      Using database test
  38. INSERT clicks,type=cpc,source=ad-partner-1,paid=yes,widget=promo-box value=0.43
      INSERT clicks,type=cpc,source=ad-partner-2,paid=yes,widget=promo-box value=0.50
      INSERT clicks,type=cpc,source=ad-partner-1,paid=yes,widget=top value=0.62
      INSERT clicks,type=cpm,source=ad-partner-1,paid=yes,widget=top value=0.80
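The INSERT statements on slide 38 follow InfluxDB's line protocol: a measurement name, comma-separated tags, a space, then the fields, and optionally a timestamp. A minimal Python sketch of how an application might assemble such a line is shown here. `format_line` and `_escape` are hypothetical helpers written for this illustration, not part of any client library, and the sketch assumes a single float field named `value`, as on the slide.

```python
def _escape(text):
    """Backslash-escape the characters that delimit tags in line protocol."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def format_line(measurement, tags, value, timestamp_ns=None):
    """Build one line-protocol line with a single field named 'value'.

    Hypothetical helper for illustration only.
    """
    # Measurement names escape commas and spaces (not equals signs).
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    # Sorting tags by key keeps the output deterministic.
    tag_part = "".join(
        ",{}={}".format(_escape(k), _escape(v)) for k, v in sorted(tags.items())
    )
    line = "{}{} value={}".format(name, tag_part, value)
    if timestamp_ns is not None:
        # Without a timestamp the server assigns its own receive time.
        line += " {}".format(timestamp_ns)
    return line


line = format_line(
    "clicks",
    {"type": "cpc", "source": "ad-partner-1", "paid": "yes", "widget": "promo-box"},
    0.43,
)
print(line)
```

The tags come out in a different order than on the slide because the sketch sorts them by key; tag order does not change the meaning of a point.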
  39. SELECT * FROM clicks
      name: clicks
      ------------
      time                 paid  source        type  value  widget
      1459850636770890539  yes   ad-partner-1  cpc   0.43   promo-box
      1459850643866389522  yes   ad-partner-2  cpc   0.50   promo-box
      1459850656407067481  yes   ad-partner-1  cpc   0.62   top
      1459850668781668282  yes   ad-partner-1  cpm   0.80   top
  40. SELECT COUNT(value), MEAN(value), MEDIAN(value) FROM clicks WHERE time > NOW() - 2m GROUP BY time(5s)
      name: clicks
      ------------
      time                 count  mean   median
      1459851275000000000  0
      1459851280000000000  0
      1459851285000000000  3      0.330  0.33
      1459851290000000000  0
      1459851295000000000  2      0.420  0.420
      1459851300000000000  2      0.365  0.365
      1459851305000000000  3      0.446  0.43
      1459851310000000000  0
      1459851315000000000  0
  41. SELECT COUNT(value), MEAN(value), MEDIAN(value) FROM clicks WHERE time > NOW() - 2m GROUP BY time(5s) FILL(0)
      name: clicks
      ------------
      time                 count  mean   median
      1459851275000000000  0      0      0
      1459851280000000000  0      0      0
      1459851285000000000  3      0.330  0.33
      1459851290000000000  0      0      0
      1459851295000000000  2      0.420  0.420
      1459851300000000000  2      0.365  0.365
      1459851305000000000  3      0.446  0.43
      1459851310000000000  0      0      0
      1459851315000000000  0      0      0
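To make the effect of GROUP BY time(5s) and FILL(0) on slides 40 and 41 concrete, here is a rough Python sketch of the same idea: points are assigned to fixed-width time buckets, each bucket is aggregated, and empty buckets receive a fill value. `group_by_time` is a hypothetical function, the sample points are invented for illustration, and the sketch aligns buckets to the given start time, which is a simplification and not necessarily how the database aligns them.

```python
from statistics import mean, median


def group_by_time(points, start, end, width, fill=None):
    """Aggregate (timestamp, value) pairs into fixed-width buckets.

    Returns one (bucket_start, count, mean, median) row per bucket in
    [start, end). Empty buckets get `fill` for mean and median.
    """
    buckets = {t: [] for t in range(start, end, width)}
    for ts, value in points:
        if start <= ts < end:
            # Align the timestamp to the start of its bucket.
            bucket_start = start + (ts - start) // width * width
            buckets[bucket_start].append(value)
    rows = []
    for bucket_start in sorted(buckets):
        values = buckets[bucket_start]
        if values:
            rows.append((bucket_start, len(values), mean(values), median(values)))
        else:
            rows.append((bucket_start, 0, fill, fill))
    return rows


# Invented sample data: (seconds, value) pairs.
sample_points = [(10, 0.25), (12, 0.5), (21, 0.75)]
for row in group_by_time(sample_points, start=0, end=25, width=5, fill=0):
    print(row)
```

Leaving `fill` at its default of `None` corresponds to the empty cells on slide 40; passing `fill=0` corresponds to slide 41.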
  42. Functions
      Aggregations: COUNT, DISTINCT, INTEGRAL, MEAN, MEDIAN, SPREAD, SUM
      Selectors: BOTTOM, FIRST, LAST, MAX, PERCENTILE, TOP
      Transformations: CEILING, DERIVATIVE, DIFFERENCE, FLOOR, HISTOGRAM, MOVING_AVERAGE, STDDEV
  43. Real world
  44. Writing data
  45. App: HTTP API / Graphite protocol / Collectd protocol / JSON+UDP
  46. curl -i -XPOST 'http://localhost:8086/write?db=test' \
        --data-binary 'cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000'

      HTTP/1.1 204 No Content
      Request-Id: 02e7757f-fb1e-11e5-809d-000000000000
      X-Influxdb-Version: 0.11.0
      Date: Tue, 05 Apr 2016 11:03:22 GMT
  47. curl -G 'http://localhost:8086/query' \
        --data-urlencode "db=test" \
        --data-urlencode "q=SELECT value FROM cpu_load_short WHERE region='us-west'"

      {"results":[{"series":[{"name":"cpu_load_short","columns":["time","value"],"values":[["2015-06-11T20:46:02Z",0.64]]}]}]}
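The two curl calls on slides 46 and 47 can also be issued from application code. The following Python sketch uses only the standard library: it prepares the same POST to `/write` without sending it, and flattens a `/query` response like the one on slide 47 into rows. `build_write_request` and `rows_from_response` are hypothetical helper names, and the host, port and database are simply the values from the slides.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(line, db="test", base_url="http://localhost:8086"):
    """Prepare (but do not send) a POST to the /write endpoint."""
    url = "{}/write?{}".format(base_url, urlencode({"db": db}))
    return Request(url, data=line.encode("utf-8"), method="POST")


def rows_from_response(body):
    """Flatten a /query JSON response into a list of dicts, one per point."""
    rows = []
    for result in json.loads(body).get("results", []):
        for series in result.get("series", []):
            for values in series["values"]:
                row = dict(zip(series["columns"], values))
                row["measurement"] = series["name"]
                rows.append(row)
    return rows


request = build_write_request(
    "cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000"
)
# Sending would be urllib.request.urlopen(request); slide 46 shows the
# 204 No Content reply that signals a successful write.

sample = (
    '{"results":[{"series":[{"name":"cpu_load_short",'
    '"columns":["time","value"],"values":[["2015-06-11T20:46:02Z",0.64]]}]}]}'
)
print(rows_from_response(sample))
```

Keeping request construction separate from sending makes the code easy to exercise without a running database.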
  48. // Assumption: the influxdb-php client library is installed, and
      // $database was obtained earlier from a connected client.
      use InfluxDB\Database;
      use InfluxDB\Point;

      // create an array of points
      $points = array(
          new Point(
              'test_metric', // name of the measurement
              0.64, // the measurement value
              ['host' => 'server01', 'region' => 'us-west'], // optional tags
              ['cpucount' => 10], // optional additional fields
              1435255849 // Time precision has to be set to seconds!
          ),
          new Point(
              'test_metric', // name of the measurement
              0.84, // the measurement value
              ['host' => 'server01', 'region' => 'us-west'], // optional tags
              ['cpucount' => 10], // optional additional fields
              1435255849 // Time precision has to be set to seconds!
          )
      );

      // we are writing unix timestamps, which have a second precision
      $result = $database->writePoints($points, Database::PRECISION_SECONDS);
  49. Data model
  50. Counters
      $statsd = new League\StatsD\Client();

      $statsd->increment('web.pageview');
      $statsd->decrement('storage.remaining');
      $statsd->increment(array(
          'first.metric',
          'second.metric'
      ), 2);
      $statsd->increment('web.clicks', 1, 0.5);
  51. Gauges
      $statsd = new League\StatsD\Client();

      $statsd->gauge('api.logged_in_users', 123456);
  52. Timers
      $statsd = new League\StatsD\Client();

      $statsd->timing('api.response_time', 23.1);
  53. Sets
      $statsd = new League\StatsD\Client();

      $userID = 23;

      $statsd->set('api.unique_logins', $userID);
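The four metric types on slides 50 to 53 map onto the StatsD plain-text wire format, `<name>:<value>|<type>`, with an optional `|@<rate>` suffix for sampled counters. The Python sketch below shows roughly what a client puts on the wire for each call. `statsd_datagram` and `send` are hypothetical helpers, and the sketch only formats the messages; a real client with a sample rate of 0.5 would also skip about half of the sends.

```python
import socket


def statsd_datagram(name, value, kind, sample_rate=None):
    """Format one StatsD plain-text datagram: <name>:<value>|<type>[|@<rate>]."""
    type_codes = {"counter": "c", "gauge": "g", "timer": "ms", "set": "s"}
    message = "{}:{}|{}".format(name, value, type_codes[kind])
    if sample_rate is not None:
        message += "|@{}".format(sample_rate)
    return message.encode("ascii")


def send(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; 8125 is the conventional StatsD port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))


datagrams = [
    statsd_datagram("web.pageview", 1, "counter"),
    statsd_datagram("storage.remaining", -1, "counter"),  # a decrement
    statsd_datagram("web.clicks", 1, "counter", sample_rate=0.5),
    statsd_datagram("api.logged_in_users", 123456, "gauge"),
    statsd_datagram("api.response_time", 23.1, "timer"),
    statsd_datagram("api.unique_logins", 23, "set"),
]
for datagram in datagrams:
    print(datagram.decode("ascii"))
```

Because the transport is UDP, a missing or slow metrics daemon does not block the application, which is one reason this style of instrumentation is cheap to add.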
  54. Data volume
  55. CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
      [RESAMPLE [EVERY <interval>] [FOR <interval>]]
      BEGIN
        SELECT <function>(<stuff>)[,<function>(<stuff>)]
        INTO <different_measurement>
        FROM <current_measurement>
        [WHERE <stuff>]
        GROUP BY time(<interval>)[,<stuff>]
      END
  56. Rsyslog
  57. Rsyslog
  58. Rsyslog
  59. Possibilities!
  60. App heartbeat
  61. Anomaly detection
  62. Monitoring with existing stack
  63. Alternatives
  64. Gather. Store. Present. React.
  65. No rocket science
  66. Only rocket fuel
  67. @SrcMinistry Thanks! @MariuszGil
