Surfing the event stream

As presented at GeeCon 2013.

We have lots of information available about our systems: CPU, disk IO, orders placed, error rates, users logged in. But these pieces of information are typically collected, aggregated and stored in very different ways, which makes correlation difficult and increases the operational overhead of our systems. What if we could treat all of this information as events? What if we could aggregate, store, and report on all of it as a uniform event stream? This talk looks at emerging trends in log aggregation, monitoring and event streaming to paint a picture of how you too can start to make real use of the information already available to you, using nothing more complex than free, off-the-shelf open source software.
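As a rough illustration of what "treating everything as events" could look like, here is a minimal Python sketch. The `Event` class and the helper function names are hypothetical and do not come from the talk. The two output shapes follow the Graphite plaintext line (`www01.cpuUsage 42 1286269200`) and the StatsD counter line (`ordersplaced:1|c`) that appear in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical uniform event: one record shape for every kind of data."""
    name: str       # metric path or business event name
    value: float    # a measurement, or 1 for "this happened once"
    timestamp: int  # Unix epoch seconds


def _num(value: float) -> str:
    """Print whole numbers without a trailing '.0'."""
    return str(int(value)) if float(value).is_integer() else str(float(value))


def to_graphite_line(event: Event) -> str:
    """Render as '<path> <value> <timestamp>' (Graphite plaintext style)."""
    return f"{event.name} {_num(event.value)} {event.timestamp}"


def to_statsd_counter(event: Event) -> str:
    """Render as '<name>:<value>|c' (StatsD counter style)."""
    return f"{event.name}:{_num(event.value)}|c"


# An operational sample and a business fact share the same shape.
cpu = Event("www01.cpuUsage", 42, 1286269200)
order = Event("ordersplaced", 1, 1286269200)
print(to_graphite_line(cpu))     # www01.cpuUsage 42 1286269200
print(to_statsd_counter(order))  # ordersplaced:1|c
```

The point of the sketch is only that CPU usage and an order being placed can share one record shape, which is what makes correlating them easier.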

Surfing the event stream

  1. @samnewman #geecon: Surfing The Event Stream, Sam Newman, ThoughtWorks. Sunday, 21 July 13
  4. Operational Data
  5. Operational Data: CPU
  6. Operational Data: CPU, Memory Use
  7. Operational Data: CPU, Memory Use, Threads
  8. Operational Data: CPU, Disk IO, Memory Use, Threads
  9. Collection & Display: sar, syslog, collectd, syslog-ng, nagios, ganglia
  10. Server Server Server Server
  14. Business Data
  15. Business Data: Orders Placed
  16. Business Data: Orders Placed, Revenue
  17. Business Data: Orders Placed, Revenue, Fraud Cases
  18. Business Data: Orders Placed, Bounce Rate, Revenue, Fraud Cases
  19. How did we handle them? Google Analytics, Data Warehouse Systems, Log files!
  20. Something Happened!
  21. Something Happened! What Should We Do?
  25. http://blog.jgc.org/2006/05/what-slashdot-effect-looks-like.html
  27. Fast
  28. Fast And Easy...
  29. Fast And Easy... At Scale
  30. Aggregation Is Key
  31. Mark McGranaghan: "Logs as Data" http://blip.tv/clojure/mark-mcgranaghan-logs-as-data-5953857
  32. Paul Ingles: "Users as Data" http://vimeo.com/45136211
  33. Logstash + Graylog2
  38. Graphite
  41. www01.cpuUsage 42 1286269200
  46. ???
  49. Graphite
  50. Graphite Server collectd
  51. Graphite App Server collectd
  52. Graphite App Server Server collectd
  53. Graphite App Server Server collectd Yammer Metrics
  55. Volume!
  56. Aggregation!
  57. www01.cpuUsage 42 1286269200
  58. orderplaced 1 1286269200
  59. orderplaced 1 1286269200; orderplaced 1 1286269200
  60. orderplaced 1 1286269200; orderplaced 1 1286269200; orderplaced = 1
  61. StatsD
  62. Counters: ordersplaced:1|c
  63. Timings: orderduration:140|ms
  64. StatsD Client Client Graphite
  67. Riemann
  71. Riemann Client Client Graphite
  73. (service "api req") (percentiles 5 [0.5 0.95 0.99] index))
  75. (def tell-ops (rollup 5 3600 (email "ops@vonbraun.mil"))) (streams (where (state "critical") tell-ops))
  76. (let [client (tcp-client :host "aggregator")] (by [:host :service] (changed :state (forward client))))
  77. Riemann Server Client Client
  78. Riemann Server Client Client Riemann Server Client Client
  79. Riemann Server Client Client Riemann Server Client Client Riemann Server
  80. So What Do We Have?
  81. Server Server Graphite Graylog2 Server
  85. Server Server Graphite Graylog2 Dashboard A Dashboard B Dashboard C Server
  86. Server Server StatsD/Riemann Graylog2 Graphite Dashboard A Dashboard B Dashboard C
  87. http://shopify.github.io/dashing/
  92. Realtime Aggregator
  95. Realtime Aggregator: Data is lost!
  97. Real-time metrics require upfront knowledge
  98. Realtime Aggregator
  100. Realtime Aggregator, Lossless Event Store
  102. Realtime Aggregator, Lossless Event Store: Hadoop, HBase, Cassandra
  103. Riemann Server Client Client
  104. Riemann Server Client Client Lossless Event Store
  105. Event Sourcing
  106. But...
  107. Realtime Aggregator
  108. Lossless Event Store, Realtime Aggregator
  109. Can I have one view? Lossless Event Store, Realtime Aggregator
  110. http://nathanmarz.com/
  111. Lossless Event Store, Realtime Aggregator
  113. Lossless Event Store. Realtime Aggregator: up to date, but only for a small window
  114. Lossless Event Store: consistent, but out of date. Realtime Aggregator: up to date, but only for a small window
  115. Unified Query. Lossless Event Store: consistent, but out of date. Realtime Aggregator: up to date, but only for a small window
  116. Lambda Architecture. Unified Query. Lossless Event Store: consistent, but out of date. Realtime Aggregator: up to date, but only for a small window
  117. The Future?
  118. Server Server Aggregating Relay Graphite Graylog2 Hadoop
  119. Server Server Aggregating Relay Graphite Graylog2 Hadoop Unified Query
  121. All Your Data
  122. All Your Data In Realtime
  125. Find and free your data
  126. Find and free your data; Start simple
  127. Find and free your data; Start simple; Create different views for different stakeholders
  128. Find and free your data; Start simple; Create different views for different stakeholders; Don’t be scared of real-time!
  129. Thanks! snewman@thoughtworks.com @samnewman
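
The later slides pair a realtime aggregator ("up to date, but only for a small window") with a lossless event store ("consistent, but out of date") behind a unified query. Below is a minimal in-memory Python sketch of that idea, not the speaker's implementation. All class and function names are hypothetical, and a plain list stands in for Hadoop, HBase or Cassandra. It reuses the `orderplaced 1 1286269200` events from the slides.

```python
from collections import defaultdict


class LosslessEventStore:
    """Keeps every raw event (a stand-in for Hadoop, HBase or Cassandra)."""

    def __init__(self):
        self._events = []

    def append(self, name, value, timestamp):
        self._events.append((name, value, timestamp))

    def batch_totals(self, up_to):
        """Recompute totals from raw events with timestamp < up_to."""
        totals = defaultdict(int)
        for name, value, timestamp in self._events:
            if timestamp < up_to:
                totals[name] += value
        return dict(totals)


class RealtimeAggregator:
    """Sums counters, but only for events at or after window_start."""

    def __init__(self, window_start):
        self.window_start = window_start
        self._totals = defaultdict(int)

    def record(self, name, value, timestamp):
        if timestamp >= self.window_start:
            self._totals[name] += value

    def totals(self):
        return dict(self._totals)


def unified_query(name, batch_view, realtime_view):
    """Merge the older batch view with the recent realtime view."""
    return batch_view.get(name, 0) + realtime_view.get(name, 0)


cutoff = 1286269200
store = LosslessEventStore()
realtime = RealtimeAggregator(window_start=cutoff)

events = [
    ("orderplaced", 1, 1286269100),  # before the realtime window
    ("orderplaced", 1, 1286269200),  # two orders in the same second:
    ("orderplaced", 1, 1286269200),  # summed, not overwritten
]
for event in events:
    store.append(*event)     # every event is kept
    realtime.record(*event)  # only recent events are aggregated

batch_view = store.batch_totals(up_to=cutoff)
total = unified_query("orderplaced", batch_view, realtime.totals())
print(total)  # 3
```

Splitting the two views at a single cutoff timestamp keeps the sketch free of double counting. A real system would need to advance that cutoff as each batch recomputation finishes.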
