Timeseries data in Riak - Riak Meetup Stockholm 1/11/2012


Published in: Technology

  1. Metrics with Riak: A retrospective. Martin Törnwall
  2. Metrics? Many definitions, but here's ours...
  3. Recording things that change over time, so we can visualize them and search for patterns
  4. OS: CPU, network, memory and disk usage, ...
  5. Application: Number of requests, errors, events, ...
  6. External events: Text messages or emails sent, customer service calls, ...
  7. What is a Metric?
     ● A named variable: "sys.mem.free"
     ● With tags: "host=sl075", "code=403", ...
     avg("sys.mem.free") from 1 hour ago where host="sl075"
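The metric model on this slide (a named variable, a set of tags, and a query that averages over a time window) can be sketched in a few lines of Python. This is only an illustration of the data model: the `Sample` class, the `avg` function and its parameters are hypothetical names, not Metyr's actual query interface, and the samples are made-up values.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One recorded value of a named metric (hypothetical structure)."""
    name: str                 # e.g. "sys.mem.free"
    time: int                 # Epoch seconds
    value: int                # integer only, per the design decisions
    tags: dict = field(default_factory=dict)


def avg(samples, name, since, **tags):
    """Average all samples called `name`, taken at or after `since`,
    whose tags include every key/value pair given in `tags`."""
    values = [
        s.value
        for s in samples
        if s.name == name
        and s.time >= since
        and all(s.tags.get(k) == v for k, v in tags.items())
    ]
    return sum(values) / len(values) if values else None


now = 10_000  # illustrative "current time"
samples = [
    Sample("sys.mem.free", now - 7200, 100, {"host": "sl075"}),  # too old
    Sample("sys.mem.free", now - 1800, 200, {"host": "sl075"}),
    Sample("sys.mem.free", now - 600, 400, {"host": "sl075"}),
    Sample("sys.mem.free", now - 600, 999, {"host": "sl076"}),   # other host
]

# Roughly: avg("sys.mem.free") from 1 hour ago where host="sl075"
result = avg(samples, "sys.mem.free", since=now - 3600, host="sl075")
```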
  8. Going Technical
  9. We have distributed services. Why not have distributed metrics?
  10. Reinventing the wheel? Solutions exist, but rely on technology stacks we had no experience with (e.g., HBase)
  11. I mean, really... Just how hard can it be?
  12. I mean, really... Just how hard can it be?
  13. Introducing Metyr: Our weekend hack glorious metrics storage and processing software
  14. Design Decisions
      ● Use familiar tools: Erlang, Riak, HTTP
      ● Not a critical service but ...
      ● ... Avoid SPOF
      ● Write performance >> read performance
      ● Centralized reference clock
      ● Integer only
      ● Avoid 2i if possible
      ● When in doubt, leave it to Riak
  15. In Theory... [Diagram: Client, Client, Client; Metyr, Metyr, Metyr; Riak cluster]
  16. Storing metrics in Riak: No SQL, no schemas, no indices (?), no aggregate operations
  17. Attempt 1: The naïve way just never works...
  18. Make each sample an object: A bucket per metric; index by Epoch time
  19. The Good™: Atomicity, write-once, fast range queries
  20. The Bad: Slow, large overhead, requires 2i
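To make the trade-off of Attempt 1 concrete, here is a minimal in-memory sketch of the one-object-per-sample layout. It does not talk to Riak: `PerSampleStore` is a hypothetical stand-in in which a sorted list plays the role of the secondary (2i) integer index on Epoch time, and random keys are an assumption about how objects might be named.

```python
import bisect
import uuid


class PerSampleStore:
    """Stand-in for 'a bucket per metric, one object per sample'."""

    def __init__(self):
        self.objects = {}  # (bucket, key) -> value, one entry per sample
        self.index = {}    # bucket -> sorted [(time, key)], mimics a 2i index

    def write(self, metric, time, value):
        # Write-once: every sample gets its own new key, so no
        # read-modify-write is needed and writers never conflict.
        key = uuid.uuid4().hex
        self.objects[(metric, key)] = value
        bisect.insort(self.index.setdefault(metric, []), (time, key))

    def read_range(self, metric, start, end):
        # Range query over the time index (inclusive on both ends),
        # followed by one object fetch per matching sample.
        idx = self.index.get(metric, [])
        lo = bisect.bisect_left(idx, (start, ""))
        hi = bisect.bisect_right(idx, (end, "\uffff"))
        return [(t, self.objects[(metric, k)]) for t, k in idx[lo:hi]]


store = PerSampleStore()
store.write("sys.mem.free", 10, 1)
store.write("sys.mem.free", 20, 2)
store.write("sys.mem.free", 30, 3)
in_range = store.read_range("sys.mem.free", 15, 30)
```

Each write is independent, which matches the atomicity and write-once points, while the per-sample object count and the dependence on the index illustrate the overhead and the 2i requirement listed under "The Bad".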
  21. Attempt 2: Combine samples into chunks by time
  22. Key Points
      ● One bucket per metric as before
      ● Split into hour-sized chunks (configurable)
      ● Chunk key: Epoch time
      ● Chunk value: List of samples
      ● To read: Fetch chunks within interval
      ● To write: Fetch chunk, add sample, write back
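The key points above can be sketched as follows. This is an in-memory illustration under stated assumptions, not Metyr's code: `ChunkStore` and `chunk_key` are hypothetical names, a plain dict stands in for a Riak bucket, and the chunk key is assumed to be the Epoch time rounded down to the start of the chunk.

```python
CHUNK_SECONDS = 3600  # hour-sized chunks; configurable


def chunk_key(epoch_time, chunk_seconds=CHUNK_SECONDS):
    """Epoch time of the start of the chunk that contains `epoch_time`."""
    return epoch_time - (epoch_time % chunk_seconds)


class ChunkStore:
    def __init__(self, chunk_seconds=CHUNK_SECONDS):
        self.chunk_seconds = chunk_seconds
        self.buckets = {}  # metric -> {chunk key: [(time, value), ...]}

    def write(self, metric, time, value):
        bucket = self.buckets.setdefault(metric, {})
        key = chunk_key(time, self.chunk_seconds)
        chunk = list(bucket.get(key, []))  # fetch chunk
        chunk.append((time, value))        # add sample
        bucket[key] = chunk                # write back

    def read(self, metric, start, end):
        """Fetch every chunk overlapping [start, end], then filter samples."""
        bucket = self.buckets.get(metric, {})
        samples = []
        key = chunk_key(start, self.chunk_seconds)
        while key <= end:
            samples.extend(
                (t, v) for t, v in bucket.get(key, []) if start <= t <= end
            )
            key += self.chunk_seconds
        return sorted(samples)


store = ChunkStore()
store.write("sys.mem.free", 3600, 1)
store.write("sys.mem.free", 3725, 2)
store.write("sys.mem.free", 7300, 3)
window = store.read("sys.mem.free", 3700, 7300)
```

Because chunk keys can be computed from the query interval, reads need no secondary index. The fetch, add, write-back sequence in `write` is not atomic against a real store, so two concurrent writers to the same chunk could lose a sample.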
  23. Chunk Anatomy
      [Diagram: a chunk is a sequence of samples, one sample per row]
      Time0 (64 bits) | Value0 (64 bits) | Tags0...
      ...
      TimeN (64 bits) | ValueN (64 bits) | TagsN...
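One possible byte-level reading of this layout is sketched below. Only the two 64-bit fields come from the slide; the big-endian byte order, the signed integer type, the 16-bit tag-length prefix and the "key=value,key=value" tag encoding are all assumptions made to obtain a runnable example.

```python
import struct

# time (64-bit), value (64-bit), tag length in bytes (16-bit, assumed)
HEADER = struct.Struct(">qqH")


def encode_sample(time, value, tags):
    """Serialize one sample as fixed header plus variable-length tags."""
    tag_bytes = ",".join(
        f"{k}={v}" for k, v in sorted(tags.items())
    ).encode("utf-8")
    return HEADER.pack(time, value, len(tag_bytes)) + tag_bytes


def decode_chunk(data):
    """Walk a chunk (concatenated samples) and return (time, value, tags)."""
    samples, offset = [], 0
    while offset < len(data):
        time, value, tag_len = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        raw = data[offset:offset + tag_len].decode("utf-8")
        offset += tag_len
        tags = dict(pair.split("=", 1) for pair in raw.split(",")) if raw else {}
        samples.append((time, value, tags))
    return samples


chunk = (
    encode_sample(1351728000, 2048, {"host": "sl075"})
    + encode_sample(1351728010, 2040, {})
)
decoded = decode_chunk(chunk)
```

With this encoding, appending a sample to a chunk is a simple byte concatenation, which keeps the write path cheap once the chunk has been fetched.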
  24. Writing just got harder: Slower since we must fetch a chunk first; potential race conditions, ...
  25. (Arbitrary) Goal: Write 1K samples/sec. Tests showed that the solution described so far was inadequate
  26. Buffer them writes: Keep per-metric write buffers, flushed every 10 seconds or so
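A minimal sketch of per-metric write buffering, assuming a storage hook that accepts a whole batch of samples for one metric. `BufferedWriter`, `flush_chunk` and the injectable `clock` are hypothetical names; this version only checks the interval when a sample arrives, whereas a real service would more likely flush from a timer.

```python
import time
from collections import defaultdict


class BufferedWriter:
    def __init__(self, flush_chunk, flush_interval=10.0, clock=time.monotonic):
        self.flush_chunk = flush_chunk        # callable(metric, samples)
        self.flush_interval = flush_interval  # seconds between flushes
        self.clock = clock                    # injectable for testing
        self.buffers = defaultdict(list)      # metric -> [(timestamp, value)]
        self.last_flush = clock()

    def write(self, metric, timestamp, value):
        """Buffer the sample; flush everything once the interval has passed."""
        self.buffers[metric].append((timestamp, value))
        if self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        """Hand each metric's buffered samples to storage in one call."""
        for metric, samples in self.buffers.items():
            if samples:
                self.flush_chunk(metric, samples)
        self.buffers.clear()
        self.last_flush = self.clock()


# Usage with a fake clock and a list standing in for the storage layer.
fake_now = [0.0]
flushed = []
writer = BufferedWriter(
    lambda metric, samples: flushed.append((metric, list(samples))),
    flush_interval=10.0,
    clock=lambda: fake_now[0],
)
writer.write("sys.mem.free", 1, 100)
writer.write("sys.mem.free", 2, 110)
before_interval = list(flushed)   # nothing has been flushed yet
fake_now[0] = 10.0
writer.write("app.requests", 3, 7)  # interval elapsed: triggers a flush
```

Batching this way replaces one fetch and write-back per sample with one per metric per flush, at the cost of losing up to one interval of buffered samples if the process dies.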
  27. Some Remaining Issues
      ● Race condition on write
      ● Storage requirements
      ● Downsampling of old data
  28. Thank you!