Building a custom time series db - Colin Hemmings at #DOXLON

www.dataloop.io | @dataloopio | info@dataloop.io
Colin Hemmings | Architect
Time-series Datastore on Riak

Architecture
• Collection
• Storage
• Analytics

The Storage Problem
Just stick it in a database, right?

Past Solutions
TempoDB - the phantom menace

Past Solutions
MongoDB - return of the Jedi

Riak - Our New Hope
• Scales
• Ops Friendly
• Actually works
• No random JVM crashes here

Objectives
• Handle the load
• Semi-arbitrary queries
• Data retention windows
• Low latency

Data structure
• Resolution/rollup-based queries
• Minimum 24 hours at 1-second resolution
• Second, minute and hour resolutions

Data structure
• ~86,400 data points per metric per resolution
• 1 second -> 24-hour retention
• 1 minute -> 60-day retention
• 1 hour -> 10-year retention
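As a quick check on these figures, the sketch below divides each retention window by its step. The 1-second and 1-minute tiers come out at exactly 86,400 points; the 10-year hourly tier is 87,600 (ignoring leap days), so 86,400 is a round approximation there.

```python
SECONDS_PER_DAY = 86_400

# name: (step in seconds, retention in seconds), taken from the slide
RESOLUTIONS = {
    "second": (1, 1 * SECONDS_PER_DAY),
    "minute": (60, 60 * SECONDS_PER_DAY),
    "hour": (3_600, 10 * 365 * SECONDS_PER_DAY),  # ignores leap days
}


def points_retained(step: int, retention: int) -> int:
    """Number of data points one metric keeps at a given resolution."""
    return retention // step


for name, (step, retention) in RESOLUTIONS.items():
    print(name, points_retained(step, retention))
```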

Data structure
• Per metric -> ~250k data points (3 x 86,400)
• 1,000 metrics per host -> 250M data points
• 300 hosts per user -> 75B data points
• 1,000 customers -> 75T data points!!!!!
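The capacity estimate is a chain of multiplications. The sketch below spells each step out using the inputs listed on the slide (about 250k points per metric, 1,000 metrics per host, 300 hosts per user, 1,000 customers); the per-metric figure is rounded down from 3 x 86,400 = 259,200.

```python
def estimate_points(points_per_metric: int, metrics_per_host: int,
                    hosts_per_user: int, customers: int) -> dict:
    """Multiply the storage estimate through each level of the hierarchy."""
    per_host = points_per_metric * metrics_per_host
    per_user = per_host * hosts_per_user
    total = per_user * customers
    return {"per_host": per_host, "per_user": per_user, "total": total}


est = estimate_points(250_000, 1_000, 300, 1_000)
for level, n in est.items():
    print(f"{level}: {n:,}")
```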

Simple Riak Storage
• One timestamp-keyed object per metric value
• Secondary indexes (2i) and MapReduce are too slow for queries
• Especially across millions of keys
• Writes would soon cripple our Riak cluster

Intelligent Riak Storage
• Units of storage: time-based data blocks
• Compute keys from metric, resolution and block time
• Mutable data windows
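A minimal sketch of block-based storage. It assumes hypothetical block lengths (one hour for second data, one day for minute data, 30 days for hour data; the talk gives no real values) and a key format of `metric_resolution_blockstart`, modelled loosely on the `cpu_second_t1b` key shown under Query. A plain Python dict stands in for Riak, with the host as the bucket. Each write computes its block key arithmetically, so no index lookup is needed, and only the block covering the current time window is rewritten.

```python
# Hypothetical block lengths per resolution; not taken from the talk.
BLOCK_SECONDS = {"second": 3_600, "minute": 86_400, "hour": 30 * 86_400}

# In-memory stand-in for Riak: bucket (host) -> key -> block of {timestamp: value}.
Store = dict[str, dict[str, dict[int, float]]]


def block_key(metric: str, resolution: str, ts: int) -> str:
    """Compute the key of the block containing ts by aligning down to a block boundary."""
    size = BLOCK_SECONDS[resolution]
    return f"{metric}_{resolution}_{ts - ts % size}"


def write_point(store: Store, host: str, metric: str, resolution: str,
                ts: int, value: float) -> None:
    """Read-modify-write of one block; a real Riak client would GET then PUT it."""
    bucket = store.setdefault(host, {})
    block = bucket.setdefault(block_key(metric, resolution, ts), {})
    block[ts] = value


store: Store = {}
write_point(store, "host_a", "cpu", "second", 7_261, 0.42)
write_point(store, "host_a", "cpu", "second", 7_262, 0.40)
print(store)  # {'host_a': {'cpu_second_7200': {7261: 0.42, 7262: 0.4}}}
```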

Query
Get CPU metrics for host A for period t1-t4 at 1-second resolution
• Pull the correct blocks from Riak, based on block boundaries
• GET /buckets/host_a/keys/cpu_second_t1b
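The block lookup can be sketched as pure arithmetic: align the start of the range down to a block boundary, then step one block at a time until past the end. The block lengths and the `metric_resolution_blockstart` key format are hypothetical assumptions; the path format follows the GET shown on the slide, with the host as bucket.

```python
# Hypothetical block lengths; the talk does not give the real ones.
BLOCK_SECONDS = {"second": 3_600, "minute": 86_400, "hour": 30 * 86_400}


def keys_for_range(metric: str, resolution: str, t_start: int, t_end: int) -> list[str]:
    """Every block key whose block overlaps [t_start, t_end], computed without any index."""
    size = BLOCK_SECONDS[resolution]
    first = t_start - t_start % size  # align down to a block boundary
    return [f"{metric}_{resolution}_{start}" for start in range(first, t_end + 1, size)]


def riak_path(host: str, key: str) -> str:
    """HTTP path for one block, mirroring the GET on the slide."""
    return f"/buckets/{host}/keys/{key}"


for key in keys_for_range("cpu", "second", 3_500, 7_300):
    print("GET", riak_path("host_a", key))
```

Because every key is derived from the query itself, a range query costs one GET per block and never touches 2i or MapReduce.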

Query
• Filter points outside of our query range
• Aggregate all the data points
• Perform other operations for more complex queries
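Once the blocks are fetched, the remaining work happens in the application tier. A minimal sketch, assuming each block is a `{timestamp: value}` dict: merge the blocks, drop points outside the requested range (blocks usually overhang it at both ends), then reduce the series with any function, a mean by default.

```python
from statistics import mean
from typing import Callable, Iterable, Optional


def filter_points(blocks: Iterable[dict[int, float]], t_start: int,
                  t_end: int) -> list[tuple[int, float]]:
    """Merge fetched blocks and keep only points inside [t_start, t_end], in time order."""
    merged = [(ts, v) for block in blocks for ts, v in block.items()
              if t_start <= ts <= t_end]
    return sorted(merged)


def aggregate(points: list[tuple[int, float]],
              fn: Callable[[list[float]], float] = mean) -> Optional[float]:
    """Collapse the filtered series into one value; None if nothing matched."""
    values = [v for _, v in points]
    return fn(values) if values else None


blocks = [{0: 1.0, 3_599: 2.0}, {3_600: 3.0, 7_199: 4.0}]
points = filter_points(blocks, 3_000, 4_000)
print(points, aggregate(points), aggregate(points, max))
```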

Expiring
• Cleanup worker
• Removes keys outside the retention window
• Keys are grouped by host, so it is easy to clear all data for a host or an account
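A cleanup worker can decide expiry from the key alone, without reading the block. This sketch assumes the same hypothetical block lengths and `metric_resolution_blockstart` key format used above, the retention windows from the slides, and a dict standing in for Riak (host -> key -> block). A block is deleted once even its newest possible point has left the retention window.

```python
# Hypothetical block lengths; not taken from the talk.
BLOCK_SECONDS = {"second": 3_600, "minute": 86_400, "hour": 30 * 86_400}
# Retention windows from the slides: 24 hours, 60 days, 10 years.
RETENTION_SECONDS = {"second": 86_400, "minute": 60 * 86_400, "hour": 10 * 365 * 86_400}

Store = dict[str, dict[str, dict[int, float]]]


def is_expired(key: str, now: int) -> bool:
    """True once the block's end falls outside the retention window."""
    _metric, resolution, start = key.rsplit("_", 2)  # metric may itself contain "_"
    block_end = int(start) + BLOCK_SECONDS[resolution]
    return block_end <= now - RETENTION_SECONDS[resolution]


def cleanup(store: Store, now: int) -> list[str]:
    """One pass of the worker: delete expired blocks in every host bucket."""
    removed = []
    for host, bucket in store.items():
        for key in [k for k in bucket if is_expired(k, now)]:
            del bucket[key]  # a real worker would issue a Riak DELETE here
            removed.append(f"{host}/{key}")
    return removed


def purge_host(store: Store, host: str) -> None:
    """Drop all data for one host; in Riak this means deleting every key in its bucket."""
    store.pop(host, None)


store: Store = {"host_a": {"cpu_second_0": {0: 1.0},
                           "cpu_second_111600": {111_600: 2.0},
                           "cpu_minute_0": {0: 1.5}}}
removed = cleanup(store, now=200_000)
print(removed)  # ['host_a/cpu_second_0']
```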

Our cluster
• Riak 2.0
• 5 nodes on LevelDB
• Each with 2 x 500 GB striped SSDs
• Average GET and PUT latency of 1 ms

Comments
• Awesome, especially for ops
• A bit more work in application tier
• Always compute keys; avoid 2i and MapReduce
• Looking forward to using the new Riak 2.0 data types
