Blueflood: Open Source Metrics Processing at CassandraEU 2013

Describes how Blueflood works and future development direction

Presentation Transcript

  • Blueflood Simple Metrics Processing Gary Dusbabek • Cassandra EU 2013
  • Motivation Building Blocks Future Stuff
  • Motivation
  • Get the Data In
  • Each check generates 2-20 metrics Multiply by data centers
  • Currently handling 120 million metrics per hour
  • 40 million aggregate Cassandra write operations per hour
  • Get the Data Out Fast Graphs! Think: Dashboards SLA is important
  • Multitenant
  • Different SLA expectations
  • Hard
  • Tenants imply Metadata
  • Hampers generic computing Systems
  • Lipstick system
  • Nice to Have
  • Not Mission Critical
  • Don’t Break the Bank
  • Avoid Hadoop
  • HATE Hadoop
  • We Ended Up With This: Ingestion API, Ingestion Transform, Query API, Metadata + Cache, Rollup Scheduler, State Management, Java Ingestion Library, Java Rollup Library, Database (Cassandra), Java Query Library
  • Cassandra Database (Cassandra)
  • Cassandra 1.0, 1.1, 1.2 Compatible No 2.0 yet  
  • Cassandra Experimented with CQL very early on CQL 1.0 time frame  
  • Cassandra Astyanax now Mostly happy with it Connection pool implementation is very sensitive to network bumps
  • Cassandra Experimented with various compaction strategies No real winner Leveldb bugs in 1.0 made it almost a non-starter
  • Cassandra CASSANDRA-5685 Per-CF TTLs Doesn’t help us Might help you
  • Cassandra CASSANDRA-3974 TTL histogram used to give input on which sstables are good candidates for compaction (size-tiered only)
  • Cassandra CASSANDRA-5228 Track max TTL per sstable to expire the whole thing. We could use this by using bucketed CFs
  • Anatomy of a Metric One dimensional signal Has an ID We call this a locator Mostly opaque Tuple of (tenantId [, other things, …])
  • Anatomy of a Metric Example: 6335,web01,ping,bytes
  • Anatomy of a Metric Stuff whatever you want in there Just don’t change it It becomes a key
  • Anatomy of a Metric Has a type associated with it: long, double, string, boolean Type determines on-disk serialization
  • Example
    {
      "timestamp": 1319222001982,
      "monitoring_zone_id": "mzXXXXXXXX",
      "available": true,
      "status": "code=200,rt=0.257s,bytes=0",
      "metrics": {
        "bytes": { "type": "i", "data": "0" },
        "tt_firstbyte": { "type": "I", "data": "257" },
        "tt_connect": { "type": "I", "data": "128" },
        "code": { "type": "s", "data": "200" },
        "duration": { "type": "I", "data": "257" }
      }
    }
  • Anatomy of a Metric Sometimes has units Example: seconds, bytes, light years We guess on this
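The locator slides can be sketched in a few lines. This is a minimal illustration, not Blueflood's actual class: the `Locator` name, its fields, and the comma-joined key format are assumptions based only on the example `6335,web01,ping,bytes` shown above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Locator:
    """Hypothetical locator: a tenant ID followed by opaque components."""

    tenant_id: str
    parts: Tuple[str, ...]

    def to_key(self) -> str:
        # Join with commas, mirroring the example on the slide.
        return ",".join((self.tenant_id,) + self.parts)

    @classmethod
    def from_key(cls, key: str) -> "Locator":
        # Only the first component (the tenant) is given any meaning;
        # the rest stays opaque.
        tenant_id, *rest = key.split(",")
        return cls(tenant_id, tuple(rest))


locator = Locator("6335", ("web01", "ping", "bytes"))
print(locator.to_key())
```

The class is frozen because the locator becomes a row key: once data has been written under it, changing any component would orphan that data.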
  • Column Families Metrics Full resolution One per granularity (5m, 20m, 60m, 240m, 1440m) One row per metric Locator is the key
  • Column Families Metrics No Bucketing Will be required for high frequency metrics Solution is easy Just complicates Locator resolution
  • Column Families Metadata One row per metric Rollup State Nasty map for tracking shard state Active Metrics Shard to list of locators
  • Column Families STRING & BOOLEAN Speshul Only updated when values change Plumbing keeps old values in memory
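The "only updated when values change" behavior for string and boolean metrics could look roughly like the sketch below. The class name, the in-memory dictionary, and the `writes` list standing in for the database are all illustrative assumptions, not Blueflood's implementation.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, bool]


class ChangeOnlyWriter:
    """Hypothetical writer that persists a value only when it differs from the last one seen."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, Value] = {}
        # Stand-in for the database: (locator, timestamp_ms, value) tuples.
        self.writes: List[Tuple[str, int, Value]] = []

    def ingest(self, locator: str, timestamp_ms: int, value: Value) -> bool:
        """Return True if the value was written, False if it was skipped as unchanged."""
        if locator in self._last_seen and self._last_seen[locator] == value:
            return False
        self._last_seen[locator] = value
        self.writes.append((locator, timestamp_ms, value))
        return True


writer = ChangeOnlyWriter()
writer.ingest("6335,web01,http,code", 1000, "200")
writer.ingest("6335,web01,http,code", 2000, "200")  # unchanged, skipped
writer.ingest("6335,web01,http,code", 3000, "500")
print(len(writer.writes))
```

The trade-off is memory: the last value for every active string or boolean metric has to be held by the ingesting process.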
  • Libraries Java Ingestion Library Java Rollup Library Database (Cassandra) Java Query Library
  • Ingestion Library insert_metrics(list<metric>)
  • Ingestion Library update_state(shard, granularity, slot) SLOT == Bucket of time
  • Rollup Library get_active_locators(shard) get_state(shard, granularity, slot) get_metrics(from, to, locator, granularity) write_rollups(list<rollup>) update_state(shard, granularity, slot)
  • Rollup Library Supports bulk operations outside of the service Enables tools to be written
  • Rollup Library Rollups contain count, min, max, mean, variance Serialization is versioned
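A rollup holding count, min, max, mean, and variance can be illustrated as follows. This is a sketch under assumptions: the `Rollup` type, the use of population variance, and the `merge` step (deriving a coarser rollup from two finer ones) are not taken from the slides, which only list the five statistics.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rollup:
    count: int
    minimum: float
    maximum: float
    mean: float
    variance: float  # population variance (an assumption)


def rollup_from_values(values: Sequence[float]) -> Rollup:
    """Summarize raw data points into a single rollup."""
    if not values:
        raise ValueError("cannot roll up an empty sequence")
    count = len(values)
    mean = sum(values) / count
    variance = sum((v - mean) ** 2 for v in values) / count
    return Rollup(count, min(values), max(values), mean, variance)


def merge(a: Rollup, b: Rollup) -> Rollup:
    """Combine two rollups without revisiting the raw points."""
    count = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    # Sum of squared deviations of the combined set.
    m2 = (a.variance * a.count + b.variance * b.count
          + delta ** 2 * a.count * b.count / count)
    return Rollup(count, min(a.minimum, b.minimum),
                  max(a.maximum, b.maximum), mean, m2 / count)


whole = rollup_from_values([1.0, 2.0, 3.0, 4.0])
merged = merge(rollup_from_values([1.0, 2.0]), rollup_from_values([3.0, 4.0]))
print(whole.mean, merged.mean)
```

Because all five statistics can be merged, a coarser granularity could in principle be built from the next finer one rather than from full-resolution data.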
  • Query Library get_data(from, to, granularity) get_data(from, to, num_points)
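The slides do not say how the `num_points` variant of `get_data` chooses a resolution. One plausible approach, sketched here purely as an assumption, is to pick the coarsest rollup granularity that still yields at least the requested number of points. Full-resolution data is left out because its collection interval is not stated.

```python
from typing import Sequence

# Rollup granularities from the Column Families slide, in minutes.
GRANULARITY_MINUTES = (5, 20, 60, 240, 1440)


def pick_granularity(from_ms: int, to_ms: int, num_points: int,
                     granularities: Sequence[int] = GRANULARITY_MINUTES) -> int:
    """Return the coarsest granularity (in minutes) giving at least num_points.

    Falls back to the finest granularity when none is fine enough.
    """
    if to_ms <= from_ms:
        raise ValueError("'to' must be later than 'from'")
    if num_points <= 0:
        raise ValueError("num_points must be positive")
    span_minutes = (to_ms - from_ms) / 60_000
    best = min(granularities)
    for minutes in sorted(granularities):
        if span_minutes / minutes >= num_points:
            best = minutes
    return best


DAY_MS = 24 * 60 * 60 * 1000
print(pick_granularity(0, DAY_MS, 24))
```

Choosing the coarsest adequate granularity keeps the number of columns read per query small, which fits the "Fast Graphs" goal stated earlier.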
  • Metadata & Cache Metadata + Cache State Management Java Ingestion Library Java Rollup Library Database (Cassandra) Java Query Library
  • Metadata & Cache Integrated into services (ingestion & rollup) Backed by Cassandra Supports different eviction strategies based on needs
  • Metadata & Cache Example 1: TTLs are linked to tenants and are not known when metrics are ingested A separate API must be consulted
  • Metadata & Cache Example 2: Units are valuable only at query time, but are not included with metrics Heuristically guess and store these
  • Rollup Schedule Service Metadata + Cache Rollup Scheduler State Management Java Ingestion Library Java Rollup Library Database (Cassandra) Java Query Library
  • Rollup Schedule Service Problem: Divide time into buckets without scratching at infinity Identify them using a finite set of keys
  • Rollup Schedule Service Solution: Order preserving consistent hashing for timestamps
  • Rollup Schedule Service Imagine a two week period divided into slots the size of each granularity
  • Rollup Schedule Service 4032 5m slots 1008 20m slots 336 60m slots 84 240m slots 14 1440m slots
  • Rollup Schedule Service Gives us a way of consistently addressing and bucketing time ranges As time increases, so does the slot it hashes to (until it wraps to zero)
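The slot counts above follow from dividing a two-week cycle (14 × 24 × 60 = 20,160 minutes) by each granularity. The mapping from timestamp to slot can be sketched with integer division and a modulus. The function names are hypothetical, and this is only one way to get the described behavior (slots increase with time, then wrap to zero).

```python
CYCLE_MINUTES = 14 * 24 * 60  # two weeks
GRANULARITY_MINUTES = (5, 20, 60, 240, 1440)


def slots_per_cycle(granularity_minutes: int) -> int:
    """Number of slots of the given size in one two-week cycle."""
    return CYCLE_MINUTES // granularity_minutes


def slot_for(timestamp_ms: int, granularity_minutes: int) -> int:
    """Map a millisecond timestamp to its slot; wraps to zero every two weeks."""
    slot_ms = granularity_minutes * 60_000
    return (timestamp_ms // slot_ms) % slots_per_cycle(granularity_minutes)


for minutes in GRANULARITY_MINUTES:
    print(minutes, slots_per_cycle(minutes))
```

Within a cycle the mapping preserves order, so a later timestamp never lands in an earlier slot, and the key space stays finite no matter how long the system runs.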
  • Rollup Schedule Service When do we roll up? Whenever an active slot a) has not been updated in N seconds b) is M seconds old
  • Rollup Schedule Service What about late data? Late data can be ingested for 24 hours
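The rollup trigger and the late-data window can be written as two small predicates. The slides do not state whether conditions (a) and (b) must both hold or whether either suffices. This sketch assumes either one is enough, and all names and parameters are hypothetical.

```python
LATE_DATA_WINDOW_S = 24 * 60 * 60  # late data is accepted for 24 hours


def should_roll_up(now_s: int, last_update_s: int, slot_start_s: int,
                   quiet_s: int, max_age_s: int) -> bool:
    """Decide whether an active slot is ready to be rolled up.

    quiet_s is the N from condition (a); max_age_s is the M from (b).
    Assumption: satisfying either condition triggers the rollup.
    """
    quiet_long_enough = now_s - last_update_s >= quiet_s
    old_enough = now_s - slot_start_s >= max_age_s
    return quiet_long_enough or old_enough


def accepts_late_data(now_s: int, metric_timestamp_s: int) -> bool:
    """True if a data point is still inside the 24-hour late-data window."""
    return now_s - metric_timestamp_s <= LATE_DATA_WINDOW_S


print(should_roll_up(now_s=1000, last_update_s=960, slot_start_s=900,
                     quiet_s=30, max_age_s=300))
```

A late point that is accepted updates its slot's state again, which makes that slot eligible for another rollup pass.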
  • Ingestion Processors Ingestion Transform Metadata + Cache Rollup Scheduler State Management Java Ingestion Library Java Rollup Library Database (Cassandra) Java Query Library
  • Ingestion Processors Not every metric is built the same way They come from different places Processors allow you to make them consistent Can be synchronous or asynchronous
  • API Endpoints Ingestion API Ingestion Transform Query API Metadata + Cache Rollup Scheduler State Management Java Ingestion Library Java Rollup Library Database (Cassandra) Java Query Library
  • API Endpoints Why not ship it with API endpoints? External forces
  • API Endpoints Decided to make them Modular
  • API Endpoints We do ship reference API endpoints UDP Ingestion HTTP Ingestion HTTP Query
  • API Endpoints Downside? More work for you
  • API Endpoints Upside? We ♥ Pull Requests
  • How Does It Scale? Ingestion scales linearly Add ingestion nodes until Cassandra is the bottleneck
  • How Does It Scale? Two ingestors per DC Only one per DC is active Double ingest
  • How Does It Scale? Rollups scale [almost] linearly by spreading out shard ownership Shards are currently pegged at 128 Ok to have multiple nodes own a shard Zookeeper is a soft-dependency
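Spreading rollup work by shard ownership might look like the sketch below. The fixed count of 128 shards comes from the slide. The SHA-256 hash and the round-robin ownership scheme are assumptions made for illustration and are not Blueflood's actual scheme.

```python
import hashlib
from typing import List

NUM_SHARDS = 128  # "Shards are currently pegged at 128"


def shard_for(locator_key: str) -> int:
    """Map a locator key to a shard with a stable hash (hash choice is an assumption)."""
    digest = hashlib.sha256(locator_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def shards_owned(node_index: int, node_count: int) -> List[int]:
    """Hypothetical round-robin split of shards across rollup nodes."""
    return [s for s in range(NUM_SHARDS) if s % node_count == node_index]


print(shard_for("6335,web01,ping,bytes"))
print(len(shards_owned(0, 4)))
```

A stable hash is required here. Python's built-in `hash()` is randomized per process for strings, so different nodes would disagree on which shard a locator belongs to.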
  • Future Stuff Local ingestion durability
  • Future Stuff Richer metadata API Example: tag metrics and then use those tags as a query facet Will require an index Experimenting with ElasticSearch Home-rolled bitmap indexes
  • Future Stuff Pre-aggregated Metrics Histograms (partially implemented) Counters, Timers, Gauges, Sets
  • Future Stuff Deep statsd and graphite integration (active work) Statsd is hard because counts get reset after a flush
  • Future Stuff Graphite is just involved (new rollup types) Whisper DB interface Then hack carbon to support it Already pluggable, just needs integration
  • Thanks! http://blueflood.io blueflood-discuss@googlegroups.com Freenode: #blueflood github.com/rackerlabs/blueflood Twitter: @gdusbabek
  • Image Credits All images for this presentation come from the Flickr commons collection http://www.flickr.com/commons/ flood guide motivation cows jet apartments groups lipstick elephant containers anatomy columns library cache money railyard processors terminal fish future thanks http://www.flickr.com/photos/keenepubliclibrary/2593172720/sizes/z/ http://www.flickr.com/photos/field_museum_library/3796303860/ http://www.flickr.com/photos/statelibraryofnsw/4944459226/sizes/l/in/photolist-8wVDt1/ http://www.flickr.com/photos/nationalarchives/7457004362/sizes/l/ http://www.flickr.com/photos/sdasmarchives/4564334397/sizes/o/ http://www.flickr.com/photos/nypl/3110619126/sizes/o/ http://www.flickr.com/photos/fylkesarkiv/4545544268/sizes/l/ http://www.flickr.com/photos/library_of_congress/2179918784/sizes/o/ http://www.flickr.com/photos/statelibraryofnsw/2963006536/sizes/o/ http://www.flickr.com/photos/smu_cul_digitalcollections/9526924556/sizes/l/ http://www.flickr.com/photos/usnationalarchives/5573758997/sizes/l/ http://www.flickr.com/photos/cornelluniversitylibrary/3485933761/sizes/l/ http://www.flickr.com/photos/statelibraryofnsw/4414971043/sizes/l/ http://www.flickr.com/photos/smu_cul_digitalcollections/8519861690/sizes/l/ http://www.flickr.com/photos/nlireland/8443250313/sizes/h/ http://www.flickr.com/photos/national_library_of_australia_commons/6174084474/sizes/l/ http://www.flickr.com/photos/nypl/3110609190/sizes/o/ http://www.flickr.com/photos/hartlepool_museum/4398630456/sizes/o/ http://www.flickr.com/photos/usnationalarchives/7158774350/sizes/l/ http://www.flickr.com/photos/nlireland/9490851253/sizes/l/