OpenTSDB was built on the belief that, through HBase, a new breed of monitoring systems could be created: one that can store and serve billions of data points forever without the need for destructive downsampling, one that can scale to millions of metrics, and one where plotting real-time graphs is easy and fast. In this presentation we’ll review some of the key points of OpenTSDB’s design, some of the mistakes that were made, how they were or will be addressed, and some of the lessons learned while writing and running OpenTSDB as well as asynchbase, the asynchronous, high-performance, thread-safe client for HBase. Specific topics will cover the schema, how it impacts performance, and how it allows concurrent writes without any need for coordination across a distributed cluster of OpenTSDB instances.
Lessons Learned from OpenTSDB
Or why OpenTSDB is the way it is, and how it changed iteratively to correct some of the mistakes made

Benoît “tsuna” Sigoure
firstname.lastname@example.org
Key concepts
• Data Points: (time, value)
• Metrics: proc.loadavg.1m
• Tags: host=web42 pool=static
• Metric + Tags = Time Series
• Order of magnitude: >10⁶ time series, >10¹² data points

  put proc.loadavg.1m 1234567890 0.42 host=web42 pool=static
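As a minimal sketch (not OpenTSDB’s actual parser), here is how a telnet-style `put` line maps onto these concepts: the metric plus the sorted tag set identifies one time series, and the timestamp/value pair is the data point.

```python
def parse_put(line):
    """Parse: put <metric> <timestamp> <value> <tagk=tagv> [...]"""
    parts = line.split()
    assert parts[0] == "put" and len(parts) >= 5
    metric, ts, value = parts[1], int(parts[2]), float(parts[3])
    tags = dict(kv.split("=", 1) for kv in parts[4:])
    # Metric + (sorted) tags together identify exactly one time series.
    series = (metric, tuple(sorted(tags.items())))
    return series, (ts, value)

series, point = parse_put(
    "put proc.loadavg.1m 1234567890 0.42 host=web42 pool=static")
```

Sorting the tags matters: `host=web42 pool=static` and `pool=static host=web42` must name the same series.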
OpenTSDB @ StumbleUpon
• Main production monitoring system for ~2 years
• Storing hundreds of billions of data points
• Adding over 1 billion data points per day
• 13,000 data points/s → 130 QPS on HBase (data points are batched, ≈100 per write RPC)
• A 5-node cluster would hardly break a sweat under this load
Do’s
• Use wider rows to seek faster
  before: ~4KB/row; after: ~20KB/row
• Make writes idempotent and independent
  before: rows started at arbitrary points in time
  after: rows aligned on 10m (then 1h) boundaries
• Store more data per KeyValue
  Remember: you pay for the full key alongside each value in a row, so large keys are really expensive
Don’ts
• Don’t use HTable / HTablePool in app servers
  asynchbase + Netty or Finagle = performance++
• Don’t put variable-length fields in composite keys
  They’re hard to scan
• Don’t exceed a few hundred regions per RegionServer
  “Oversharding” introduces overhead and makes recovering from failures more expensive
How OpenTSDB came to be the way it is

Questions:
• How to store time series data efficiently in HBase?
• How to enable concurrent writes without synchronization between the writers?
• How to save space/memory when storing hundreds of billions of data items in HBase?
Time Series Data in HBase — Take 1

  Key (timestamp) | Column       | Value
  1234567890      | (don’t care) | 1
  1234567892      | (don’t care) | 2
  1234567894      | (don’t care) | 3

Simplest design: only 1 time series, 1 row with a single KeyValue per data point.
Supports time-range scans.
Time Series Data in HBase — Take 2

  Key (metric + timestamp) | Value
  foo  1234567890          | 1
  foo  1234567892          | 3
  fool 1234567890          | 2

Metric name first in the row key for data locality.
Problem: can’t store the metric as text in the row key due to space concerns.
Time Series Data in HBase — Take 3

  Key (ID + timestamp) | Value        Separate lookup table:
  0x1 1234567890       | 1              Key  | Value
  0x1 1234567892       | 3              foo  | 0x1
  0x2 1234567890       | 2              fool | 0x2

Use a separate table to assign unique IDs to metric names (and tags, not shown here).
IDs give us a predictable length and achieve the desired data locality.
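The lookup table can be sketched as a bidirectional name↔ID map (a toy in-memory version; names and the fixed 3-byte ID width are illustrative assumptions, and the real UID table lives in HBase and assigns IDs with an atomic increment):

```python
class UidTable:
    """Toy UID table: maps names to fixed-width byte IDs and back."""
    def __init__(self, width=3):
        self.width = width        # fixed width => predictable key sizes
        self.next_id = 1
        self.name_to_id = {}      # forward map: "foo" -> b"\x00\x00\x01"
        self.id_to_name = {}      # reverse map, needed to render query results

    def get_or_create(self, name):
        uid = self.name_to_id.get(name)
        if uid is None:
            uid = self.next_id.to_bytes(self.width, "big")
            self.next_id += 1
            self.name_to_id[name] = uid
            self.id_to_name[uid] = name
        return uid

uids = UidTable()
foo_id = uids.get_or_create("foo")    # b"\x00\x00\x01"
fool_id = uids.get_or_create("fool")  # b"\x00\x00\x02"
```

Because every ID has the same width, row keys built from IDs have a predictable size, which is exactly what scanning needs.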
Time Series Data in HBase — Take 4

  Key (ID + base timestamp) | +0 | +2
  0x1 1234567890            | 1  | 3
  0x2 1234567890            | 2  |

Reduce the number of rows by storing multiple consecutive data points in the same row; the column qualifier holds the delta from the row’s base timestamp.
Fewer rows = faster to seek to a specific row.
Time Series Data in HBase — Take 4 (continued)

  Key (ID + base timestamp) | +0 | +2
  0x1 1234567890            | 1  | 3
  0x2 1234567890            | 2  |

This table representation is misleading.
Gotcha #1: wider rows don’t save any space*

  Actually stored:
  Key            | Column | Value
  0x1 1234567890 | +0     | 1
  0x1 1234567890 | +2     | 3
  0x2 1234567890 | +0     | 2

* Until magic prefix compression happens in the upcoming HBase 0.94.
Time Series Data in HBase — Take 4 (continued)

  Key (ID + base timestamp) | +0 | +2
  0x1 1234567890            | 1  | 3
  0x2 1234567890            | 2  |

The devil is in the details: when to start new rows?
Naive answer: start a row on the first data point, and after some time start a new row.
Time Series Data in HBase — Take 4 (continued)

Client → TSD1: put foo 1000000000 1
First data point: start a new row.

  Key            | +0
  0x1 1000000000 | 1
Client → TSD1: put foo 1000000010 2
Keep adding points until...

  Key            | +0 | +10 | ...
  0x1 1000000000 | 1  | 2   | ...
Client → TSD1: put foo 1000000599 42
... some arbitrary limit is reached, say 10 minutes.

  Key            | +0 | +10 | ... | +599
  0x1 1000000000 | 1  | 2   | ... | 42
Client → TSD1: put foo 1000000610 51
Then start a new row.

  Key            | +0 | +10 | ... | +599
  0x1 1000000000 | 1  | 2   | ... | 42
  0x1 1000000600 |    | 51  |     |
But this scheme fails with multiple TSDs.

Client → TSD1: put foo 1234567890 1
TSD1 creates a new row:

  Key            | +0
  0x1 1234567890 | 1
Client → TSD1: put foo 1234567892 3
TSD1 adds to the existing row:

  Key            | +0 | +2
  0x1 1234567890 | 1  | 3
Oops! Maybe a connection failure occurred and the client is retransmitting data to another TSD.

Client → TSD2: put foo 1234567892 3
TSD2 creates a new row:

  Key            | +0 | +2
  0x1 1234567890 | 1  | 3
  0x1 1234567892 | 3  |
Time Series Data in HBase — Take 5

  Key (ID + base timestamp) | +90 | +92
  0x1 1234567800            | 1   | 3
  0x2 1234567800            | 2   |

Base timestamp is always a multiple of 600.
In order to scale easily and keep TSDs stateless, make writes independent & idempotent.
New rule: rows are aligned on 10 minute boundaries.
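This rule can be sketched in a few lines (function names are hypothetical, not OpenTSDB’s actual code): every writer derives the same row key and qualifier from the data point alone, so no coordination is needed, and a retransmitted point simply overwrites the same cell.

```python
ROW_INTERVAL = 600  # rows aligned on 10-minute boundaries

def row_key_and_qualifier(metric_uid: bytes, timestamp: int):
    """Any TSD computes the same (row key, qualifier) for a given data
    point, which makes writes independent and idempotent."""
    base = timestamp - (timestamp % ROW_INTERVAL)   # align on the boundary
    row_key = metric_uid + base.to_bytes(4, "big")  # ID + base timestamp
    qualifier = timestamp - base                    # delta within the row
    return row_key, qualifier

# The two TSDs from the failure scenario above now agree on the cell:
# 1234567890 -> row base 1234567800, qualifier +90, exactly as in the table.
cell = row_key_and_qualifier(b"\x00\x00\x01", 1234567890)
```

Idempotence falls out for free: writing (+90, 1) twice to the same row key is a no-op in HBase terms.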
Time Series Data in HBase — Take 6

  Key (ID + base timestamp) | +1890 | +1892
  0x1 1234566000            | 1     | 3
  0x2 1234566000            | 2     |

Base timestamp is always a multiple of 3600.
1 data point every ~10s ⇒ 60 data points per row. Not much.
Go to wider rows to further increase seek speed. One-hour rows = 6x fewer rows.
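The row-count arithmetic above can be checked directly (a sketch; the ~10s interval is the slide’s example rate):

```python
SECONDS_PER_DAY = 86_400
interval = 10  # one data point every ~10 seconds

points_per_day = SECONDS_PER_DAY // interval  # data points per series per day
rows_10min = SECONDS_PER_DAY // 600           # rows per series per day, 10-min rows
rows_1h = SECONDS_PER_DAY // 3600             # rows per series per day, 1-hour rows

assert 600 // interval == 60       # 60 data points fit in a 10-minute row
assert rows_10min // rows_1h == 6  # one-hour rows = 6x fewer rows to seek over
```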
Time Series Data in HBase — Take 6 (continued)

  Key            | +1890 | +1892
  0x1 1234566000 | 1     | 3
  0x2 1234566000 | 2     |

Remember: wider rows don’t save any space!

  Actually stored:
  Key            | Column | Value
  0x1 1234566000 | +1890  | 1
  0x1 1234566000 | +1892  | 3
  0x2 1234566000 | +1890  | 2

The key is easily 4x bigger than column + value, and it is repeated with every KeyValue.
Time Series Data in HBase — Take 7

  Key            | +1890 | +1892
  0x1 1234566000 | 1     | 3
  0x2 1234566000 | 2     |

Solution: “compact” columns by concatenation.

  Actually stored:
  Key            | Column       | Value
  0x1 1234566000 | +1890,+1892  | 1, 3
  0x2 1234566000 | +1890        | 2

Space savings on disk and in memory are huge: data is 4x-8x smaller!
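A toy sketch of the idea (not OpenTSDB’s actual on-disk encoding; the 2-byte qualifier and 8-byte value widths are illustrative): concatenate all qualifiers and all values of a row into one KeyValue, so the row key is stored once instead of once per data point.

```python
def compact_row(row_key: bytes, cells):
    """Merge per-point cells into a single KeyValue.
    cells: list of (qualifier_delta, value) pairs, e.g. [(1890, 1), (1892, 3)]."""
    qualifier = b"".join(q.to_bytes(2, "big") for q, _ in cells)
    value = b"".join(v.to_bytes(8, "big") for _, v in cells)
    return row_key, qualifier, value

def stored_bytes(row_key, cells, compacted):
    # Each KeyValue repeats the full row key; compaction stores it only once.
    key_copies = 1 if compacted else len(cells)
    return key_copies * len(row_key) + len(cells) * (2 + 8)

row_key = b"\x00\x00\x01" + (1234566000).to_bytes(4, "big")
cells = [(1890, 1), (1892, 3)]
before = stored_bytes(row_key, cells, compacted=False)  # key stored twice
after = stored_bytes(row_key, cells, compacted=True)    # key stored once
```

With only two points per row the savings are modest; with hundreds of points per one-hour row, dropping all but one copy of the key is where the 4x-8x reduction comes from.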
¿ Questions ?

opentsdb.net — Fork me on GitHub

Summary
• Use asynchbase
• Use Netty or Finagle
• Wider table > taller table
• Short family names
• Make writes idempotent
• Make writes independent
• Compact your data
• Have predictable key sizes

Think this is cool? We’re hiring.
Benoît “tsuna” Sigoure
email@example.com