HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon

OpenTSDB was built on the belief that, through HBase, a new breed of monitoring systems could be created: one that can store and serve billions of data points forever without the need for destructive downsampling, one that can scale to millions of metrics, and one where plotting real-time graphs is easy and fast. In this presentation we’ll review some of the key points of OpenTSDB’s design, some of the mistakes that were made and how they were or will be addressed, and some of the lessons learned while writing and running OpenTSDB as well as asynchbase, the asynchronous, high-performance, thread-safe client for HBase. Specific topics include the schema, how it impacts performance, and how it allows concurrent writes without coordination in a distributed cluster of OpenTSDB instances.

Slide 1: Lessons Learned from OpenTSDB
  Or why OpenTSDB is the way it is and how it changed iteratively to correct some of the mistakes made
  Benoît “tsuna” Sigoure, tsuna@stumbleupon.com
Slide 2: Key concepts
  • Data points: (time, value)
  • Metrics: proc.loadavg.1m
  • Tags: host=web42 pool=static
  • Metric + Tags = Time Series
  • Order of magnitude: >10^6 time series, >10^12 data points
  Example: put proc.loadavg.1m 1234567890 0.42 host=web42 pool=static
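The `put` line on this slide carries a metric, a Unix timestamp, a value and tag=value pairs, and "Metric + Tags" identifies the series. A minimal Python sketch of parsing such a line, assuming whitespace-separated fields in that order; `DataPoint` and `parse_put` are hypothetical names for illustration, not OpenTSDB's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    metric: str
    timestamp: int   # Unix epoch seconds
    value: float
    tags: tuple      # (tag key, tag value) pairs, sorted by key

    def series_id(self):
        # Metric + Tags = Time Series: the same metric with different tags
        # is a different series; tag order on the input line does not matter.
        return (self.metric, self.tags)


def parse_put(line: str) -> DataPoint:
    """Parse 'put <metric> <timestamp> <value> <tagk=tagv> [...]'.

    Hypothetical helper; this sketch requires at least one tag.
    """
    parts = line.split()
    if len(parts) < 5 or parts[0] != "put":
        raise ValueError("expected: put <metric> <timestamp> <value> <tagk=tagv> ...")
    _, metric, ts, value = parts[:4]
    tags = []
    for pair in parts[4:]:
        key, sep, val = pair.partition("=")
        if not sep or not key or not val:
            raise ValueError(f"malformed tag: {pair!r}")
        tags.append((key, val))
    return DataPoint(metric, int(ts), float(value), tuple(sorted(tags)))


dp = parse_put("put proc.loadavg.1m 1234567890 0.42 host=web42 pool=static")
print(dp.series_id())  # ('proc.loadavg.1m', (('host', 'web42'), ('pool', 'static')))
print(dp.timestamp, dp.value)  # 1234567890 0.42
```

Sorting the tags makes two lines that differ only in tag order map to the same series.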
Slide 3: OpenTSDB @ StumbleUpon
  • Main production monitoring system for ~2 years
  • Storing hundreds of billions of data points
  • Adding over 1 billion data points per day
  • 13,000 data points/s → 130 QPS on HBase
  • If you had a 5-node cluster, this load would hardly make it sweat
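As a rough consistency check, assuming points arrive at a uniform rate over an 86,400 s day, the figures on this slide work out as follows; the second ratio is simply the quotient of the two quoted rates.

```latex
\frac{10^9\ \text{points/day}}{86\,400\ \text{s/day}} \approx 11\,574\ \text{points/s}
\qquad
\frac{13\,000\ \text{points/s}}{130\ \text{queries/s}} = 100\ \text{points per HBase query}
```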
Slide 4: Do’s
  • Wider rows to seek faster
    before: ~4KB/row, after: ~20KB/row
  • Make writes idempotent and independent
    before: start rows at arbitrary points in time
    after: align rows on 10m (then 1h) boundaries
  • Store more data per KeyValue
    Remember you pay for the key along with each value in a row, so large keys are really expensive
Slide 5: Don’ts
  • Use HTable / HTablePool in app servers
    asynchbase + Netty or Finagle = performance++
  • Put variable-length fields in composite keys
    They’re hard to scan
  • Exceed a few hundred regions per RegionServer
    “Oversharding” introduces overhead and makes recovering from failures more expensive
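Why variable-length fields in composite keys are hard to scan: rows are sorted as raw bytes, so a prefix scan for one variable-length value also returns longer values that share the prefix. A Python sketch using a sorted list of byte strings as a stand-in for an HBase table; the metric names `foo` and `fool`, the 4-byte big-endian timestamp suffix and the `row_key`/`prefix_scan` helpers are illustrative assumptions.

```python
import bisect
import struct


def row_key(metric: bytes, ts: int) -> bytes:
    # Illustrative composite key: variable-length metric name followed by a
    # 4-byte big-endian Unix timestamp.
    return metric + struct.pack(">I", ts)


def prefix_scan(sorted_keys, prefix: bytes):
    # Byte-wise sorting puts every key that starts with `prefix` in one
    # contiguous run beginning at the first key >= prefix.
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result


keys = sorted([
    row_key(b"foo", 1234567890),
    row_key(b"foo", 1234567892),
    row_key(b"fool", 1234567890),
])

# Scanning for metric "foo" also returns the row belonging to "fool".
print(len(prefix_scan(keys, b"foo")))  # 3
```

To return only the `foo` rows, the scanner would have to know where the name ends and the timestamp begins, which the raw key bytes do not encode.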
Slide 6: Use asynchbase
  [Figure: HTable vs. asynchbase benchmark; time (s) against # threads (4, 8, 16, 24, 32); panels: scan, sequential read, sequential write]
Slide 7: How OpenTSDB came to be the way it is
  Questions:
  • How to store time series data efficiently in HBase?
  • How to enable concurrent writes without synchronization between the writers?
  • How to save space/memory when storing hundreds of billions of data items in HBase?
Slide 8: Time Series Data in HBase (Take 1)
    Key (timestamp)   Column (don't care)   Value
    1234567890        -                     1
    1234567892        -                     2
    1234567894        -                     3
  Simplest design: only 1 time series, 1 row with a single KeyValue per data point. Supports time-range scans.
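A sketch of Take 1 in Python, modelling the table as a sorted list of row keys so that a time-range scan is a binary search followed by a slice. Encoding the timestamp as 4 big-endian bytes (so byte order equals time order) and the `TallSeries` class are assumptions for illustration, not HBase or OpenTSDB code.

```python
import bisect
import struct


class TallSeries:
    """One time series, one row (one KeyValue) per data point.

    Row keys are kept sorted like an HBase table; the column is ignored.
    """

    def __init__(self):
        self.keys = []    # sorted row keys (4-byte big-endian timestamps)
        self.values = {}  # row key -> value

    def put(self, ts: int, value):
        key = struct.pack(">I", ts)
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def scan(self, start_ts: int, stop_ts: int):
        # Time-range scan over [start_ts, stop_ts): start row inclusive,
        # stop row exclusive.
        lo = bisect.bisect_left(self.keys, struct.pack(">I", start_ts))
        hi = bisect.bisect_left(self.keys, struct.pack(">I", stop_ts))
        return [(struct.unpack(">I", k)[0], self.values[k]) for k in self.keys[lo:hi]]


s = TallSeries()
for ts, v in [(1234567890, 1), (1234567892, 2), (1234567894, 3)]:
    s.put(ts, v)
print(s.scan(1234567891, 1234567895))  # [(1234567892, 2), (1234567894, 3)]
```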
Slide 9: Time Series Data in HBase (Take 2)
    Key (metric name, timestamp)   Value
    foo  1234567890                1
    foo  1234567892                3
    fool 1234567890                2
  Metric name first in the row key for data locality.
  Problem: can’t store the metric as text in the row key due to space concerns.
Slide 10: Time Series Data in HBase (Take 3)
    Data table:
      Key (metric ID, timestamp)   Value
      0x1 1234567890               1
      0x1 1234567892               3
      0x2 1234567890               2
    Separate lookup table:
      Key    Value
      0x1    foo
      0x2    fool
      foo    0x1
      fool   0x2
  Use a separate table to assign unique IDs to metric names (and tags, not shown here). IDs give us a predictable length and achieve the desired data locality.
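A minimal in-memory sketch of the lookup table, keeping both directions (name to ID and ID to name) as the slide shows. The 3-byte ID width is an assumption (the slide only shows 0x1 and 0x2), `UniqueIdTable` and `row_key` are hypothetical names, and this sketch is not safe for concurrent writers.

```python
import struct


class UniqueIdTable:
    """Assigns each metric name a fixed-width ID so every row key has the same length."""

    WIDTH = 3  # bytes per ID (assumed)

    def __init__(self):
        self.name_to_id = {}
        self.id_to_name = {}

    def get_or_create_id(self, name: str) -> bytes:
        uid = self.name_to_id.get(name)
        if uid is None:
            n = len(self.name_to_id) + 1
            if n >= 1 << (8 * self.WIDTH):
                raise OverflowError("ran out of IDs")
            uid = n.to_bytes(self.WIDTH, "big")
            self.name_to_id[name] = uid
            self.id_to_name[uid] = name
        return uid


def row_key(uids: UniqueIdTable, metric: str, ts: int) -> bytes:
    # Fixed-width metric ID followed by a 4-byte big-endian timestamp.
    return uids.get_or_create_id(metric) + struct.pack(">I", ts)


uids = UniqueIdTable()
k_foo = row_key(uids, "foo", 1234567890)
k_fool = row_key(uids, "fool", 1234567890)
print(k_foo.hex(), k_fool.hex())  # 000001499602d2 000002499602d2
```

Because every ID has the same width, the 3-byte prefix of `foo`'s keys no longer matches any of `fool`'s keys, and all rows of one metric stay adjacent.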
Slide 11: Time Series Data in HBase (Take 4)
    Key              +0   +2
    0x1 1234567890   1    3
    0x2 1234567890   2
  Reduce the number of rows by storing multiple consecutive data points in the same row: the point at 1234567892 becomes column +2 of row 0x1 1234567890. Fewer rows = faster to seek to a specific row.
Slide 12: Time Series Data in HBase (Take 4)
  Gotcha #1: wider rows don’t save any space*. The wide table is a misleading representation; the actual stored table is:
    Key              Column   Value
    0x1 1234567890   +0       1
    0x1 1234567890   +2       3
    0x2 1234567890   +0       2
  * Until magic prefix compression happens in the upcoming HBase 0.94.
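The gotcha can be checked with a simplified model in which each stored cell is a (row key, qualifier, value) triple and the full row key is written inside every cell. The 3-byte metric ID, 4-byte timestamp, 2-byte offset qualifier and 8-byte values are assumptions for illustration; real HBase KeyValues also carry family, timestamp, type and length fields that this model ignores.

```python
import struct

METRIC_ID = b"\x00\x00\x01"   # hypothetical 3-byte metric ID
points = [(1234567890, 1), (1234567892, 3)]


def cells_tall(points):
    # One row per data point; the column name does not matter here.
    return [(METRIC_ID + struct.pack(">I", ts), b"v", struct.pack(">q", v))
            for ts, v in points]


def cells_wide(points, base):
    # One row per series; the column qualifier is the offset from `base`.
    return [(METRIC_ID + struct.pack(">I", base), struct.pack(">H", ts - base), struct.pack(">q", v))
            for ts, v in points]


def summarize(cells):
    # Returns (number of distinct rows, total row-key bytes stored).
    # Row-key bytes scale with the number of cells, not the number of rows.
    rows = len({row for row, _, _ in cells})
    key_bytes = sum(len(row) for row, _, _ in cells)
    return rows, key_bytes


print(summarize(cells_tall(points)))              # (2, 14)
print(summarize(cells_wide(points, 1234567890)))  # (1, 14)
```

The wide layout halves the number of rows, but the row key is still stored once per data point.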
Slide 13: Time Series Data in HBase (Take 4)
  The devil is in the details: when to start new rows?
  Naive answer: start on the first data point, and after some time start a new row.
Slide 14: Time Series Data in HBase (Take 4)
  First data point: start a new row.
  Diagram (Client, TSD1, TSD2): the client sends foo 1000000000 1 to TSD1, which creates row 0x1 1000000000 with +0 → 1.
Slide 15: Time Series Data in HBase (Take 4)
  Keep adding points until...
  The client sends foo 1000000010 2 to TSD1; row 0x1 1000000000 now holds +0 → 1, +10 → 2, ...
Slide 16: Time Series Data in HBase (Take 4)
  ... some arbitrary limit, say 10 min.
  The client sends foo 1000000599 42 to TSD1; row 0x1 1000000000 now holds +0 → 1, +10 → 2, ..., +599 → 42.
Slide 17: Time Series Data in HBase (Take 4)
  Then start a new row.
  The next point (value 51) goes into a new row, 0x1 1000000600.
Slide 18: Time Series Data in HBase (Take 4)
  But this scheme fails with multiple TSDs.
  The client sends foo 1234567890 1 to TSD1, which creates a new row 0x1 1234567890 with +0 → 1.
Slide 19: Time Series Data in HBase (Take 4)
  The client sends foo 1234567892 3 to TSD1, which adds it to the row: +2 → 3.
Slide 20: Time Series Data in HBase (Take 4)
  Oops! Maybe a connection failure occurred, and the client is retransmitting data to another TSD.
  The client resends foo 1234567892 3 to TSD2, which knows nothing of TSD1’s row and creates a new one, so the same point is now stored twice:
    Key              +0   +2
    0x1 1234567890   1    3
    0x1 1234567892   3
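The failure on slides 18 to 20 can be reproduced with a small Python model of the naive scheme, in which each TSD keeps in-memory state about where its current row started. `NaiveTSD`, the shared dict standing in for HBase, and the 600 s `max_span` are hypothetical modelling choices.

```python
class NaiveTSD:
    """Naive scheme: each TSD remembers, per metric, where its current row
    started, and opens a new row on the first point it sees or once the
    point falls outside [base, base + max_span)."""

    def __init__(self, store, max_span=600):
        self.store = store       # shared dict: (metric, base_ts) -> {offset: value}
        self.max_span = max_span
        self.current_base = {}   # per-TSD, in-memory state

    def put(self, metric, ts, value):
        base = self.current_base.get(metric)
        if base is None or not (base <= ts < base + self.max_span):
            base = ts            # start a new row at this point
            self.current_base[metric] = base
        self.store.setdefault((metric, base), {})[ts - base] = value


store = {}
tsd1, tsd2 = NaiveTSD(store), NaiveTSD(store)
tsd1.put("foo", 1234567890, 1)   # TSD1 creates row foo@1234567890
tsd1.put("foo", 1234567892, 3)   # TSD1 adds +2 to that row
tsd2.put("foo", 1234567892, 3)   # retry lands on TSD2, which creates a new row
print(sorted(store))  # [('foo', 1234567890), ('foo', 1234567892)]
```

The row a point lands in depends on which TSD received it and on that TSD's history, so a retransmission is not a harmless repeat.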
Slide 21: Time Series Data in HBase (Take 5)
    Key (base timestamp)   +90   +92
    0x1 1234567800         1     3
    0x2 1234567800         2
  The base timestamp is always a multiple of 600.
  In order to scale easily and keep TSDs stateless, make writes independent & idempotent. New rule: rows are aligned on 10-min boundaries.
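With aligned rows, the (row, column) for a point is a pure function of the point itself, so any TSD computes the same cell and a retransmitted point rewrites that same cell. A Python sketch of the rule; `cell_for`, `put` and the dict standing in for HBase are hypothetical, and rewriting a cell is modelled as a dict overwrite.

```python
ROW_SPAN = 600  # 10-minute rows, as in Take 5


def cell_for(metric_id: int, ts: int, span: int = ROW_SPAN):
    # The base timestamp depends only on the data point, so no shared
    # state between TSDs is needed.
    base = ts - ts % span
    return (metric_id, base), ts - base


def put(store, metric_id, ts, value):
    row, offset = cell_for(metric_id, ts)
    store.setdefault(row, {})[offset] = value  # writing the same cell again is harmless


store = {}
put(store, 0x1, 1234567890, 1)
put(store, 0x1, 1234567892, 3)
put(store, 0x1, 1234567892, 3)  # retransmission, possibly via another TSD
put(store, 0x2, 1234567890, 2)
print(store)  # {(1, 1234567800): {90: 1, 92: 3}, (2, 1234567800): {90: 2}}
```

The output reproduces the table on this slide: base 1234567800, columns +90 and +92.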
Slide 22: Time Series Data in HBase (Take 6)
    Key (base timestamp)   +1890   +1892
    0x1 1234566000         1       3
    0x2 1234566000         2
  The base timestamp is always a multiple of 3600.
  1 data point every ~10 s => 60 data points per row. Not much. Go to wider rows to further increase seek speed. One-hour rows = 6x fewer rows.
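Working through this slide's numbers for the example timestamp t = 1234567890, with the ~10 s reporting interval the slide assumes:

```latex
\begin{aligned}
\text{base} &= t - (t \bmod 3600) = 1234567890 - 1890 = 1234566000 \\
\text{points per row} &= \frac{600\ \text{s}}{10\ \text{s}} = 60 \;\longrightarrow\; \frac{3600\ \text{s}}{10\ \text{s}} = 360 \\
\text{rows per series per day} &= \frac{86\,400}{600} = 144 \;\longrightarrow\; \frac{86\,400}{3600} = 24 = \frac{144}{6}
\end{aligned}
```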
Slide 23: Time Series Data in HBase (Take 6)
  Remember: wider rows don’t save any space! The actual stored table is:
    Key              Column   Value
    0x1 1234566000   +1890    1
    0x1 1234566000   +1892    3
    0x2 1234566000   +1890    2
  The key is easily 4x bigger than column + value, and it is repeated in every KeyValue.
Slide 24: Time Series Data in HBase (Take 7)
  Solution: “compact” columns by concatenation.
    Actual stored table:
      Key              Column         Value
      0x1 1234566000   +1890          1
      0x1 1234566000   +1890,+1892    1, 3
      0x1 1234566000   +1892          3
      0x2 1234566000   +1890          2
  The individual columns +1890 and +1892 are merged into a single column +1890,+1892 holding the values 1, 3. Space savings on disk and in memory are huge: data is 4x-8x smaller!
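A Python sketch of compaction by concatenation: one row's cells are merged into a single qualifier (the offsets, concatenated in order) and a single value (the values, concatenated in the same order), and can be split back apart. The 2-byte offset qualifiers, fixed 8-byte integer values and the `compact`/`expand` helpers are assumptions for illustration, not OpenTSDB's on-disk format.

```python
import struct

QUAL = struct.Struct(">H")  # assumed 2-byte qualifier holding the offset
VAL = struct.Struct(">q")   # assumed fixed 8-byte integer values


def compact(row_cells):
    """Merge one row's cells {offset: value} into a single (qualifier, value)
    pair by concatenating qualifiers and values in offset order."""
    offsets = sorted(row_cells)
    qualifier = b"".join(QUAL.pack(o) for o in offsets)
    value = b"".join(VAL.pack(row_cells[o]) for o in offsets)
    return qualifier, value


def expand(qualifier, value):
    # Inverse of compact(): split the concatenated bytes back into points.
    offsets = [o for (o,) in QUAL.iter_unpack(qualifier)]
    values = [v for (v,) in VAL.iter_unpack(value)]
    return dict(zip(offsets, values))


ROW_KEY = b"\x00\x00\x01" + struct.pack(">I", 1234566000)  # hypothetical key
cells = {1890: 1, 1892: 3}

before = sum(len(ROW_KEY) + QUAL.size + VAL.size for _ in cells)
q, v = compact(cells)
after = len(ROW_KEY) + len(q) + len(v)
print(before, after)          # 34 27
print(expand(q, v) == cells)  # True
```

This toy model understates the saving because it ignores the per-KeyValue overhead HBase adds and uses a key with no tags; what it does show is that the row key is now stored once per row rather than once per data point.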
Slide 25: ¿ Questions ?
  opentsdb.net (Fork me on GitHub)
  Summary
  • Use asynchbase
  • Use Netty or Finagle
  • Wider table > Taller table
  • Short family names
  • Make writes idempotent
  • Make writes independent
  • Compact your data
  • Have predictable key sizes
  Think this is cool? We’re hiring.
  Benoît “tsuna” Sigoure, tsuna@stumbleupon.com