© 2014 MapR Technologies
Agenda 
• What is a time series? 
• Where does it come from? 
• What do we need to do with it? 
– theoretically 
– practically 
• How can we do that? 
– basics of time series processing 
– advanced time series database
What is a Time Series? 
• Stuff with timestamps 
– sensor measurements 
– system stats 
– log files 
– configuration files 
Yes. Really. 
• Well, several general categories 
– numerical time series (what most people think of) 
– events 
– non-numerical time series (the strange cases)
Got Time Series?
What Do We Do With Time Series 
• Acquire 
– Measurement, transmission, reception 
• Store 
– Individually, or grouped for some amount of time 
• Retrieve 
– Ad hoc, flexible, correlate and aggregate 
• Analyze and visualize 
– We facilitate this via retrieval
Acquisition 
Not usually our problem 
• Sensors 
• Data collection – agents, Raspberry Pi
• Transmission – via LAN/WAN, mobile network, satellite
• Receipt into system – listening daemon or queue, or, depending on the use case, writing directly to the database
Storage Choices 
• Flat files 
– Great for rapid ingest with massive data 
– Handles essentially any data type 
– Less good for data requiring frequent updates 
– Harder to find specific ranges 
• Traditional RDBMS 
– Ingests up to ~10,000 / sec; prefers well-structured (numerical) data; expensive
• NoSQL (such as MapR-DB or HBase)
– Easily handles 10,000 rows / sec / node, with true linear scaling
– Handles wide variety of data 
– Good for frequent updates 
– Easily scanned in a range
Retrieval Requirements 
• Retrieve by time-series, time range, tags 
– Possibly pull millions of data points at a time 
– Possibly do on-the-fly windowed aggregations 
• Simple querying 
– start period, end period, metrics, tags 
– REST API for integration 
– CLI for testing 
• Graphs
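An on-the-fly windowed aggregation of the kind listed above can be sketched as follows. This is only a minimal illustration, not how any particular time series database implements it; the function name `windowed_mean` and the in-memory list of `(timestamp, value)` pairs are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def windowed_mean(points, start, end, window):
    """Average (timestamp, value) points per fixed-size window in [start, end)."""
    buckets = defaultdict(list)
    for ts, value in points:
        if start <= ts < end:
            # Align each timestamp to the start of the window it falls in.
            window_start = start + ((ts - start) // window) * window
            buckets[window_start].append(value)
    return {w: mean(vs) for w, vs in sorted(buckets.items())}


# Hypothetical samples: (seconds, value)
points = [(0, 4.0), (30, 6.0), (60, 5.0), (90, 7.0), (120, 9.0)]
result = windowed_mean(points, start=0, end=120, window=60)
print(result)
```

The query parameters mirror the slide: a start period, an end period, and a window size. Filtering by metric and tags would happen before this step.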
Specific Example 
• Consider a server farm 
• Lots of system metrics 
• Typically 100-300 stats / 30 s 
• Loads, RPCs, packets, requests/s
• Common to have 100 – 10,000 machines
The General Outline 
10 samples / second / machine 
x 1,000 machines 
= 10,000 samples / second 
• This is what OpenTSDB was designed to handle
• Install and go, but don’t test at scale
Will it Scale?
Specific Example 
• Consider oil drilling rigs 
• When drilling wells, there are *lots* of moving parts 
• Typically a drilling rig produces about 10K samples/s
• Temperatures, pressures, magnetics, machine vibration levels, salinity, voltage, currents, many others
• Typical project has 100 rigs
© 2014 MapR Technologies 23 
The General Outline 
10K samples / second / rig 
x 100 rigs 
= 1M samples / second 
• But wait, there’s more 
– Suppose you want to test your system 
– Perhaps with a year of data 
– And you want to load that data in << 1 year 
• 100x real-time = 100M samples / second
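The arithmetic behind both outlines (server farm and drilling rigs) can be checked with a few lines. The helper name `ingest_rate` is made up for this sketch; the numbers are the ones from the slides.

```python
def ingest_rate(samples_per_sec_per_source, sources, speedup=1):
    """Total samples per second the store must absorb."""
    return samples_per_sec_per_source * sources * speedup


# Server farm: 10 samples/s/machine x 1,000 machines
server_farm = ingest_rate(10, 1_000)

# Drilling project: 10K samples/s/rig x 100 rigs, live
rigs_live = ingest_rate(10_000, 100)

# Same project, replaying history at 100x real time
rigs_replay = ingest_rate(10_000, 100, speedup=100)

# At 100x real time, one year of history takes this many days to load
days_to_load_one_year = 365 / 100

print(server_farm, rigs_live, rigs_replay, days_to_load_one_year)
```

At 100x real time, a year of history still takes about 3.65 days to load, which is why the replay rate rather than the live rate sets the ingestion target.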
How Does That Work (OpenTSDB on MapR)?
[Diagram: Samples → Message queue → Collector → MapR table → Web service → Users]
Data Storage
Key                      13    43    73    103   …
…
series-uid.time-window   4.5   5.2   6.1   4.9   …
…
• Typical time window is one hour
• Column names are offsets in time window
• Find series-uid in separate table
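The row layout on this slide can be sketched with a plain dictionary standing in for the table. This only illustrates the idea of one row per series per hour with time offsets as column names. The string row key, the helper names `locate` and `put`, and the series name `cpu.host1` are assumptions for the example, not the actual key encoding used by OpenTSDB.

```python
WINDOW_SECONDS = 3600  # one-hour time window, as on the slide


def locate(series_uid, timestamp):
    """Return (row_key, column) for a sample; the key format is illustrative."""
    window_start = timestamp - timestamp % WINDOW_SECONDS
    return f"{series_uid}.{window_start}", timestamp - window_start


table = {}  # row_key -> {offset_in_window: value}


def put(series_uid, timestamp, value):
    """Single-point insert: one column written per sample."""
    row_key, offset = locate(series_uid, timestamp)
    table.setdefault(row_key, {})[offset] = value


# Four hypothetical samples, all inside the hour starting at t = 7200 s
for ts, v in [(7213, 4.5), (7243, 5.2), (7273, 6.1), (7303, 4.9)]:
    put("cpu.host1", ts, v)

print(table)
```

Because the window start is part of the row key, all samples of one series in a time range sit in adjacent rows, which is what makes range scans cheap.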
Eventual Compaction
Key                      13    43    73    103   blob
…
series-uid.time-window   4.5   5.2   6.1   4.9   {t:[13,43,73,103], v:[4.5,5.2,6.1,4.9]}
…
• Insertion of data as blob makes original columns redundant
• This is the way that TSD should work, not quite how it does work
Eventual Compaction
Key                      blob
…
series-uid.time-window   {t:[13,43,73,103], v:[4.5,5.2,6.1,4.9]}
…
• Converting old data to blobs allows compact storage, faster retrieval
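Compaction as described on these two slides can be sketched as a function that folds the per-offset columns of one row into a single blob column. JSON is used here only because the slide writes the blob in a JSON-like notation; the real on-disk encoding may differ, and the function name `compact` is an assumption for the example.

```python
import json


def compact(row):
    """Replace the per-offset columns of a row with a single 'blob' column."""
    offsets = sorted(k for k in row if k != "blob")
    blob = {"t": offsets, "v": [row[k] for k in offsets]}
    # Once the blob is written, the original columns are redundant and dropped.
    return {"blob": json.dumps(blob)}


row = {13: 4.5, 43: 5.2, 73: 6.1, 103: 4.9}
compacted = compact(row)
print(compacted)
```

A reader then fetches one cell per series-hour instead of one cell per sample.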
Single Point Loading 
• Each sample requires one insertion; compaction requires another
• Typical performance on a cluster 
– 1 edge node + 4 cluster nodes 
– Up to 20k samples per second observed 
• Suitable for server monitoring 
• Not suitable for large scale history ingestion 
• 1000x too slow for industrial work
Small Trick … Buffer Data in Memory
[Diagram: Samples → Message queue → Collector → MapR table → Web service → Users; the collector also writes to a Log]
• Web service queries database and collector
• Buffering data for 1 hour in collector allows >1000x performance gain
• Logging latest hour of data allows clean restart of collector
(lambda + epsilon architecture)
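The buffering idea can be sketched as a small collector class. Everything here is a simplified stand-in: a dictionary plays the table, a list of strings plays the log, and the class and method names are invented for this sketch. A real collector would flush only windows that are complete and would also answer queries for the data still held in memory.

```python
import json

WINDOW_SECONDS = 3600  # buffer one hour of data per series


class BufferingCollector:
    def __init__(self, table, log):
        self.table = table  # stand-in for the database table (dict)
        self.log = log      # stand-in for an append-only log (list of str)
        self.buffer = {}    # row_key -> {"t": [offsets], "v": [values]}

    def add(self, series_uid, timestamp, value):
        """Log the sample first, then hold it in memory."""
        self.log.append(json.dumps([series_uid, timestamp, value]))
        window_start = timestamp - timestamp % WINDOW_SECONDS
        row_key = f"{series_uid}.{window_start}"
        row = self.buffer.setdefault(row_key, {"t": [], "v": []})
        row["t"].append(timestamp - window_start)
        row["v"].append(value)

    def flush(self):
        """Write each buffered window as ONE blob insertion, then reset."""
        for row_key, row in self.buffer.items():
            self.table[row_key] = {"blob": json.dumps(row)}
        self.buffer.clear()
        self.log.clear()

    @classmethod
    def recover(cls, table, log):
        """After a restart, rebuild the in-memory buffer by replaying the log."""
        collector = cls(table, [])
        for line in list(log):
            collector.add(*json.loads(line))
        return collector


table, log = {}, []
collector = BufferingCollector(table, log)
for ts, v in [(7213, 4.5), (7243, 5.2)]:
    collector.add("cpu.host1", ts, v)

# Simulate a crash: the buffer is lost, but the log survives.
restarted = BufferingCollector.recover(table, log)
restarted.add("cpu.host1", 7273, 6.1)
restarted.flush()
print(table)
```

The log holds at most the unflushed window, so a restart replays a bounded amount of data.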
Batch Loading 
• 3600 samples require one insertion 
– No compactions necessary 
• Typical performance on SE cluster 
– 1 edge node + 4 cluster nodes 
– Up to 30 million samples per second per node observed 
– ~700x faster ingestion 
• Suitable for large scale history ingestion 
• 30 million data points retrieved in 20s (in JSON format) 
• Ready for industrial work
When is this All Wrong? 
• In some cases, retrieval by series-id + time range is not sufficient
• Log files
– May need very flexible retrieval of events based on text-like criteria
• Search may be better than a time-series database
– Can scale Lucene-based search to > 1 million events / second
• Geo-temporal storage access patterns
Q & A 
Engage with us! 
@kingmesal maprtech 
jsccot@mapr.com 
MapR 
maprtech 
mapr-technologies

Time Series Data in a Time Series World
