Time is Money
Financial Time Series
Jake Luciani and Carl Yeksigian
BlueMountain Capital
About this talk
Part 1: Our use case and architecture
Part 2: Our deployment and tuning
Part 3: Q&A
Know your problem.
1000s of consumers
..creating and reading data as fast as possible
..consistent to all readers
..and handle ad-hoc user queries
..quickly
..across data centers.
Know your data.
AAPL price
MSFT price
Know your queries.
Time Series Query
Start, End, and Periodicity define the query
1 minute periods
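A time-series query, then, asks for one value per period between Start and End. A minimal Python sketch of the idea (illustrative names, not the service's actual code): bucket raw points into fixed periods and keep the last observation in each.

```python
from datetime import datetime, timedelta

def resample_last(points, start, end, period):
    """Bucket (timestamp, value) points into fixed periods between
    start (inclusive) and end (exclusive), keeping the last
    observation seen in each period."""
    buckets = {}
    for ts, value in sorted(points):
        if start <= ts < end:
            buckets[int((ts - start) / period)] = value
    return [(start + i * period, v) for i, v in sorted(buckets.items())]
```

With `period=timedelta(minutes=1)` this produces the 1-minute-period view shown on the slide.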
Know your queries.
Cross Section Query
As Of time defines the query
As Of Time (11am)
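A cross section picks, for every series, the latest value at or before the As Of time. A minimal Python sketch (illustrative structure, not the service's API):

```python
def cross_section(series, as_of):
    """series: {series_id: [(asof_time, value), ...]}.
    Return, per series, the latest value at or before as_of;
    series with no point yet at as_of are omitted."""
    out = {}
    for sid, points in series.items():
        eligible = [(t, v) for t, v in points if t <= as_of]
        if eligible:
            out[sid] = max(eligible)[1]
    return out
```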
Know your queries.
Cross-section times are effectively random
Precomputing and storing every possible cross section is infeasible.
We also support bi-temporality
Let's optimize for Time Series.
CREATE TABLE tsdata (
  id blob,
  property text,
  asof_ticks bigint,
  knowledge_ticks bigint,
  value blob,
  PRIMARY KEY (id, property, asof_ticks, knowledge_ticks)
)
WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (asof_ticks DESC, knowledge_ticks DESC)
Data Model (CQL 3)
SELECT * FROM tsdata
WHERE id = 0x12345
AND property = 'lastPrice'
AND asof_ticks >= 1234567890
AND asof_ticks <= 2345678901
CQL3 Queries: Time Series
CQL3 Queries: Cross Section
SELECT * FROM tsdata
WHERE id = 0x12345
AND property = 'lastPrice'
AND asof_ticks = 1234567890
AND knowledge_ticks < 2345678901
LIMIT 1
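Because both time columns cluster in descending order, the first matching row is the latest knowledge of the value at that As Of time, so LIMIT 1 answers the bi-temporal point lookup. The equivalent selection in Python (a sketch of the semantics, not the storage engine):

```python
def bitemporal_point(rows, asof, knowledge):
    """rows: (asof_ticks, knowledge_ticks, value) tuples for one
    (id, property). Return the value at `asof` as it was known
    strictly before `knowledge`, or None if nothing was known yet."""
    candidates = [(k, v) for a, k, v in rows if a == asof and k < knowledge]
    return max(candidates)[1] if candidates else None
```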
A Service, not an app
[Architecture diagram: many Apps, via a fat client, call a tier of Olympus Thrift Service nodes, which front the C* cluster]
Complex Value Types
Not every value is a double
Some values belong together (Bid and Ask should always come back together)
Thrift structures as values
Typed, extensible schema
Union types give us a way to deserialize any type
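The deck stores Thrift structs and unions as the value blob; the tagged-union idea can be sketched in Python (hypothetical type names, not the production schema):

```python
from dataclasses import dataclass

@dataclass
class Price:
    last: float

@dataclass
class Quote:       # Bid and Ask travel together as one value
    bid: float
    ask: float

# The wire value carries a type tag, so readers can deserialize any
# registered type without a fixed per-column schema.
DECODERS = {"price": lambda d: Price(**d), "quote": lambda d: Quote(**d)}

def decode(tag, payload):
    return DECODERS[tag](payload)
```

New value types extend the schema by registering another decoder, without touching existing readers.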
Ad-hoc querying UI
But that's the easy part...
(cue transition)
Scaling...
The first rule of scaling is you do not just turn everything to 11.
Scaling...
Step 1 - Fast Machines for your workload
Step 2 - Avoid Java GC for your workload
Step 3 - Tune Cassandra for your workload
Step 4 - Prefetch and cache for your workload
Can't fix what you can't measure
Riemann (http://riemann.io)
Easily push application and system metrics into a single system
We push 6k metrics per second to a single Riemann instance
Metrics: Riemann
Yammer Metrics with Riemann
https://gist.github.com/carlyeks/5199090
Metrics: Riemann
Push stream based metrics library
Riemann Dash for "Why is it slow?"
Graphite for "Why was it slow?"
VisualVM: The greatest tool EVER
Many useful plugins...
Just start jstatd on each server and go!
Scaling Reads: Machines
SSDs for hot data
JBOD config
As many cores as possible (> 16)
10GbE network
Bonded network cards
Jumbo frames
JBOD is a lifesaver
SSDs are great until they aren't anymore
JBOD allowed passive recovery in the face of simultaneous disk failures (SSDs had a bad firmware)
Scaling Reads: Cassandra
Changes we've made:
• Configuration
• Compaction
• Compression
Leveled Compaction
Wide rows mean data can be spread across a huge number of SSTables
Leveled Compaction puts a bound on the worst case (*)
Fewer SSTables to read means lower latency
[Diagram: levels L0–L5; at most one highlighted SSTable per level is read]
* In Theory
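The bound is easy to quantify: with a fixed SSTable size and ~10x fan-out per level, a key overlaps at most one SSTable in each of L1 and above, plus every file in L0 (L0 files may overlap each other). A back-of-the-envelope sketch, using illustrative defaults rather than our actual config:

```python
SSTABLE_MB = 5     # illustrative fixed SSTable size (assumption)
FANOUT = 10        # each level holds ~10x the previous

def level_capacity_mb(level):
    """Approximate data held at a given level (L1 and up)."""
    return SSTABLE_MB * FANOUT ** level

def worst_case_sstables_read(l0_files, max_level):
    """L1+ are non-overlapping within a level, so at most one hit
    per level; every L0 file may overlap the key."""
    return l0_files + max_level
```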
Leveled Compaction: Breaking Bad
Under high write load, reads are forced to touch all of the L0 files
[Diagram: a pile of overlapping L0 SSTables above levels L1–L5]
Hybrid Compaction: Breaking Better
Size Tiering Level 0
On by default in 2.0
[Diagram: L0 is size-tiered, L1–L5 remain leveled; the combination is labeled "Hybrid Compaction"]
Overlapping Compaction
Instead of forcing a compaction of L0 files with L1, we can just push files up a level
This allows a higher level of concurrency in compactions
We still know the SSTables that might contain the keys
We can force a proper compaction at any configurable level
[Diagram: L0 files promoted upward through levels L1–L5]
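Because every SSTable records its key range, files that were promoted without a merging compaction are still findable at read time. A sketch of that lookup (illustrative structure, not Cassandra's internals):

```python
def candidate_sstables(levels, key):
    """levels: {level: [(min_key, max_key), ...]}. Return every file
    whose key range could contain `key`, even where files within a
    level overlap after being pushed up."""
    return [(lvl, rng) for lvl, files in sorted(levels.items())
            for rng in files if rng[0] <= key <= rng[1]]
```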
C optimized library
Read path needs to be fast for our workload
CRC checks and composite comparisons eat a lot of cycles
CRC is implemented on-chip for some architectures (why not use it?)
We want to move some of these operations into a JNI library to reduce latency and improve throughput
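The checksum itself is cheap when done in native code or on-chip (SSE 4.2 adds a CRC32C instruction); the cost is running it per chunk in pure Java. For illustration only, here is the shape of the per-read check, using Python's C-backed zlib:

```python
import zlib

def chunk_is_valid(data, expected_crc):
    """Verify a compressed chunk's checksum -- the kind of per-read
    CRC check described above (zlib.crc32 runs in C, not in the
    interpreter; masking keeps the value an unsigned 32-bit int)."""
    return (zlib.crc32(data) & 0xFFFFFFFF) == expected_crc
```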
Current Stats
16 nodes
2 Data Centers
Replication Factor 6
200k Writes/sec at EACH_QUORUM
150k Reads/sec at LOCAL_QUORUM
> 30 Million time series
> 15 Billion points
10 TB on disk (compressed)
Read Latency 50%/95% is 1ms/5ms
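The consistency levels follow from quorum math. Assuming RF 6 is split evenly across the 2 data centers (3 replicas per DC), LOCAL_QUORUM needs 2 acks in the local DC and EACH_QUORUM needs 2 acks in every DC:

```python
def quorum(replicas):
    """Cassandra's quorum: a strict majority of replicas."""
    return replicas // 2 + 1

# RF 6 over 2 DCs -> 3 replicas per DC (assuming even placement):
#   LOCAL_QUORUM = quorum(3) = 2 acks in the local DC
#   EACH_QUORUM  = quorum(3) = 2 acks in each DC
```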
Questions?
Thank you!
@tjake and @carlyeks

C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian
