This session will focus on our approach to building a scalable time-series database for financial data using Cassandra 1.2 and CQL3. We will discuss how we handle a heavy mix of reads and writes, and how we monitor and track the performance of the system.
2. About this talk
Part 1: Our use case and architecture
Part 2: Our deployment and tuning
Part 3: Q&A
3. Know your problem.
1000s of consumers
..creating and reading data as fast as possible
..consistent to all readers
..and handle ad-hoc user queries
..quickly
..across data centers.
9. Data Model (CQL 3)
CREATE TABLE tsdata (
  id blob,
  property text,
  asof_ticks bigint,
  knowledge_ticks bigint,
  value blob,
  PRIMARY KEY (id, property, asof_ticks, knowledge_ticks)
)
WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (asof_ticks DESC, knowledge_ticks DESC)
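A sketch of how one bitemporal point maps onto this schema. The tick resolution is an assumption (the talk only requires an ordered bigint); microseconds since the epoch are used here for illustration, and the row values are made up.

```python
from datetime import datetime, timezone

def to_ticks(dt):
    """Convert a datetime to an integer tick count.
    Tick resolution is an assumption (microseconds since epoch);
    the schema only needs a monotonically ordered bigint."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1_000_000)

# One bitemporal point: asof_ticks is when the value applies,
# knowledge_ticks is when we learned it.
row = {
    "id": bytes.fromhex("012345"),                 # partition key (blob)
    "property": "lastPrice",                        # clustering column
    "asof_ticks": to_ticks(datetime(2013, 6, 1, 14, 30)),
    "knowledge_ticks": to_ticks(datetime(2013, 6, 1, 14, 30, 5)),
    "value": b"\x40\x59\x10\x00\x00\x00\x00\x00",   # serialized payload (blob)
}
```

Because both tick columns are clustering columns with DESC order, the newest data for a given (id, property) sits at the front of the partition.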
10. CQL3 Queries: Time Series
SELECT * FROM tsdata
WHERE id = 0x12345
AND property = 'lastPrice'
AND asof_ticks >= 1234567890
AND asof_ticks <= 2345678901
11. CQL3 Queries: Cross Section
SELECT * FROM tsdata
WHERE id = 0x12345
AND property = 'lastPrice'
AND asof_ticks = 1234567890
AND knowledge_ticks < 2345678901
LIMIT 1
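What LIMIT 1 buys here: with knowledge_ticks clustered DESC, the first matching row is the most recently learned value as of a point in time. A minimal in-memory sketch of those semantics (not the actual service code, and the sample rows are invented):

```python
def latest_known(rows, asof, knowledge_limit):
    """Mimic 'WHERE asof_ticks = ? AND knowledge_ticks < ? LIMIT 1'
    under DESC clustering order: return the newest-knowledge row."""
    candidates = [r for r in rows
                  if r["asof_ticks"] == asof
                  and r["knowledge_ticks"] < knowledge_limit]
    # DESC clustering order: highest knowledge_ticks comes first.
    candidates.sort(key=lambda r: r["knowledge_ticks"], reverse=True)
    return candidates[0] if candidates else None

rows = [
    {"asof_ticks": 100, "knowledge_ticks": 105, "value": b"first"},
    {"asof_ticks": 100, "knowledge_ticks": 120, "value": b"corrected"},
]
```

Passing an earlier knowledge_limit reproduces what the system believed at that time, which is the point of keeping both time axes.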
12. A Service, not an app
[Architecture diagram: many App instances connect to Olympus Thrift Service endpoints; the Olympus nodes wrap the fat client and sit in front of the C* cluster]
13. Complex Value Types
Not every value is a double
Some values belong together (Bid and Ask should always come back together)
Thrift structures as values
Typed, extensible schema
Union types give us a way to deserialize any type
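A sketch of the union-type idea in plain Python, since the actual Thrift IDL is not shown in the deck: the stored blob carries a tag identifying the concrete type, so any reader can deserialize any value without per-property schemas. The tag numbers and encodings below are invented for illustration.

```python
import struct

# Tag -> decoder for each concrete value type (illustrative, not the
# real Thrift union definition).
DECODERS = {
    1: lambda b: struct.unpack(">d", b)[0],   # double
    2: lambda b: b.decode("utf-8"),            # string
    3: lambda b: struct.unpack(">dd", b),      # (bid, ask) pair
}

def encode(tag, payload):
    return bytes([tag]) + payload

def decode(blob):
    tag, payload = blob[0], blob[1:]
    return DECODERS[tag](payload)

# Bid and Ask travel together as one value, never split across rows.
quote = encode(3, struct.pack(">dd", 100.25, 100.27))
```

Adding a new value type means adding a tag and decoder, which is what makes the schema extensible.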
17. Scaling...
Step 1 - Fast Machines for your workload
Step 2 - Avoid Java GC for your workload
Step 3 - Tune Cassandra for your workload
Step 4 - Prefetch and cache for your workload
18. Can't fix what you can't measure
Riemann (http://riemann.io)
Easily push application and system metrics into a single system
We push 6k metrics per second to a single Riemann instance
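For flavor, a sketch of the event shape Riemann consumes (host, service, metric, tags, time are core fields of the Riemann protocol). The transport itself, protobuf over TCP, is handled by a client library and omitted here; the host and service names are made up.

```python
import time

def make_event(service, metric, host="olympus-01", tags=()):
    """Build a Riemann-style event dict. Field names follow the
    Riemann event schema; sending is left to a client library."""
    return {
        "host": host,
        "service": service,
        "metric": metric,
        "tags": list(tags),
        "time": int(time.time()),
    }

# At ~6k events/sec, batching events before each network write keeps
# per-event overhead low.
batch = [make_event("read.latency.p95", 5.0, tags=["cassandra"]),
         make_event("write.throughput", 200_000, tags=["cassandra"])]
```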
20. Metrics: Riemann
Push stream based metrics library
Riemann Dash for "Why is it slow?"
Graphite for "Why was it slow?"
21. VisualVM: The greatest tool EVER
Many useful plugins...
Just start jstatd on each server and go!
22. Scaling Reads: Machines
SSDs for hot data
JBOD config
As many cores as possible (> 16)
10GbE network
Bonded network cards
Jumbo frames
23. JBOD is a lifesaver
SSDs are great until they aren't anymore
JBOD allowed passive recovery in the face of simultaneous disk failures (SSDs had a bad firmware)
25. Leveled Compaction
Wide rows mean data can be spread across a huge number of SSTables
Leveled Compaction puts a bound on the worst case (*)
Fewer SSTables to read means lower latency
[Diagram: levels L0–L5, with the SSTables a read must touch highlighted]
* In theory
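Why the bound holds: each level is roughly 10x the size of the one below it, and above L0 the SSTables within a level have non-overlapping key ranges, so a read touches at most one SSTable per level. A back-of-envelope sketch (the 5 MB figure is Cassandra 1.2's default sstable_size_in_mb; the fanout of 10 is the leveled-compaction design constant):

```python
SSTABLE_MB = 5    # default sstable_size_in_mb in Cassandra 1.2
FANOUT = 10       # each level holds ~10x the data of the previous one

def level_capacity_mb(level):
    # L1 targets FANOUT sstables; each higher level is FANOUT x bigger.
    return SSTABLE_MB * FANOUT ** level

def levels_needed(total_mb):
    """How many leveled tiers it takes to hold total_mb of data."""
    level, covered = 1, level_capacity_mb(1)
    while covered < total_mb:
        level += 1
        covered += level_capacity_mb(level)
    return level

# Even ~10 TB fits in a handful of levels, so a worst-case read touches
# only a few SSTables (plus whatever is sitting in L0).
depth = levels_needed(10_000_000)  # 10 TB expressed in MB
```

This is the "(*) in theory" caveat: the bound assumes compaction keeps up, and a backlog in L0 can still inflate reads.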
27. Hybrid Compaction: Breaking Better
Size Tiering in Level 0
On by default in 2.0
[Diagram: L0 is size tiered, L1–L5 are leveled; together, "hybrid compaction"]
28. Overlapping Compaction
Instead of forcing a combination of L0 files with L1, we can just push up files
This allows a higher level of concurrency in compactions
We still know the SSTables that might contain the keys
We can force a proper compaction at any configurable level
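A sketch of the claim "we still know the SSTables that might contain the keys": once levels are allowed to overlap, a read consults every SSTable whose [min_key, max_key] range covers the key, instead of exactly one per level. The SSTable names and key ranges below are invented.

```python
def candidate_sstables(sstables, key):
    """Return the SSTables whose key range could contain `key`.
    With overlapping levels this may be several per level, which is
    the latency cost traded for higher compaction concurrency."""
    return [s for s in sstables if s["min"] <= key <= s["max"]]

sstables = [
    {"name": "L1-a", "min": "a", "max": "m"},
    {"name": "L1-b", "min": "k", "max": "z"},  # overlaps L1-a: pushed up, not merged
    {"name": "L2-a", "min": "a", "max": "h"},
]
```

Forcing a proper compaction at a given level is what shrinks these candidate sets back down when read latency matters more than compaction throughput.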
29. C-optimized library
The read path needs to be fast for our workload
CRC checks and composite comparisons eat a lot of cycles
CRC is implemented on-chip for some architectures (why not use it?)
We want to move some of these operations into a JNI library to reduce latency and improve throughput
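To make the read-path cost concrete, here is the shape of the checksum work in a Python sketch (the JNI library itself is not shown in the deck). Note that x86's SSE4.2 crc32 instruction computes the CRC-32C variant, a different polynomial from zlib's CRC-32 used below; the point is only that every block read pays for a checksum pass.

```python
import zlib

def verify_chunk(payload, expected_crc):
    """Checksum verification on the read path. Doing this in native
    code (or with a hardware CRC instruction, where the polynomial
    matches) frees up JVM cycles for actual query work."""
    return (zlib.crc32(payload) & 0xFFFFFFFF) == expected_crc

chunk = b"compressed sstable block"
crc = zlib.crc32(chunk) & 0xFFFFFFFF
```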
30. Current Stats
16 nodes
2 Data Centers
Replication Factor 6
200k Writes/sec at EACH_QUORUM
150k Reads/sec at LOCAL_QUORUM
> 30 Million time series
> 15 Billion points
10 TB on disk (compressed)
Read Latency 50%/95% is 1ms/5ms