This session will focus on our approach to building a scalable time-series database for financial data using Cassandra 1.2 and CQL3. We will discuss how we handle a heavy mix of reads and writes, and how we monitor and track the performance of the system.
2. About this talk
Part 1: Our use case and architecture
Part 2: Our deployment and tuning
Part 3: Q&A
3. Know your problem.
1000s of consumers
..creating and reading data as fast as possible
..consistent to all readers
..and handle ad-hoc user queries
..quickly
..across data centers.
9. Data Model (CQL 3)
CREATE TABLE tsdata (
  id blob,
  property text,
  asof_ticks bigint,
  knowledge_ticks bigint,
  value blob,
  PRIMARY KEY (id, property, asof_ticks, knowledge_ticks)
)
WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (asof_ticks DESC, knowledge_ticks DESC)
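A sketch of how one bitemporal point maps onto this schema. The tick resolution is an assumption (the talk only requires an ordered bigint); microseconds since the epoch are used here for illustration, and the row values are made up.

```python
from datetime import datetime, timezone

def to_ticks(dt):
    """Convert a datetime to an integer tick count.
    Tick resolution is an assumption (microseconds since epoch);
    the schema only needs a monotonically ordered bigint."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1_000_000)

# One bitemporal point: asof_ticks is when the value applies,
# knowledge_ticks is when we learned it.
row = {
    "id": bytes.fromhex("012345"),                 # partition key (blob)
    "property": "lastPrice",                        # clustering column
    "asof_ticks": to_ticks(datetime(2013, 6, 1, 14, 30)),
    "knowledge_ticks": to_ticks(datetime(2013, 6, 1, 14, 30, 5)),
    "value": b"\x40\x59\x10\x00\x00\x00\x00\x00",   # serialized payload (blob)
}
```

Because both tick columns are clustering columns with DESC order, the newest data for a given (id, property) sits at the front of the partition.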
10. CQL3 Queries: Time Series
SELECT * FROM tsdata
WHERE id = 0x12345
AND property = 'lastPrice'
AND asof_ticks >= 1234567890
AND asof_ticks <= 2345678901
11. CQL3 Queries: Cross Section
SELECT * FROM tsdata
WHERE id = 0x12345
AND property = 'lastPrice'
AND asof_ticks = 1234567890
AND knowledge_ticks < 2345678901
LIMIT 1
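What LIMIT 1 buys here: with knowledge_ticks clustered DESC, the first matching row is the most recently learned value as of a point in time. A minimal in-memory sketch of those semantics (not the actual service code, and the sample rows are invented):

```python
def latest_known(rows, asof, knowledge_limit):
    """Mimic 'WHERE asof_ticks = ? AND knowledge_ticks < ? LIMIT 1'
    under DESC clustering order: return the newest-knowledge row."""
    candidates = [r for r in rows
                  if r["asof_ticks"] == asof
                  and r["knowledge_ticks"] < knowledge_limit]
    # DESC clustering order: highest knowledge_ticks comes first.
    candidates.sort(key=lambda r: r["knowledge_ticks"], reverse=True)
    return candidates[0] if candidates else None

rows = [
    {"asof_ticks": 100, "knowledge_ticks": 105, "value": b"first"},
    {"asof_ticks": 100, "knowledge_ticks": 120, "value": b"corrected"},
]
```

Passing an earlier knowledge_limit reproduces what the system believed at that time, which is the point of keeping both time axes.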
12. A Service, not an app
[Architecture diagram: many App instances connect to Olympus Thrift Service endpoints; the Olympus nodes wrap the fat client and sit in front of the C* cluster]
13. Complex Value Types
Not every value is a double
Some values belong together (Bid and Ask should always come back together)
Thrift structures as values
Typed, extensible schema
Union types give us a way to deserialize any type
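A sketch of the union-type idea in plain Python, since the actual Thrift IDL is not shown in the deck: the stored blob carries a tag identifying the concrete type, so any reader can deserialize any value without per-property schemas. The tag numbers and encodings below are invented for illustration.

```python
import struct

# Tag -> decoder for each concrete value type (illustrative, not the
# real Thrift union definition).
DECODERS = {
    1: lambda b: struct.unpack(">d", b)[0],   # double
    2: lambda b: b.decode("utf-8"),            # string
    3: lambda b: struct.unpack(">dd", b),      # (bid, ask) pair
}

def encode(tag, payload):
    return bytes([tag]) + payload

def decode(blob):
    tag, payload = blob[0], blob[1:]
    return DECODERS[tag](payload)

# Bid and Ask travel together as one value, never split across rows.
quote = encode(3, struct.pack(">dd", 100.25, 100.27))
```

Adding a new value type means adding a tag and decoder, which is what makes the schema extensible.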
17. Scaling...
Step 1 - Fast Machines for your workload
Step 2 - Avoid Java GC for your workload
Step 3 - Tune Cassandra for your workload
Step 4 - Prefetch and cache for your workload
18. Can't fix what you can't measure
Riemann (http://riemann.io)
Easily push application and system metrics into a single system
We push 6k metrics per second to a single Riemann instance
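For flavor, a sketch of the event shape Riemann consumes (host, service, metric, tags, time are core fields of the Riemann protocol). The transport itself, protobuf over TCP, is handled by a client library and omitted here; the host and service names are made up.

```python
import time

def make_event(service, metric, host="olympus-01", tags=()):
    """Build a Riemann-style event dict. Field names follow the
    Riemann event schema; sending is left to a client library."""
    return {
        "host": host,
        "service": service,
        "metric": metric,
        "tags": list(tags),
        "time": int(time.time()),
    }

# At ~6k events/sec, batching events before each network write keeps
# per-event overhead low.
batch = [make_event("read.latency.p95", 5.0, tags=["cassandra"]),
         make_event("write.throughput", 200_000, tags=["cassandra"])]
```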
20. Metrics: Riemann
Push stream based metrics library
Riemann Dash for "Why is it slow?"
Graphite for "Why was it slow?"
21. VisualVM: The greatest tool EVER
Many useful plugins...
Just start jstatd on each server and go!
22. Scaling Reads: Machines
SSDs for hot data
JBOD config
As many cores as possible (> 16)
10GbE network
Bonded network cards
Jumbo frames
23. JBOD is a lifesaver
SSDs are great until they aren't anymore
JBOD allowed passive recovery in the face of simultaneous disk failures (SSDs had a bad firmware)
25. Leveled Compaction
Wide rows mean data can be spread across a huge number of SSTables
Leveled Compaction puts a bound on the worst case (*)
Fewer SSTables to read means lower latency
[Diagram: levels L0–L5, with the SSTables a read must touch highlighted]
* In theory
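Why the bound holds: each level is roughly 10x the size of the one below it, and above L0 the SSTables within a level have non-overlapping key ranges, so a read touches at most one SSTable per level. A back-of-envelope sketch (the 5 MB figure is Cassandra 1.2's default sstable_size_in_mb; the fanout of 10 is the leveled-compaction design constant):

```python
SSTABLE_MB = 5    # default sstable_size_in_mb in Cassandra 1.2
FANOUT = 10       # each level holds ~10x the data of the previous one

def level_capacity_mb(level):
    # L1 targets FANOUT sstables; each higher level is FANOUT x bigger.
    return SSTABLE_MB * FANOUT ** level

def levels_needed(total_mb):
    """How many leveled tiers it takes to hold total_mb of data."""
    level, covered = 1, level_capacity_mb(1)
    while covered < total_mb:
        level += 1
        covered += level_capacity_mb(level)
    return level

# Even ~10 TB fits in a handful of levels, so a worst-case read touches
# only a few SSTables (plus whatever is sitting in L0).
depth = levels_needed(10_000_000)  # 10 TB expressed in MB
```

This is the "(*) in theory" caveat: the bound assumes compaction keeps up, and a backlog in L0 can still inflate reads.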
27. Hybrid Compaction: Breaking Better
Size Tiering in Level 0
On by default in 2.0
[Diagram: L0 is size tiered, L1–L5 are leveled; together, "hybrid compaction"]
28. Overlapping Compaction
Instead of forcing a combination of L0 files with L1, we can just push up files
This allows a higher level of concurrency in compactions
We still know the SSTables that might contain the keys
We can force a proper compaction at any configurable level
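A sketch of the claim "we still know the SSTables that might contain the keys": once levels are allowed to overlap, a read consults every SSTable whose [min_key, max_key] range covers the key, instead of exactly one per level. The SSTable names and key ranges below are invented.

```python
def candidate_sstables(sstables, key):
    """Return the SSTables whose key range could contain `key`.
    With overlapping levels this may be several per level, which is
    the latency cost traded for higher compaction concurrency."""
    return [s for s in sstables if s["min"] <= key <= s["max"]]

sstables = [
    {"name": "L1-a", "min": "a", "max": "m"},
    {"name": "L1-b", "min": "k", "max": "z"},  # overlaps L1-a: pushed up, not merged
    {"name": "L2-a", "min": "a", "max": "h"},
]
```

Forcing a proper compaction at a given level is what shrinks these candidate sets back down when read latency matters more than compaction throughput.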
29. C-optimized library
The read path needs to be fast for our workload
CRC checks and composite comparisons eat a lot of cycles
CRC is implemented on-chip for some architectures (why not use it?)
We want to move some of these operations into a JNI library to reduce latency and improve throughput
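To make the read-path cost concrete, here is the shape of the checksum work in a Python sketch (the JNI library itself is not shown in the deck). Note that x86's SSE4.2 crc32 instruction computes the CRC-32C variant, a different polynomial from zlib's CRC-32 used below; the point is only that every block read pays for a checksum pass.

```python
import zlib

def verify_chunk(payload, expected_crc):
    """Checksum verification on the read path. Doing this in native
    code (or with a hardware CRC instruction, where the polynomial
    matches) frees up JVM cycles for actual query work."""
    return (zlib.crc32(payload) & 0xFFFFFFFF) == expected_crc

chunk = b"compressed sstable block"
crc = zlib.crc32(chunk) & 0xFFFFFFFF
```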
30. Current Stats
16 nodes
2 Data Centers
Replication Factor 6
200k Writes/sec at EACH_QUORUM
150k Reads/sec at LOCAL_QUORUM
> 30 Million time series
> 15 Billion points
10 TB on disk (compressed)
Read Latency 50%/95% is 1ms/5ms