This talk will focus on our approach to building a scalable TimeSeries database for financial data using Cassandra 1.2 and CQL3. We will discuss how we deal with a heavy mix of reads and writes as well as how we monitor and track performance of the system.
Know your queries.
Time Series Query
start (10am), end (2pm), 1 minute periods
Start, End, Periodicity define the query
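A (start, end, periodicity) triple fully determines which timestamps a time-series query should return. A minimal sketch of that idea (the `period_points` helper is illustrative, not from the talk):

```python
from datetime import datetime, timedelta

def period_points(start, end, period):
    """Enumerate the timestamps a (start, end, periodicity) query asks for."""
    points = []
    t = start
    while t <= end:
        points.append(t)
        t += period
    return points

# A 10am-2pm query at 1-minute periods asks for 241 points.
points = period_points(datetime(2013, 3, 19, 10, 0),
                       datetime(2013, 3, 19, 14, 0),
                       timedelta(minutes=1))
```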
Know your queries.
Cross Section Query
As Of Time (11am)
As-Of time defines the query
Know your queries.
● Cross sections are for random data
● Storing for cross sections means thousands of writes, inconsistent queries
● We also need bitemporality, but it's hard, so let's ignore it in the query
Know your users.
A million, billion writes per second...
...and reads are fast and happen at the same time
...and we can answer everything consistently
...and it scales to new use cases quickly
...and it's all done yesterday
Let's optimize for Time Series, since we can't optimize for everything.
Data Model (in C* 1.1)
AAPL
  lastPrice:2013-03-18:2013-03-19 → 0E-34-88-FF-26-E3-2C
  lastPrice:2013-03-19:2013-03-19 → 0E-34-88-FF-26-E3-3D
  lastPrice:2013-03-19:2013-03-20 → 0E-34-88-FF-26-E3-4E
But we're using C* 1.2.
CQL3
Parallel Compaction
V-nodes
Off-Heap Bloom Filters
JBOD
Metrics!
Pooled Decompression Buffers
Concurrent Schema Creation
SSD Aware
Data Model (CQL3)
CREATE TABLE tsdata (
  id blob,
  property text,
  asof_ticks bigint,
  knowledge_ticks bigint,
  value blob,
  PRIMARY KEY (id, property, asof_ticks, knowledge_ticks)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (asof_ticks DESC, knowledge_ticks DESC);
CQL3 Queries: Time Series
SELECT * FROM tsdata
WHERE id = 0x12345
AND property = 'lastPrice'
AND asof_ticks >= 1234567890
AND asof_ticks <= 2345678901;
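The slice query above only varies in its partition key and asof range, so a client can render it from four inputs. A hypothetical sketch (real code would use a driver with bound parameters rather than string formatting; `ts_query` is not from the talk):

```python
def ts_query(table, id_hex, prop, start_ticks, end_ticks):
    """Render the per-partition slice query; values inlined for illustration only."""
    return (
        "SELECT * FROM {table} "
        "WHERE id = 0x{id_hex} AND property = '{prop}' "
        "AND asof_ticks >= {start} AND asof_ticks <= {end}"
    ).format(table=table, id_hex=id_hex, prop=prop,
             start=start_ticks, end=end_ticks)

q = ts_query("tsdata", "12345", "lastPrice", 1234567890, 2345678901)
```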
Data Overload!
All points between start and end, even though we have a periodicity.
All knowledge times, even though we only want the latest.
A Service, not an app
[Diagram: many Apps talk to Olympus service instances, which sit in front of C*]
Filtration
Filter everything by knowledge time
Filter time series by periodicity
200k points filtered down to 300
[Diagram: Cassandra reads (e.g. AAPL:lastPrice:2013-03-18:2013-03-19 through AAPL:lastPrice:2013-03-20:2013-03-21) pass through the service's filter before reaching the client]
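The two filtering passes described above can be sketched in a few lines: first keep, per asof time, only the latest version known at the query's knowledge time; then thin to one point per period. This is an illustrative reconstruction of the idea, not the service's actual code:

```python
def filter_points(rows, knowledge_cutoff, period_ticks):
    """rows: (asof_ticks, knowledge_ticks, value) tuples in any order.
    Returns one (asof_ticks, value) per period bucket."""
    # Pass 1: latest knowledge version per asof, ignoring versions
    # written after the knowledge cutoff.
    latest = {}
    for asof, knowledge, value in rows:
        if knowledge > knowledge_cutoff:
            continue
        if asof not in latest or knowledge > latest[asof][0]:
            latest[asof] = (knowledge, value)
    # Pass 2: one point per period bucket (keep the greatest asof).
    by_bucket = {}
    for asof, (knowledge, value) in latest.items():
        bucket = asof // period_ticks
        if bucket not in by_bucket or asof > by_bucket[bucket][0]:
            by_bucket[bucket] = (asof, value)
    return [by_bucket[b] for b in sorted(by_bucket)]

rows = [(100, 1, 'a'), (100, 9, 'late'), (150, 2, 'b'), (220, 3, 'c')]
result = filter_points(rows, knowledge_cutoff=5, period_ticks=100)
```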
Pushdown Filters
● To provide periodicity on raw data, downsample on write
● There are still cases where we don't know how to sample
● This filtering should be pushed to C*
● The coordinator node should apply a filter to the result set
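Downsampling on write means each raw tick also upserts into a per-period slot, so a periodic read touches one point per period instead of every raw tick. A minimal sketch of that write path (the helper and the dict-as-table are illustrative assumptions):

```python
def downsample_on_write(point, period_ticks, table):
    """On each raw write, also upsert into the point's period bucket.
    Last write in a bucket wins, mirroring an upsert on a bucketed key."""
    asof, value = point
    bucket = (asof // period_ticks) * period_ticks
    table[bucket] = (asof, value)
    return bucket

sampled = {}
for point in [(61, 'x'), (75, 'y'), (130, 'z')]:
    downsample_on_write(point, period_ticks=60, table=sampled)
```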
Complex Value Types
Not every value is a double
Some values belong together
Bid and Ask should come back together
Thrift
Thrift structures as values
Typed, extensible schema
Union types give us a way to deserialize any type
Thrift: Union Types https://gist.github.com/carlyeks/5199559
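The Thrift IDL lives in the gist above and is not reproduced here; the idea of a union value can be sketched in plain Python: exactly one tagged field is set, so a reader can decode any value type (a double, a bid/ask quote, etc.) without knowing in advance which it is. The field names below are illustrative assumptions:

```python
# Hypothetical union field names, standing in for the Thrift union's fields.
UNION_FIELDS = {"double_val", "long_val", "quote_val"}

def make_union(tag, payload):
    """Build a union value with exactly one field set."""
    assert tag in UNION_FIELDS
    return {tag: payload}

def read_union(value):
    """A union has exactly one field set; return (tag, payload)."""
    (tag, payload), = value.items()
    return tag, payload

# Bid and Ask travel together as one union value.
quote = make_union("quote_val", {"bid": 439.5, "ask": 439.7})
tag, payload = read_union(quote)
```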
Scaling Cassandra: Configuration
Hinted Handoff
HHO single threaded, 100kb throttle
Scaling Cassandra: Configuration
memtable size
2048mb, instead of 1/3 heap
We're using a 12gb heap; this leaves enough room for memtables while the majority is left for reads and compaction.
Scaling Cassandra: Configuration
Half-Sync/Half-Async server
No thread dedicated to an idle connection
We have a lot of idle connections
Scaling Cassandra: Configuration
Multithreaded compaction, 4 cores
More threads to compact means faster compaction
Too many threads means resource contention
Scaling Cassandra: Configuration
Disabled internode compression
Caused too much GC and latency
On a 10GbE network, who needs compression?
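Taken together, the configuration slides map onto a handful of cassandra.yaml settings. A sketch using Cassandra 1.2 option names, with values as described above (verify against your own version's defaults before copying):

```yaml
# cassandra.yaml fragments for the settings discussed in the slides
hinted_handoff_enabled: true
hinted_handoff_throttle_in_kb: 100     # 100kb throttle
memtable_total_space_in_mb: 2048       # fixed size instead of 1/3 of heap
rpc_server_type: hsha                  # half-sync/half-async server
multithreaded_compaction: true
concurrent_compactors: 4               # 4 cores for compaction
internode_compression: none            # disabled: GC/latency cost on 10GbE
```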
Leveled Compaction
Wide rows mean data can be spread across a huge number of SSTables.
Leveled Compaction puts a bound on the worst case (* in theory).
Fewer SSTables to read means lower latency.
[Diagram: levels L0-L5; only the SSTables that get read are highlighted]
Leveled Compaction: Breaking Bad
Under high write load, forced to read all of the L0 files.
[Diagram: levels L0-L5 with an overfull L0]
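For reference, a table like tsdata can opt into leveled compaction with the CQL3 map syntax available in 1.2 (the 160mb SSTable size is an illustrative value, not one given in the talk):

```
ALTER TABLE tsdata
WITH compaction = { 'class': 'LeveledCompactionStrategy',
                    'sstable_size_in_mb': 160 };
```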
Better Compression: New LZ4Compressor
LZ4 compression is 40% faster than Google's Snappy...
[Benchmark chart: LZ4 JNI vs Snappy JNI vs LZ4 Sun Unsafe]
Blocks in Cassandra are so small we don't see the same gain in production, but the 95% latency is improved and it works with Java 7.
CRC Check Chance
CRC check of each compressed block causes reads to be 2x SLOWER.
Lowered crc_check_chance to 10% of reads.
A move to JNI would give a 30x boost.
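Both compression tweaks above are per-table settings; in CQL3 they might look like the following, assuming the crc_check_chance compression sub-option available in this era of Cassandra (check your version's table options before relying on it):

```
ALTER TABLE tsdata
WITH compression = { 'sstable_compression': 'LZ4Compressor',
                     'crc_check_chance': 0.1 };
```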
Current Stats● 12 nodes● 2 DataCenters● RF=6● 150k Writes/sec at EACH_QUORUM● 100k Reads/sec at LOCAL_QUORUM● > 6 Billion points (without replication)● 2TB on disk (compressed)● Read Latency 50%/95% is 1ms/10ms