NYC* Tech Day — BlueMountain Capital — Financial Time Series w/Cassandra 1.2


This talk will focus on our approach to building a scalable time-series database for financial data using Cassandra 1.2 and CQL3. We will discuss how we handle a heavy mix of reads and writes, as well as how we monitor and track the performance of the system.

  1. Financial Time Series with Cassandra 1.2
     Jake Luciani and Carl Yeksigian, BlueMountain Capital
  2. Know your problem.
     1000s of consumers, creating and reading data as fast as possible,
     consistent to all readers, handling ad-hoc user queries, quickly,
     across datacenters.
  3. Know your data.
     AAPL price, MSFT price
  4. Know your queries.
     Time Series Query: Start (10am), End (2pm), and Periodicity
     (1-minute periods) define the query.
  5. Know your queries.
     Cross Section Query: the As Of time (11am) defines the query.
  6. Know your queries.
     ● Cross sections are for random data
     ● Storing for cross sections means thousands of writes and
       inconsistent queries
     ● We also need bitemporality, but it's hard, so let's ignore it
       in the query
  7. Know your users.
     A million billion writes per second... and reads are fast and happen
     at the same time... and we can answer everything consistently... and
     it scales to new use cases quickly... and it's all done yesterday.
  8. Let's optimize for Time Series, since we can't optimize for everything.
  9. Data Model (in C* 1.1): one wide row per id, with composite column
     names of property:asof:knowledge.
     AAPL  lastPrice:2013-03-18:2013-03-19  0E-34-88-FF-26-E3-2C
           lastPrice:2013-03-19:2013-03-19  0E-34-88-FF-26-E3-3D
           lastPrice:2013-03-19:2013-03-20  0E-34-88-FF-26-E3-4E
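A minimal Python sketch of the composite-column layout above (the string encoding and helper name are ours for illustration; real C* 1.1 composite columns are binary-encoded):

```python
from datetime import date

def column_name(prop: str, asof: date, knowledge: date) -> str:
    """Compose a property:asof:knowledge column name, as on the slide."""
    return f"{prop}:{asof.isoformat()}:{knowledge.isoformat()}"

# Columns in a wide row sort by name, so every version of one property's
# time series lands contiguously, ordered by as-of then knowledge date.
cols = sorted([
    column_name("lastPrice", date(2013, 3, 19), date(2013, 3, 20)),
    column_name("lastPrice", date(2013, 3, 18), date(2013, 3, 19)),
    column_name("lastPrice", date(2013, 3, 19), date(2013, 3, 19)),
])
```

Because names sort this way, reading a time-series range is a single contiguous slice of the row.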
  10. But we're using C* 1.2:
      CQL3, Parallel Compaction, V-nodes, Off-Heap Bloom Filters, JBOD,
      Metrics!, Pooled Decompression Buffers, Concurrent Schema Creation,
      SSD Aware
  11. Data Model (CQL3)
      CREATE TABLE tsdata (
          id blob,
          property text,
          asof_ticks bigint,
          knowledge_ticks bigint,
          value blob,
          PRIMARY KEY (id, property, asof_ticks, knowledge_ticks)
      ) WITH COMPACT STORAGE
      AND CLUSTERING ORDER BY (asof_ticks DESC, knowledge_ticks DESC);
  12. CQL3 Queries: Time Series
      SELECT * FROM tsdata
      WHERE id = 0x12345
        AND property = 'lastPrice'
        AND asof_ticks >= 1234567890
        AND asof_ticks <= 2345678901;
  13. CQL3 Queries: Cross Section
      SELECT * FROM tsdata
      WHERE id = 0x12345
        AND property = 'lastPrice'
        AND asof_ticks = 1234567890
        AND knowledge_ticks < 2345678901
      LIMIT 1;
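The cross-section query leans on the descending clustering order: with knowledge_ticks stored DESC, LIMIT 1 returns the most recently known value before the cutoff. A toy Python sketch of that same selection (the data and function name are illustrative, not the driver API):

```python
# Rows for one (id, property, asof_ticks), already clustered DESC by
# knowledge_ticks, as (knowledge_ticks, value) pairs.
rows = [(1700, "corrected"), (1500, "revised"), (1200, "original")]

def cross_section(rows, knowledge_cutoff):
    """Return the first row with knowledge_ticks < cutoff -- the same row
    the CQL LIMIT 1 picks, since rows are already sorted descending."""
    for knowledge, value in rows:
        if knowledge < knowledge_cutoff:
            return value
    return None  # nothing was known before the cutoff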
  14. Data Overload!
      All points between start and end, even though we have a periodicity.
      All knowledge times, even though we only want the latest.
  15. A Service, not an app.
      [Diagram: many App clients talk to Olympus service instances, which
      front the C* cluster.]
  16. Filtration
      Filter everything by knowledge time.
      Filter time series by periodicity.
      200k points filtered down to 300.
      [Diagram: Cassandra reads like AAPL:lastPrice:2013-03-18:2013-03-19
      pass through the service's filter, which drops stale knowledge times
      and off-period points.]
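A rough Python sketch of this client-side filtration (the names and data are ours, not the Olympus code): keep the latest knowledge time per as-of point, then thin the series to one point per period.

```python
def filtrate(points, knowledge_cutoff, period):
    """points: (asof, knowledge, value) tuples sorted by asof ASC,
    knowledge DESC (the table's clustering order). Keep the latest
    knowledge <= cutoff for each asof, then one point per period."""
    latest = {}
    for asof, knowledge, value in points:
        # First acceptable hit per asof is the latest, given DESC order.
        if knowledge <= knowledge_cutoff and asof not in latest:
            latest[asof] = value
    out, last_bucket = [], None
    for asof in sorted(latest):
        bucket = asof // period  # integer period bucket
        if bucket != last_bucket:
            out.append((asof, latest[asof]))
            last_bucket = bucket
    return out
```

The same shape of filter is what the Pushdown Filters slide argues should run on the coordinator instead of the client.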
  17. Pushdown Filters
      ● To provide periodicity on raw data, downsample on write
      ● There are still cases where we don't know how to sample
      ● This filtering should be pushed to C*
      ● The coordinator node should apply a filter to the result set
  18. Complex Value Types
      Not every value is a double. Some values belong together:
      Bid and Ask should come back together.
  19. Thrift
      Thrift structures as values. Typed, extensible schema.
      Union types give us a way to deserialize any type.
  20. Thrift: Union Types
      https://gist.github.com/carlyeks/5199559
  21. But that's the easy part...
  22. Scaling...
      The first rule of scaling is you do not just turn everything to 11.
  23. Scaling...
      Step 1: Fast machines for your workload
      Step 2: Avoid Java GC for your workload
      Step 3: Tune Cassandra for your workload
      Step 4: Prefetch and cache for your workload
  24. Can't fix what you can't measure
      Riemann (http://riemann.io)
      Easily push application and system metrics into a single system.
      We push 4k metrics per second to a single Riemann instance.
  25. Metrics: Riemann
      Yammer Metrics with Riemann
      https://gist.github.com/carlyeks/5199090
  26. Metrics: Riemann
      Push stream-based metrics library.
      Riemann Dash for "Why is it slow?"
      Graphite for "Why was it slow?"
  27. VisualVM: the greatest tool EVER
      Many useful plugins... Just start jstatd on each server and go!
  28. Scaling Reads: Machines
      SSDs for hot data. JBOD config. As many cores as possible (> 16).
      10GbE network. Bonded network cards. Jumbo frames.
  29. JBOD is a lifesaver
      SSDs are great until they aren't anymore. JBOD allowed passive
      recovery in the face of simultaneous disk failures (SSDs had a bad
      firmware).
  30. Scaling Reads: JVM. JVM Magic!
      -Xmx12G
      -Xmn1600M
      -XX:SurvivorRatio=16
      -XX:+UseCompressedOops
      -XX:+UseTLAB yields ~15% boost!
      (Thread-local allocators, good for SEDA architectures)
  31. Scaling Reads: Cassandra
      Changes we've made:
      ● Configuration
      ● Compaction
      ● Compression
      ● Pushdown Filters
  32. Scaling Cassandra: Configuration
      Hinted Handoff: single threaded, 100kb throttle
  33. Scaling Cassandra: Configuration
      memtable size: 2048mb, instead of 1/3 of the heap.
      We're using a 12gb heap; this leaves enough room for memtables while
      the majority is left for reads and compaction.
  34. Scaling Cassandra: Configuration
      Half-Sync/Half-Async server: no thread dedicated to an idle
      connection, and we have a lot of idle connections.
  35. Scaling Cassandra: Configuration
      Multithreaded compaction, 4 cores. More threads to compact means
      faster compaction; too many threads means resource contention.
  36. Scaling Cassandra: Configuration
      Disabled internode compression: it caused too much GC and latency.
      On a 10GbE network, who needs compression?
  37. Leveled Compaction
      Wide rows mean data can be spread across a huge number of SSTables.
      Leveled Compaction puts a bound on the worst case (* in theory).
      Fewer SSTables to read means lower latency.
      [Diagram: levels L0 through L5; only the overlapping (orange)
      SSTables get read.]
  38. Leveled Compaction: Breaking Bad
      Under high write load, forced to read all of the L0 files.
      [Diagram: levels L0 through L5, with a crowded L0.]
  39. Hybrid Compaction: Breaking Better
      Size Tiered compaction at Level 0, Leveled compaction for L1
      through L5.
      [Diagram: Hybrid Compaction = Size Tiered L0 + Leveled L1-L5.]
  40. Better Compression: New LZ4Compressor
      LZ4 compression is 40% faster than Google's Snappy...
      [Chart: LZ4 JNI vs Snappy JNI vs LZ4 Sun Unsafe.]
      Blocks in Cassandra are so small we don't see the same gain in
      production, but the 95% latency is improved and it works with Java 7.
  41. CRC Check Chance
      CRC check of each compressed block causes reads to be 2x SLOWER.
      Lowered crc_check_chance to 10% of reads.
      A move to JNI would cause a 30x boost.
  42. Current Stats
      ● 12 nodes
      ● 2 datacenters
      ● RF=6
      ● 150k writes/sec at EACH_QUORUM
      ● 100k reads/sec at LOCAL_QUORUM
      ● > 6 billion points (without replication)
      ● 2TB on disk (compressed)
      ● Read latency 50%/95% is 1ms/10ms
  43. Questions? Thank you!
      @tjake and @carlyeks
