NYC* Big Tech Day 2013: Financial Time Series


A talk about how BlueMountain Capital uses Cassandra to store financial data, with @tjake and @carlyeks.


  1. Financial Time Series with Cassandra 1.2. Jake Luciani and Carl Yeksigian, BlueMountain Capital
  2. Know your problem. Thousands of consumers, creating and reading data as fast as possible, consistent to all readers, handling ad-hoc user queries quickly, across datacenters.
  3. Know your data. [Chart: AAPL price and MSFT price over time]
  4. Know your queries. Time Series Query: [Diagram: start (10am) to end (2pm), 1-minute periods]. Start, End, and Periodicity define the query.
  5. Know your queries. Cross Section Query: [Diagram: As-Of time (11am)]. The As-Of time defines the query.
  6. Know your queries.
     ● Cross sections are for random data
     ● Storing for cross sections means thousands of writes and inconsistent queries
     ● We also need bitemporality, but it's hard, so let's ignore it in the query
  7. Know your users. A million billion writes per second, and reads are fast and happen at the same time, and we can answer everything consistently, and it scales to new use cases quickly, and it's all done yesterday.
  8. Let's optimize for Time Series, since we can't optimize for everything.
  9. Data Model (in C* 1.1). Row key AAPL, composite columns named property:asof:knowledge, with binary values:
     lastPrice:2013-03-18:2013-03-19 -> 0E-34-88-FF-26-E3-2C
     lastPrice:2013-03-19:2013-03-19 -> 0E-34-88-FF-26-E3-3D
     lastPrice:2013-03-19:2013-03-20 -> 0E-34-88-FF-26-E3-4E
  10. But we're using C* 1.2: CQL3, Parallel Compaction, V-nodes, Off-Heap Bloom Filters, JBOD, Metrics!, Pooled Decompression Buffers, Concurrent Schema Creation, SSD Aware.
  11. Data Model (CQL 3)
      CREATE TABLE tsdata (
          id blob,
          property text,
          asof_ticks bigint,
          knowledge_ticks bigint,
          value blob,
          PRIMARY KEY (id, property, asof_ticks, knowledge_ticks)
      ) WITH COMPACT STORAGE
      AND CLUSTERING ORDER BY (asof_ticks DESC, knowledge_ticks DESC);
  12. CQL3 Queries: Time Series
      SELECT * FROM tsdata
      WHERE id = 0x12345
      AND property = 'lastPrice'
      AND asof_ticks >= 1234567890
      AND asof_ticks <= 2345678901;
  13. CQL3 Queries: Cross Section
      SELECT * FROM tsdata
      WHERE id = 0x12345
      AND property = 'lastPrice'
      AND asof_ticks = 1234567890
      AND knowledge_ticks < 2345678901
      LIMIT 1;
  14. Data Overload!
      All points between start and end, even though we have a periodicity
      All knowledge times, even though we only want the latest
  15. A Service, not an app. [Diagram: many Apps connect through the Olympus service to C*]
  16. Filtration
      Filter everything by knowledge time
      Filter time series by periodicity
      200k points filtered down to 300
      [Diagram: Cassandra reads pass through the service's filter; e.g. raw columns AAPL:lastPrice:2013-03-18:2013-03-19 through AAPL:lastPrice:2013-03-20:2013-03-21 reduce to one point per period]
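The filtration step can be sketched in Java. This is a minimal illustration, not the talk's actual service code: the `Point` record, the tick arithmetic, and the bucketing rule (latest as-of per period, latest knowledge per as-of, dropping points not yet known) are assumptions based on the slides.

```java
import java.util.*;

public class Filtration {
    // Hypothetical representation of one stored point.
    public record Point(long asofTicks, long knowledgeTicks, double value) {}

    // Keep only points known as of `knowledgeLimit`, then collapse each
    // period to a single point: the latest as-of, and for equal as-of
    // times the latest knowledge time (the most recent revision).
    public static List<Point> filter(List<Point> raw, long knowledgeLimit,
                                     long periodTicks) {
        Map<Long, Point> byPeriod = new TreeMap<>();
        for (Point p : raw) {
            if (p.knowledgeTicks() > knowledgeLimit) continue; // not yet known
            long bucket = p.asofTicks() / periodTicks;
            Point best = byPeriod.get(bucket);
            if (best == null
                    || p.asofTicks() > best.asofTicks()
                    || (p.asofTicks() == best.asofTicks()
                        && p.knowledgeTicks() > best.knowledgeTicks())) {
                byPeriod.put(bucket, p);
            }
        }
        return new ArrayList<>(byPeriod.values());
    }

    public static void main(String[] args) {
        List<Point> raw = List.of(
            new Point(100, 100, 1.0),
            new Point(100, 150, 1.1),   // later revision of the same as-of
            new Point(160, 160, 2.0),
            new Point(230, 500, 3.0));  // not yet known at limit 400
        for (Point p : filter(raw, 400, 60)) {
            System.out.println(p.asofTicks() + " " + p.value());
        }
    }
}
```

Running the example keeps the revised point 1.1 for the first period and 2.0 for the second, and drops the point whose knowledge time exceeds the limit.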
  17. Pushdown Filters
      ● To provide periodicity on raw data, downsample on write
      ● There are still cases where we don't know how to sample
      ● This filtering should be pushed to C*
      ● The coordinator node should apply a filter to the result set
  18. Complex Value Types
      Not every value is a double
      Some values belong together
      Bid and Ask should come back together
  19. Thrift
      Thrift structures as values
      Typed, extensible schema
      Union types give us a way to deserialize any type
  20. Thrift: Union Types
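The slide's code did not survive in the transcript; a minimal sketch of what a Thrift union value type might look like (field and type names here are illustrative, not from the talk):

```thrift
// Composite quote: bid and ask travel together, as the previous
// slide requires.
struct Quote {
  1: double bid;
  2: double ask;
}

// A union holds exactly one of its fields, so a single serialized
// `value` blob can carry any of these types and still deserialize
// unambiguously.
union Value {
  1: double price;
  2: i64 size;
  3: Quote quote;
}
```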
  21. But that's the easy part...
  22. Scaling... The first rule of scaling is you do not just turn everything to 11.
  23. Scaling...
      Step 1 - Fast machines for your workload
      Step 2 - Avoid Java GC for your workload
      Step 3 - Tune Cassandra for your workload
      Step 4 - Prefetch and cache for your workload
  24. Can't fix what you can't measure. Riemann: push application and system metrics into a single system. We push 4k metrics per second to a single Riemann instance.
  25. Metrics: Riemann. Yammer Metrics with Riemann.
  26. Metrics: Riemann
      Push-based stream metrics library
      Riemann Dash for "Why is it slow?"
      Graphite for "Why was it slow?"
  27. VisualVM: the greatest tool EVER. Many useful plugins... Just start jstatd on each server and go!
  28. Scaling Reads: Machines
      SSDs for hot data
      JBOD config
      As many cores as possible (> 16)
      10GbE network
      Bonded network cards
      Jumbo frames
  29. JBOD is a lifesaver. SSDs are great until they aren't anymore. JBOD allowed passive recovery in the face of simultaneous disk failures (the SSDs had a bad firmware).
  30. Scaling Reads: JVM Magic!
      -Xmx12G
      -Xmn1600M
      -XX:SurvivorRatio=16
      -XX:+UseCompressedOops
      -XX:+UseTLAB yields ~15% boost!
      (Thread-local allocators, good for SEDA architectures)
  31. Scaling Reads: Cassandra
      Changes we've made:
      ● Configuration
      ● Compaction
      ● Compression
      ● Pushdown Filters
  32. Scaling Cassandra: Configuration. Hinted Handoff: single threaded, 100kb throttle.
  33. Scaling Cassandra: Configuration. Memtable size: 2048mb, instead of 1/3 of the heap. We're using a 12gb heap; this leaves enough room for memtables while the majority is left for reads and compaction.
  34. Scaling Cassandra: Configuration. Half-Sync/Half-Async server: no thread dedicated to an idle connection, and we have a lot of idle connections.
  35. Scaling Cassandra: Configuration. Multithreaded compaction, 4 cores. More threads to compact means faster compaction; too many threads means resource contention.
  36. Scaling Cassandra: Configuration. Disabled internode compression: it caused too much GC and latency, and on a 10GbE network, who needs compression?
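A cassandra.yaml fragment matching the configuration slides might look like the following. This is a sketch using Cassandra 1.2 option names with the values stated in the talk, not the team's actual file:

```yaml
# Hinted handoff: single threaded, throttled to 100kb
hinted_handoff_throttle_in_kb: 100
max_hints_delivery_threads: 1

# Fixed memtable budget instead of the default 1/3 of heap
memtable_total_space_in_mb: 2048

# Half-sync/half-async Thrift server: no thread per idle connection
rpc_server_type: hsha

# Multithreaded compaction on 4 cores
multithreaded_compaction: true
concurrent_compactors: 4

# Disabled internode compression (too much GC/latency on 10GbE)
internode_compression: none
```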
  37. Leveled Compaction
      Wide rows mean data can be spread across a huge number of SSTables
      Leveled Compaction puts a bound on the worst case (in theory)
      Fewer SSTables to read means lower latency
      [Diagram: levels L0 to L5; only the orange SSTables get read]
  38. Leveled Compaction: Breaking Bad
      Under high write load, forced to read all of the L0 files
      [Diagram: levels L0 to L5; every L0 SSTable is read]
  39. Hybrid Compaction: Breaking Better
      Size Tiered compaction for Level 0, Leveled for L1 and beyond
      [Diagram: Size Tiered at L0, Leveled at L1 to L5]
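For reference, switching a table to stock Leveled Compaction in Cassandra 1.2 is a one-line schema change (the hybrid size-tiered L0 behavior described above was a customization, not a stock 1.2 option; the SSTable size here is illustrative):

```sql
ALTER TABLE tsdata WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': '160'
};
```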
  40. Better Compression: New LZ4Compressor
      LZ4 compression is 40% faster than Google's Snappy...
      [Benchmark: LZ4 JNI vs Snappy JNI vs LZ4 via sun.misc.Unsafe]
      Blocks in Cassandra are so small we don't see the same gain in production, but the 95% latency is improved and it works with Java 7
  41. CRC Check Chance
      CRC check of each compressed block causes reads to be 2x SLOWER
      Lowered crc_check_chance to 10% of reads
      A move to JNI would cause a 30x boost
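In CQL, crc_check_chance is set as a compression suboption on the table. A sketch against the tsdata table from earlier, where 0.1 means the block checksum is verified on 10% of reads:

```sql
ALTER TABLE tsdata WITH compression = {
  'sstable_compression': 'LZ4Compressor',
  'crc_check_chance': '0.1'
};
```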
  42. Current Stats
      ● 12 nodes
      ● 2 datacenters
      ● RF=6
      ● 150k writes/sec at EACH_QUORUM
      ● 100k reads/sec at LOCAL_QUORUM
      ● > 6 billion points (without replication)
      ● 2TB on disk (compressed)
      ● Read latency 50%/95% is 1ms/10ms
  43. Questions? Thank you! @tjake and @carlyeks