NYC* Tech Day — BlueMountain Capital — Financial Time Series w/Cassandra 1.2
 

This talk will focus on our approach to building a scalable TimeSeries database for financial data using Cassandra 1.2 and CQL3. We will discuss how we deal with a heavy mix of reads and writes as well as how we monitor and track performance of the system.

    Presentation Transcript

    • Financial Time Series with Cassandra 1.2. Jake Luciani and Carl Yeksigian, BlueMountain Capital
    • Know your problem. 1000s of consumers... creating and reading data as fast as possible... consistent to all readers... and handle ad-hoc user queries... quickly... across datacenters.
    • Know your data. AAPL price, MSFT price
    • Know your queries. Time Series Query: start (10am), end (2pm), 1 minute periods. Start, End, and Periodicity define the query.
    • Know your queries. Cross Section Query: As Of Time (11am). The As Of time defines the query.
    • Know your queries. ● Cross sections are for random data ● Storing for Cross Sections means thousands of writes, inconsistent queries ● We also need bitemporality, but it's hard, so let's ignore it in the query
    • Know your users. A million, billion writes per second... and reads are fast and happen at the same time... and we can answer everything consistently... and it scales to new use cases quickly... and it's all done yesterday
    • Let's optimize for Time Series, since we can't optimize for everything.
    • Data Model (in C* 1.1). Row AAPL: lastPrice:2013-03-18:2013-03-19 = 0E-34-88-FF-26-E3-2C; lastPrice:2013-03-19:2012-03-19 = 0E-34-88-FF-26-E3-3D; lastPrice:2013-03-19:2013-03-20 = 0E-34-88-FF-26-E3-4E
    • But we're using C* 1.2. CQL3, Parallel Compaction, V-nodes, Off-Heap Bloom Filters, JBOD, Metrics!, Pooled Decompression buffers, Concurrent Schema Creation, SSD Aware
    • Data Model (CQL 3): CREATE TABLE tsdata (id blob, property text, asof_ticks bigint, knowledge_ticks bigint, value blob, PRIMARY KEY (id, property, asof_ticks, knowledge_ticks)) WITH COMPACT STORAGE AND CLUSTERING ORDER BY (asof_ticks DESC, knowledge_ticks DESC)
    • CQL3 Queries: Time Series. SELECT * FROM tsdata WHERE id = 0x12345 AND property = 'lastPrice' AND asof_ticks >= 1234567890 AND asof_ticks <= 2345678901
    • CQL3 Queries: Cross Section. SELECT * FROM tsdata WHERE id = 0x12345 AND property = 'lastPrice' AND asof_ticks = 1234567890 AND knowledge_ticks < 2345678901 LIMIT 1
    • Data Overload! All points between start and end, even though we have a periodicity. All knowledge times, even though we only want the latest.
    • A Service, not an app. [Diagram labels: App, Olympus, C*]
    • Filtration. Filter everything by knowledge time. Filter time series by periodicity. 200k points filtered down to 300. [Diagram: Cassandra Reads, then Filter, then Service. Cassandra reads: AAPL:lastPrice:2013-03-18:2013-03-19, AAPL:lastPrice:2013-03-19:2013-03-19, AAPL:lastPrice:2013-03-19:2013-03-20, AAPL:lastPrice:2013-03-20:2013-03-20, AAPL:lastPrice:2013-03-20:2013-03-21. Service after filter: AAPL:lastPrice:2013-03-18:2013-03-19, AAPL:lastPrice:2013-03-19:2013-03-20, AAPL:lastPrice:2013-03-20:2013-03-21]
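
The two filters on the Filtration slide can be sketched in a few lines of Python. This is only an illustration, not the Olympus service's code: the `Point` type, the `filter_points` function, and the rule "keep the last as-of point in each period" are all assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple


class Point(NamedTuple):
    asof_ticks: int       # time the value applies to
    knowledge_ticks: int  # time the value became known
    value: float


def filter_points(points: Iterable[Point], known_before: int,
                  start: int, period: int) -> List[Point]:
    """Hypothetical service-side filter: knowledge time first, then periodicity."""
    # Step 1: per as-of time, keep only the latest knowledge before the cutoff.
    latest_by_asof = {}
    for p in points:
        if p.knowledge_ticks >= known_before or p.asof_ticks < start:
            continue
        current = latest_by_asof.get(p.asof_ticks)
        if current is None or p.knowledge_ticks > current.knowledge_ticks:
            latest_by_asof[p.asof_ticks] = p

    # Step 2: per period bucket, keep the last as-of point (assumed sampling rule).
    by_bucket = {}
    for p in latest_by_asof.values():
        bucket = (p.asof_ticks - start) // period
        current = by_bucket.get(bucket)
        if current is None or p.asof_ticks > current.asof_ticks:
            by_bucket[bucket] = p
    return [by_bucket[b] for b in sorted(by_bucket)]


raw = [
    Point(0, 1, 1.0), Point(0, 5, 1.5),      # two knowledge times for as-of 0
    Point(30, 2, 2.0),
    Point(70, 3, 3.0), Point(70, 100, 9.0),  # knowledge 100 is past the cutoff
]
sampled = filter_points(raw, known_before=10, start=0, period=60)
```

With a period of 60 ticks, the five raw points collapse to one point per period, which is the same kind of reduction the slide describes at a much larger scale.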
    • Pushdown Filters ● To provide periodicity on raw data, downsample on write ● There are still cases where we don't know how to sample ● This filtering should be pushed to C* ● The coordinator node should apply a filter to the result set
    • Complex Value Types. Not every value is a double. Some values belong together. Bid and Ask should come back together.
    • Thrift. Thrift structures as values. Typed, extensible schema. Union types give us a way to deserialize any type.
    • Thrift: Union Types https://gist.github.com/carlyeks/5199559
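
The idea of a union value, where exactly one of several typed fields is set, can be illustrated without Thrift itself. The sketch below is plain Python, not Thrift-generated code and not the contents of the linked gist; the `Quote` and `Value` classes and their field names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Quote:
    bid: float  # bid and ask travel together as one value
    ask: float


@dataclass(frozen=True)
class Value:
    """Union-like wrapper: exactly one field must be set (hypothetical schema)."""
    double_value: Optional[float] = None
    string_value: Optional[str] = None
    quote_value: Optional[Quote] = None

    def __post_init__(self):
        set_fields = [f.name for f in fields(self)
                      if getattr(self, f.name) is not None]
        if len(set_fields) != 1:
            raise ValueError(f"expected exactly one field, got {set_fields}")

    def which(self) -> str:
        # Name of the single populated field, so a reader can dispatch on type.
        return next(f.name for f in fields(self)
                    if getattr(self, f.name) is not None)


price = Value(double_value=431.72)
quote = Value(quote_value=Quote(bid=431.70, ask=431.74))
```

A reader that receives a `Value` can call `which()` and handle each case, and a new field can be added later without breaking existing readers of the older fields.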
    • But that's the easy part...
    • Scaling... The first rule of scaling is you do not just turn everything to 11.
    • Scaling... Step 1: Fast machines for your workload. Step 2: Avoid Java GC for your workload. Step 3: Tune Cassandra for your workload. Step 4: Prefetch and cache for your workload.
    • Can't fix what you can't measure. Riemann (http://riemann.io). Easily push application and system metrics into a single system. We push 4k metrics per second to a single Riemann instance.
    • Metrics: Riemann. Yammer Metrics with Riemann: https://gist.github.com/carlyeks/5199090
    • Metrics: Riemann. Push stream based metrics library. Riemann Dash for "Why is it slow?" Graphite for "Why was it slow?"
    • VisualVM: the greatest tool EVER. Many useful plugins... Just start jstatd on each server and go!
    • Scaling Reads: Machines. SSDs for hot data. JBOD config. As many cores as possible (> 16). 10GbE network. Bonded network cards. Jumbo frames.
    • JBOD is a lifesaver. SSDs are great until they aren't anymore. JBOD allowed passive recovery in the face of simultaneous disk failures (SSDs had a bad firmware).
    • Scaling Reads: JVM. JVM Magic! -Xmx12G -Xmn1600M -XX:SurvivorRatio=16 -XX:+UseCompressedOops -XX:+UseTLAB yields ~15% boost! (Thread local allocators, good for SEDA architectures)
    • Scaling Reads: Cassandra. Changes we've made: ● Configuration ● Compaction ● Compression ● Pushdown Filters
    • Scaling Cassandra: Configuration. Hinted Handoff: HHO single threaded, 100kb throttle.
    • Scaling Cassandra: Configuration. Memtable size: 2048mb, instead of 1/3 heap. We're using a 12gb heap; this leaves enough room for memtables while the majority is left for reads and compaction.
    • Scaling Cassandra: Configuration. Half-Sync Half-Async server: no thread dedicated to an idle connection. We have a lot of idle connections.
    • Scaling Cassandra: Configuration. Multithreaded compaction, 4 cores. More threads to compact means faster compaction. Too many threads means resource contention.
    • Scaling Cassandra: Configuration. Disabled internode compression. It caused too much GC and latency. On a 10GbE network, who needs compression?
    • Leveled Compaction. Wide rows means data can be spread across a huge number of SSTables. Leveled Compaction puts a bound on the worst case (*). Fewer SSTables to read means lower latency, as shown below; orange SSTables get read. (*) In theory. [Diagram: levels L0 to L5]
    • Leveled Compaction: Breaking Bad. Under high write load, forced to read all of the L0 files. [Diagram: levels L0 to L5]
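
Why a backlog in L0 hurts reads can be shown with a toy model. In leveled compaction the SSTables within L1 and above cover non-overlapping key ranges, so a key falls in at most one SSTable per level, while L0 SSTables may all overlap. The sketch below is a simplified illustration that ignores bloom filters and every other read-path optimization; the `sstables_for_key` function and the integer key ranges are hypothetical.

```python
from typing import List, Tuple

KeyRange = Tuple[int, int]  # inclusive (min_key, max_key) of one SSTable


def sstables_for_key(levels: List[List[KeyRange]], key: int) -> List[Tuple[int, KeyRange]]:
    """Return (level, range) for every SSTable whose key range contains the key.

    levels[0] is L0, where ranges may overlap; higher levels are assumed
    to hold non-overlapping ranges.
    """
    hits = []
    for level, tables in enumerate(levels):
        for lo, hi in tables:
            if lo <= key <= hi:
                hits.append((level, (lo, hi)))
    return hits


healthy = [
    [],                                         # L0 is empty
    [(0, 49), (50, 100)],                       # L1
    [(0, 24), (25, 49), (50, 74), (75, 100)],   # L2
]
# Under heavy writes, flushed SSTables pile up in L0 and each spans the whole range.
backlogged = [[(0, 100)] * 10] + healthy[1:]

healthy_reads = len(sstables_for_key(healthy, 60))
backlogged_reads = len(sstables_for_key(backlogged, 60))
```

In this model the healthy layout touches one SSTable per populated level, while the backlogged layout adds every L0 file on top of that.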
    • Hybrid Compaction: Breaking Better. Size Tiering Level 0. [Diagram: Hybrid Compaction = size-tiered L0 plus leveled L1 to L5]
    • Better Compression: New LZ4Compressor. LZ4 Compression is 40% faster than Google's Snappy... [Chart series: LZ4 JNI, Snappy JNI, LZ4 Sun Unsafe] Blocks in Cassandra are so small we don't see the same in production, but the 95% latency is improved and it works with Java 7.
    • CRC Check Chance. CRC check of each compressed block causes reads to be 2x SLOWER. Lowered crc_check_chance to 10% of reads. A move to JNI would cause a 30x boost.
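
The trade-off behind `crc_check_chance` is a probabilistic check: only a fraction of block reads pay for checksum verification. The sketch below illustrates that idea with Python's `zlib.crc32` as a stand-in checksum. It is not Cassandra's implementation; the `read_block` function and its `rng` parameter are hypothetical names introduced for the example.

```python
import random
import zlib
from typing import Callable


def read_block(data: bytes, stored_crc: int, crc_check_chance: float,
               rng: Callable[[], float] = random.random) -> bytes:
    """Verify the block checksum on roughly crc_check_chance of reads."""
    if rng() < crc_check_chance:
        # Only this fraction of reads pays the checksum cost.
        if zlib.crc32(data) != stored_crc:
            raise IOError("checksum mismatch: block is corrupt")
    return data


block = b"compressed block bytes"
crc = zlib.crc32(block)
result = read_block(block, crc, crc_check_chance=0.1)
```

The cost of the lower setting is that a corrupt block is detected on only about one read in ten, rather than on every read.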
    • Current Stats ● 12 nodes ● 2 DataCenters ● RF=6 ● 150k Writes/sec at EACH_QUORUM ● 100k Reads/sec at LOCAL_QUORUM ● > 6 Billion points (without replication) ● 2TB on disk (compressed) ● Read Latency 50%/95% is 1ms/10ms
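
The consistency levels in these stats translate into replica counts through the usual quorum formula, floor(replicas / 2) + 1. The arithmetic below assumes that RF=6 means 3 replicas in each of the 2 datacenters; the slide does not say how the replicas are split, and the datacenter names are placeholders.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    return replicas // 2 + 1


# Assumption: RF=6 is split evenly, 3 replicas per datacenter.
rf_per_dc = {"dc1": 3, "dc2": 3}

# LOCAL_QUORUM: a majority of replicas in the coordinator's datacenter only.
local_quorum_acks = quorum(rf_per_dc["dc1"])

# EACH_QUORUM: a majority of replicas in every datacenter.
each_quorum_acks = sum(quorum(rf) for rf in rf_per_dc.values())
```

Under that assumption a LOCAL_QUORUM read waits for 2 local replicas, while an EACH_QUORUM write waits for 2 replicas in each datacenter, 4 in total, including the cross-datacenter round trip.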
    • Questions? Thank you! @tjake and @carlyeks