C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

This session will focus on our approach to building a scalable TimeSeries database for financial data using Cassandra 1.2 and CQL3. We will discuss how we deal with a heavy mix of reads and writes as well as how we monitor and track performance of the system.

  1. Time is Money: Financial Time Series
     Jake Luciani and Carl Yeksigian, BlueMountain Capital
  2. About this talk
     • Part 1: Our use case and architecture
     • Part 2: Our deployment and tuning
     • Part 3: Q&A
  3. Know your problem.
     1000s of consumers... creating and reading data as fast as possible... consistent to all readers... and handle ad-hoc user queries... quickly... across data centers.
  4. Know your data.
     [Chart labels: AAPL price, MSFT price]
  5. Know your queries: Time Series Query
     Start, End, and Periodicity define the query.
     [Chart label: 1 minute periods]
  6. Know your queries: Cross Section Query
     The As Of time defines the query.
     [Chart label: As Of Time (11am)]
  7. Know your queries.
     • Cross sections are random
     • Storing for all possible Cross Sections is not possible.
     • We also support bi-temporality
  8. Let's optimize for Time Series.
  9. Data Model (CQL 3)
     CREATE TABLE tsdata (
         id blob,
         property text,
         asof_ticks bigint,
         knowledge_ticks bigint,
         value blob,
         PRIMARY KEY (id, property, asof_ticks, knowledge_ticks)
     )
     WITH COMPACT STORAGE
     AND CLUSTERING ORDER BY (asof_ticks DESC, knowledge_ticks DESC);
  10. CQL3 Queries: Time Series
     SELECT * FROM tsdata
     WHERE id = 0x12345
       AND property = 'lastPrice'
       AND asof_ticks >= 1234567890
       AND asof_ticks <= 2345678901;
  11. CQL3 Queries: Cross Section
     SELECT * FROM tsdata
     WHERE id = 0x12345
       AND property = 'lastPrice'
       AND asof_ticks = 1234567890
       AND knowledge_ticks < 2345678901
     LIMIT 1;
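The two queries on slides 10 and 11 can be mimicked in memory to show how the descending clustering order makes `LIMIT 1` return the most recently known value. This is only a sketch in Python: the `Point` type, the helper names, and the sample rows are hypothetical and not part of the talk, and a single `id`/`property` pair is assumed.

```python
from typing import List, NamedTuple, Optional


class Point(NamedTuple):
    asof_ticks: int       # time the value applies to
    knowledge_ticks: int  # time the value became known
    value: bytes


def sort_like_clustering(points: List[Point]) -> List[Point]:
    # Mirrors CLUSTERING ORDER BY (asof_ticks DESC, knowledge_ticks DESC).
    return sorted(points, key=lambda p: (p.asof_ticks, p.knowledge_ticks), reverse=True)


def time_series(points: List[Point], start: int, end: int) -> List[Point]:
    # Mirrors: asof_ticks >= start AND asof_ticks <= end.
    return [p for p in sort_like_clustering(points) if start <= p.asof_ticks <= end]


def cross_section(points: List[Point], asof: int, known_before: int) -> Optional[Point]:
    # Mirrors: asof_ticks = asof AND knowledge_ticks < known_before LIMIT 1.
    # Because rows are sorted newest-knowledge-first, the first match wins.
    for p in sort_like_clustering(points):
        if p.asof_ticks == asof and p.knowledge_ticks < known_before:
            return p
    return None


# Hypothetical sample data: the value for asof 100 was corrected at knowledge time 20.
ROWS = [
    Point(100, 10, b"1.00"),
    Point(100, 20, b"1.05"),
    Point(200, 30, b"2.00"),
]
```

Asking for asof 100 with a knowledge bound of 15 yields the original value, while a bound of 25 yields the correction, which is the bi-temporal behaviour mentioned on slide 7.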
  12. A Service, not an app
     [Diagram labels: C*, Olympus (x4), App (x10), Fat Client, Olympus Thrift Service (x2)]
  13. Complex Value Types
     • Not every value is a double
     • Some values belong together (Bid and Ask should always come back together)
     • Thrift structures as values
     • Typed, extensible schema
     • Union types give us a way to deserialize any type
  14. Ad-hoc querying UI
  15. But that's the easy part...
     (cue transition)
  16. Scaling...
     The first rule of scaling is you do not just turn everything to 11.
  17. Scaling...
     • Step 1 - Fast Machines for your workload
     • Step 2 - Avoid Java GC for your workload
     • Step 3 - Tune Cassandra for your workload
     • Step 4 - Prefetch and cache for your workload
  18. Can't fix what you can't measure
     • Riemann (http://riemann.io)
     • Easily push application and system metrics into a single system
     • We push 6k metrics per second to a single Riemann instance
  19. Metrics: Riemann
     Yammer Metrics with Riemann
     https://gist.github.com/carlyeks/5199090
  20. Metrics: Riemann
     • Push stream based metrics library
     • Riemann Dash for "Why is it Slow?"
     • Graphite for "Why was it Slow?"
  21. VisualVM: The greatest tool EVER
     • Many useful plugins...
     • Just start jstatd on each server and go!
  22. Scaling Reads: Machines
     • SSDs for hot data
     • JBOD config
     • As many cores as possible (> 16)
     • 10GbE network
     • Bonded network cards
     • Jumbo frames
  23. JBOD is a lifesaver
     • SSDs are great until they aren't anymore
     • JBOD allowed passive recovery in the face of simultaneous disk failures (SSDs had a bad firmware)
  24. Scaling Reads: Cassandra
     Changes we've made:
     • Configuration
     • Compaction
     • Compression
  25. Leveled Compaction
     • Wide rows mean data can be spread across a huge number of SSTables
     • Leveled Compaction puts a bound on the worst case (*)
     • Fewer SSTables to read means lower latency, as shown below; orange SSTables get read
     [Diagram: levels L0 to L5]
     * In Theory
  26. Leveled Compaction: Breaking Bad
     Under high write load, forced to read all of the L0 files
     [Diagram: levels L0 to L5]
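The read cost described on slides 25 and 26 can be illustrated with a small count of candidate SSTables per key. This is a sketch under simplifying assumptions, not Cassandra's implementation: SSTables are modelled as inclusive key ranges, tables within L1 and above never overlap, and the names `sstables_to_read`, `HEALTHY`, and `BACKLOGGED` are made up for the example.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (min_key, max_key) covered by one SSTable


def sstables_to_read(levels: Dict[int, List[Range]], key: int) -> int:
    """Count the SSTables that might contain `key`."""
    count = 0
    for level, tables in levels.items():
        hits = [r for r in tables if r[0] <= key <= r[1]]
        # Within L1 and above, ranges do not overlap, so at most one table matches.
        if level > 0 and len(hits) > 1:
            raise ValueError(f"overlapping SSTables in L{level}")
        count += len(hits)
    return count


# Hypothetical layouts: L0 files may overlap each other, higher levels may not.
HEALTHY = {
    0: [(0, 99)],
    1: [(0, 49), (50, 99)],
    2: [(0, 24), (25, 49), (50, 74), (75, 99)],
}
# Under heavy writes, flushes pile up in L0 faster than they are compacted away.
BACKLOGGED = {**HEALTHY, 0: [(0, 99)] * 20}
```

With the healthy layout a read touches one table per level, while the backlogged layout adds every overlapping L0 file to the read, which is the "Breaking Bad" case.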
  27. Hybrid Compaction: Breaking Better
     • Size Tiering Level 0
     • On by default in 2.0
     [Diagram labels: L0 to L5, Hybrid Compaction, Size Tiered, Leveled]
  28. Overlapping Compaction
     • Instead of forcing a combination of L0 files with L1, we can just push up files
     • This allows a higher level of concurrency in compactions
     • We still know the SSTables that might contain the keys
     • We can force a proper compaction at any configurable level
     [Diagram: levels L0 to L5]
  29. C optimized library
     • Read path needs to be fast for our workload
     • CRC check, composite comparison eat a lot of cycles
     • CRC is implemented on chip for some architectures (why not use it?)
     • We want to move some of the operations into a JNI library to reduce latency and improve throughput
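For readers unfamiliar with the CRC check on slide 29, the idea is that every chunk read from disk is re-checksummed and compared with a stored value, so the cost is paid on each read. The sketch below uses Python's `zlib.crc32` and a made-up chunk layout (payload followed by a 4-byte checksum); it is not Cassandra's on-disk format and does not use the on-chip instruction the slide refers to.

```python
import zlib


def write_chunk(payload: bytes) -> bytes:
    # Append a 4-byte big-endian CRC-32 of the payload (hypothetical layout).
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")


def read_chunk(chunk: bytes) -> bytes:
    # Recompute the checksum on every read and reject corrupted data.
    payload, stored = chunk[:-4], int.from_bytes(chunk[-4:], "big")
    if zlib.crc32(payload) & 0xFFFFFFFF != stored:
        raise ValueError("checksum mismatch")
    return payload
```

Because the whole payload is scanned on every read, a faster checksum routine directly shortens the read path, which is the motivation given on the slide.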
  30. Current Stats
     • 16 nodes
     • 2 Data Centers
     • Replication Factor 6
     • 200k Writes/sec at EACH_QUORUM
     • 150k Reads/sec at LOCAL_QUORUM
     • > 30 Million time series
     • > 15 Billion points
     • 10 TB on disk (compressed)
     • Read Latency 50%/95% is 1ms/5ms
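The consistency levels on slide 30 translate into a number of replica acknowledgements per operation. The arithmetic below assumes the replication factor of 6 is split evenly as 3 replicas in each of the 2 data centers; the slides do not state the split, and the data center names are placeholders.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    # A quorum is a strict majority of the replicas.
    return replicas // 2 + 1


# Assumption: RF 6 means 3 replicas per data center; "dc1"/"dc2" are placeholder names.
REPLICAS_PER_DC: Dict[str, int] = {"dc1": 3, "dc2": 3}


def each_quorum_acks(replicas_per_dc: Dict[str, int]) -> int:
    # EACH_QUORUM: a quorum must acknowledge in every data center.
    return sum(quorum(n) for n in replicas_per_dc.values())


def local_quorum_acks(replicas_per_dc: Dict[str, int], local_dc: str) -> int:
    # LOCAL_QUORUM: only the coordinator's data center must reach quorum.
    return quorum(replicas_per_dc[local_dc])
```

Under that assumption a write waits for 4 acknowledgements (2 per data center) and a read waits for 2 local ones, so every local read quorum overlaps the local part of every write quorum.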
  31. Questions?
     Thank you! @tjake and @carlyeks
