Financial Time Series
       Cassandra 1.2
Jake Luciani and Carl Yeksigian
    BlueMountain Capital
Know your problem.

1000s of consumers
..creating and reading data as fast as possible
..consistent to all readers
..and handle ad-hoc user queries
..quickly
..across datacenters.
Know your data.
AAPL price



MSFT price
Know your queries.
Time Series Query




[Chart: a time series queried from start (10am) to end (2pm), sampled at 1 minute periods]
Start, End, Periodicity defines query
Know your queries.
Cross Section Query
[Chart: a cross section of values taken As Of Time (11am)]

As Of time defines the query
Know your queries.

● Cross sections are for random data
● Storing for Cross Sections means thousands of
  writes, inconsistent queries
● We also need bitemporality, but it's hard, so let's
  ignore it in the query
Know your users.
A million, billion writes per second
..and reads are fast and happen at the same time
..and we can answer everything consistently
..and it scales to new use cases quickly
..and it's all done yesterday
Let's optimize for Time Series.
  Since we can't optimize for everything.
Data Model (in C* 1.1)
 AAPL   lastPrice:2013-03-18:2013-03-19   0E-34-88-FF-26-E3-2C
        lastPrice:2013-03-19:2013-03-19   0E-34-88-FF-26-E3-3D
        lastPrice:2013-03-19:2013-03-20   0E-34-88-FF-26-E3-4E
But we're using C* 1.2.
● CQL3
● V-nodes
● JBOD
● Pooled Decompression buffers
● SSD Aware
● Parallel Compaction
● Off-Heap Bloom Filters
● Concurrent Schema Creation
● Metrics!
Data Model (CQL 3)
CREATE TABLE tsdata (
    id blob,
    property text,
    asof_ticks bigint,
    knowledge_ticks bigint,
    value blob,
    PRIMARY KEY(id,property,asof_ticks,knowledge_ticks)
)
WITH COMPACT STORAGE
AND CLUSTERING ORDER BY(asof_ticks DESC, knowledge_ticks DESC)
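The compound primary key drives the physical layout: id is the partition key, while property, asof_ticks, and knowledge_ticks cluster rows inside the partition, with the two tick columns newest-first. A minimal in-memory sketch of that ordering (illustrative only, not how Cassandra stores data):

```python
# Hypothetical sketch: one partition per id; rows sorted by
# (property ASC, asof_ticks DESC, knowledge_ticks DESC),
# mirroring the CLUSTERING ORDER BY above.
from collections import defaultdict

def clustering_key(row):
    prop, asof, knowledge, _value = row
    return (prop, -asof, -knowledge)  # negate for DESC order

partitions = defaultdict(list)

def insert(id_, prop, asof, knowledge, value):
    partitions[id_].append((prop, asof, knowledge, value))
    partitions[id_].sort(key=clustering_key)

insert(b'\x12\x34', 'lastPrice', 100, 10, b'a')
insert(b'\x12\x34', 'lastPrice', 100, 20, b'b')
insert(b'\x12\x34', 'lastPrice', 200, 15, b'c')

# Newest asof first; within an asof, newest knowledge first.
print([r[:3] for r in partitions[b'\x12\x34']])
# [('lastPrice', 200, 15), ('lastPrice', 100, 20), ('lastPrice', 100, 10)]
```

Because both tick columns sort descending, "the latest" of anything is always at the front of the partition, which is what makes the LIMIT 1 cross-section query cheap.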
CQL3 Queries: Time Series

SELECT * FROM tsdata
WHERE id = 0x12345
AND property = 'lastPrice'
AND asof_ticks >= 1234567890
AND asof_ticks <= 2345678901
CQL3 Queries: Cross Section

SELECT * FROM tsdata
WHERE id = 0x12345
AND property = 'lastPrice'
AND asof_ticks = 1234567890
AND knowledge_ticks < 2345678901
LIMIT 1
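What LIMIT 1 buys here: rows are stored with knowledge_ticks descending, so the first row with knowledge_ticks below the bound is the latest version known as of that time. A sketch of the semantics (not the C* implementation):

```python
# Rows for one (id, property, asof_ticks), stored newest-first (DESC).
rows = [  # (knowledge_ticks, value)
    (400, 'v4'),
    (300, 'v3'),
    (150, 'v2'),
    (100, 'v1'),
]

def cross_section(rows, knowledge_bound):
    """First row in DESC order below the bound == 'LIMIT 1'."""
    for knowledge, value in rows:
        if knowledge < knowledge_bound:
            return value
    return None  # nothing was known before the bound

print(cross_section(rows, 350))  # 'v3'
```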
Data Overload!
All points between start and end
Even though we have a periodicity

All knowledge times
Even though we only want the latest
A Service, not an app

[Diagram: many App instances on all sides connect to Olympus service nodes, which sit in front of C*]
Filtration
Filter everything by knowledge time
Filter time series by periodicity
200k points filtered down to 300

Cassandra Reads                              Service Filter keeps
AAPL:lastPrice:2013-03-18:2013-03-19    →    AAPL:lastPrice:2013-03-18:2013-03-19
AAPL:lastPrice:2013-03-19:2013-03-19
AAPL:lastPrice:2013-03-19:2013-03-20    →    AAPL:lastPrice:2013-03-19:2013-03-20
AAPL:lastPrice:2013-03-20:2013-03-20
AAPL:lastPrice:2013-03-20:2013-03-21    →    AAPL:lastPrice:2013-03-20:2013-03-21
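The two filters the service applies can be sketched in a few lines of Python (hypothetical helper names; the real service is not shown in the deck):

```python
# Sketch of the service-side filtration: first keep only the latest
# knowledge time per asof tick, then thin the series down to the
# requested periodicity.
def latest_knowledge(points):
    """points: (asof, knowledge, value); keep newest knowledge per asof."""
    best = {}
    for asof, knowledge, value in points:
        if asof not in best or knowledge > best[asof][0]:
            best[asof] = (knowledge, value)
    return sorted((a, k, v) for a, (k, v) in best.items())

def by_period(points, period):
    """Keep at most one point per period bucket (first seen wins)."""
    seen, out = set(), []
    for asof, knowledge, value in points:
        bucket = asof // period
        if bucket not in seen:
            seen.add(bucket)
            out.append((asof, knowledge, value))
    return out

# 10 asof ticks, 2 knowledge versions each -> 20 raw points.
raw = [(t, k, 'px') for t in range(0, 600, 60) for k in (1, 2)]
filtered = by_period(latest_knowledge(raw), 300)
print(len(raw), '->', len(filtered))  # 20 -> 2
```

Same shape as the slide's numbers: Cassandra hands back every version of every point, and the service cuts it down (200k points to 300 in the real workload).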
Pushdown Filters

● To provide periodicity on raw data, downsample
  on write
● There are still cases where we don't know how
  to sample
● This filtering should be pushed to C*
● The coordinator node should apply a filter to the
  result set
Complex Value Types
Not every value is a double
Some values belong together
Bid and Ask should come back together
Thrift
Thrift structures as values
Typed, extensible schema
Union types give us a way to deserialize any type
Thrift: Union Types




                      https://gist.github.com/carlyeks/5199559
But that's the easy part...
Scaling...
The first rule of scaling is you do not just turn
everything to 11.
Scaling...
Step 1 - Fast Machines for your workload
Step 2 - Avoid Java GC for your workload
Step 3 - Tune Cassandra for your workload
Step 4 - Prefetch and cache for your workload
Can't fix what you can't measure

Riemann (http://riemann.io)
Easily push application and system metrics into a single system
We push 4k metrics per second to a single Riemann instance
Metrics: Riemann

Yammer Metrics with Riemann




                              https://gist.github.com/carlyeks/5199090
Metrics: Riemann

Push stream based metrics library

Riemann Dash for "Why is it slow?"
Graphite for "Why was it slow?"
VisualVM - the greatest tool EVER

Many useful plugins...
Just start jstatd on each server and go!
Scaling Reads: Machines
SSDs for hot data
JBOD config
As many cores as possible (> 16)
10GbE network
Bonded network cards
Jumbo frames
JBOD is a lifesaver

SSDs are great until they aren't anymore
JBOD allowed passive recovery in the face of
simultaneous disk failures (SSDs had a bad
firmware)
Scaling Reads: JVM

JVM Magic!

-Xmx12G
-Xmn1600M
-XX:SurvivorRatio=16
-XX:+UseCompressedOops

-XX:+UseTLAB yields ~15% boost!
(Thread-local allocation buffers, good for SEDA
architectures)
Scaling Reads: Cassandra

Changes we've made:
● Configuration
● Compaction
● Compression
● Pushdown Filters
Scaling Cassandra:
Configuration
Hinted Handoff
HHO single threaded, 100kb throttle
Scaling Cassandra:
Configuration
memtable size
2048mb, instead of 1/3 heap

We're using a 12gb heap; leaves enough room for memtables
while the majority is left for reads and compaction.
Scaling Cassandra:
Configuration
Half-Sync Half-Async server
No thread dedicated to an idle connection
We have a lot of idle connections
Scaling Cassandra:
Configuration
Multithreaded compaction, 4 cores
More threads to compact means faster compaction
Too many threads means resource contention
Scaling Cassandra:
Configuration
Disabled internode compression
Caused too much GC and Latency
On a 10GbE network, who needs compression?
Leveled Compaction
Wide rows means data can be spread across a
huge number of SSTables
Leveled Compaction puts a bound on the worst
case (*)
Fewer SSTables to read means lower latency, as
shown below; orange SSTables get read
[Diagram: levels L0-L5; only one orange SSTable per level is read]

(*) In theory
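The read bound comes from the level invariant: within L1 and above, SSTables in a level hold disjoint key ranges, so a key can live in at most one SSTable per level, plus anything in L0. A toy model (not C* code) of that bound:

```python
# Toy model of why leveled compaction bounds reads per key.
def sstables_to_read(levels, key):
    """levels: list of lists of (lo, hi) key ranges; levels[0] is L0."""
    # L0 files come straight from flushes and may all overlap the key.
    hits = [r for r in levels[0] if r[0] <= key <= r[1]]
    for level in levels[1:]:
        # Disjoint ranges: at most one hit per level.
        hits += [r for r in level if r[0] <= key <= r[1]]
    return len(hits)

levels = [
    [(0, 100), (50, 150), (90, 200)],   # L0: overlapping flushes
    [(0, 50), (51, 120), (121, 200)],   # L1: disjoint ranges
    [(0, 30), (31, 99), (100, 200)],    # L2: disjoint ranges
]
print(sstables_to_read(levels, 95))  # 3 from L0 + 1 from L1 + 1 from L2 = 5
```

The model also shows the failure mode: the only unbounded term is L0, which is exactly what blows up under sustained write load.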
Leveled Compaction
Breaking Bad

Under high write load, forced to read all of the L0
files




[Diagram: levels L0-L5; every SSTable in L0 must be read]
Hybrid Compaction
  Breaking Better

  Size Tiering Level 0

[Diagram: Hybrid Compaction = Size Tiered for L0 + Leveled for L1-L5]
Better Compression:
New LZ4Compressor

LZ4 Compression is 40% faster than Google's
Snappy...

[Chart: throughput of Snappy JNI vs LZ4 JNI vs LZ4 Sun Unsafe]

Blocks in Cassandra are so small we don't see the same gain in production, but the 95%
latency is improved and it works with Java 7
CRC Check Chance

CRC check of each compressed block causes
reads to be 2x SLOWER.
Lowered crc_check_chance to 10% of reads.
A move to JNI would cause a 30x boost
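The idea behind crc_check_chance is simple probabilistic verification: pay the checksum cost on only a fraction of block reads. A sketch of the concept (hypothetical function, not Cassandra's reader):

```python
# Sketch: verify the stored CRC on only check_chance of reads,
# trading per-read integrity checks for speed.
import random
import zlib

def read_block(data, stored_crc, check_chance=0.10):
    if random.random() < check_chance:
        if zlib.crc32(data) != stored_crc:
            raise IOError('corrupt compressed block')
    return data

block = b'compressed-bytes'
crc = zlib.crc32(block)
assert read_block(block, crc, check_chance=1.0) == block
```

With chance at 10%, a corrupt block is still caught eventually across repeated reads, just not on every read.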
Current Stats

●   12 nodes
●   2 DataCenters
●   RF=6
●   150k Writes/sec at EACH_QUORUM
●   100k Reads/sec at LOCAL_QUORUM
●   > 6 Billion points (without replication)
●   2TB on disk (compressed)
●   Read Latency 50%/95% is 1ms/10ms
Questions?




Thank you!

@tjake and @carlyeks

NYC* Tech Day — BlueMountain Capital — Financial Time Series w/Cassandra 1.2
