Financial Time Series
Cassandra 1.2
Jake Luciani and Carl Yeksigian
BlueMountain Capital
Know your problem.

1000s of consumers
..creating and reading data as fast as possible
..consistent to all readers
..and handle ad-hoc user queries
..quickly
..across datacenters.
Know your data.
(Chart: AAPL price, MSFT price)
Know your queries.
Time Series Query
(Chart: start (10am), end (2pm), 1 minute periods)
Start, end and periodicity define the query
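Since a time series query is fully described by start, end and periodicity, it can be modelled as a small value object. The sketch below enumerates the sample instants for the 10am to 2pm, 1 minute example using java.time; the `TimeSeriesQuery` class and the date are hypothetical, not BlueMountain's code.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical query descriptor: start, end and periodicity define a time series query.
final class TimeSeriesQuery {
    final LocalDateTime start;
    final LocalDateTime end;
    final Duration periodicity;

    TimeSeriesQuery(LocalDateTime start, LocalDateTime end, Duration periodicity) {
        if (periodicity.isZero() || periodicity.isNegative())
            throw new IllegalArgumentException("periodicity must be positive");
        this.start = start;
        this.end = end;
        this.periodicity = periodicity;
    }

    // Every sample instant from start to end, both inclusive, one per period.
    List<LocalDateTime> sampleTimes() {
        List<LocalDateTime> times = new ArrayList<>();
        for (LocalDateTime t = start; !t.isAfter(end); t = t.plus(periodicity))
            times.add(t);
        return times;
    }

    public static void main(String[] args) {
        TimeSeriesQuery q = new TimeSeriesQuery(
                LocalDateTime.of(2013, 3, 19, 10, 0),   // start (10am)
                LocalDateTime.of(2013, 3, 19, 14, 0),   // end (2pm)
                Duration.ofMinutes(1));                 // 1 minute periods
        List<LocalDateTime> times = q.sampleTimes();
        System.out.println(times.size() + " samples, first " + times.get(0)
                + ", last " + times.get(times.size() - 1));
    }
}
```

With inclusive ends this yields 241 instants (240 one-minute periods), which is all the service needs to return no matter how many raw points fall in between.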
Know your queries.
Cross Section Query
(Chart: As Of Time (11am))
The as-of time defines the query
Know your queries.

● Cross sections are for random data
● Storing for cross sections means thousands of
  writes and inconsistent queries
● We also need bitemporality, but it's hard, so let's
  ignore it in the query
Know your users.
A million, billion writes per second
..and reads are fast and happen at the same time
..and we can answer everything consistently
..and it scales to new use cases quickly
..and it's all done yesterday
Let's optimize for Time Series.
  Since we can't optimize for everything.
Data Model (in C* 1.1)
 AAPL   lastPrice:2013-03-18:2013-03-19   0E-34-88-FF-26-E3-2C
        lastPrice:2013-03-19:2013-03-19   0E-34-88-FF-26-E3-3D
        lastPrice:2013-03-19:2013-03-20   0E-34-88-FF-26-E3-4E
But we're using C* 1.2.
● CQL3
● V-nodes
● JBOD
● Pooled Decompression buffers
● SSD Aware
● Parallel Compaction
● Off-Heap Bloom Filters
● Metrics!
● Concurrent Schema Creation
Data Model (CQL 3)
CREATE TABLE tsdata (
    id blob,
    property text,
    asof_ticks bigint,
    knowledge_ticks bigint,
    value blob,
    PRIMARY KEY(id,property,asof_ticks,knowledge_ticks)
)
WITH COMPACT STORAGE
AND CLUSTERING ORDER BY(property ASC, asof_ticks DESC, knowledge_ticks DESC);
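With asof_ticks and knowledge_ticks both declared DESC, each (id, property) slice is stored newest first: by as-of time descending, then by knowledge time descending. Cassandra maintains this order on disk; the in-memory comparator below (the `TsRow` class is hypothetical, and small integers stand in for ticks) just makes the resulting row order concrete.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory row mirroring the tsdata clustering columns.
final class TsRow {
    final String property;
    final long asofTicks;
    final long knowledgeTicks;

    TsRow(String property, long asofTicks, long knowledgeTicks) {
        this.property = property;
        this.asofTicks = asofTicks;
        this.knowledgeTicks = knowledgeTicks;
    }

    // property ascending, then asof_ticks descending, then knowledge_ticks descending.
    static final Comparator<TsRow> CLUSTERING_ORDER =
            Comparator.comparing((TsRow r) -> r.property)
                    .thenComparing(Comparator.comparingLong((TsRow r) -> r.asofTicks).reversed())
                    .thenComparing(Comparator.comparingLong((TsRow r) -> r.knowledgeTicks).reversed());

    @Override
    public String toString() {
        return property + ":" + asofTicks + ":" + knowledgeTicks;
    }

    public static void main(String[] args) {
        List<TsRow> rows = new ArrayList<>(List.of(
                new TsRow("lastPrice", 18, 19),
                new TsRow("lastPrice", 19, 19),
                new TsRow("lastPrice", 19, 20)));
        rows.sort(CLUSTERING_ORDER);
        System.out.println(rows); // prints [lastPrice:19:20, lastPrice:19:19, lastPrice:18:19]
    }
}
```

Because the newest knowledge time for an as-of comes first, a query that restricts the as-of time only needs the first matching row, which is what LIMIT 1 exploits.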
CQL3 Queries: Time Series

SELECT * FROM tsdata
WHERE id = 0x012345
AND property = 'lastPrice'
AND asof_ticks >= 1234567890
AND asof_ticks <= 2345678901;
CQL3 Queries: Cross Section

SELECT * FROM tsdata
WHERE id = 0x012345
AND property = 'lastPrice'
AND asof_ticks = 1234567890
AND knowledge_ticks < 2345678901
LIMIT 1;
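For one (id, property, asof_ticks), the cross-section query asks for the newest version known strictly before the knowledge cutoff. In memory that is exactly `NavigableMap.lowerEntry`; the sketch below uses a hypothetical `CrossSection` helper and made-up prices to show the semantics, not the service's actual code.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for one (id, property, asof_ticks) slice: knowledge_ticks -> value.
// Mirrors "knowledge_ticks < cutoff ... LIMIT 1" under knowledge_ticks DESC ordering.
final class CrossSection {
    static Double latestKnownBefore(NavigableMap<Long, Double> versions, long knowledgeCutoff) {
        // Greatest knowledge time strictly below the cutoff, or null if nothing was known yet.
        Map.Entry<Long, Double> e = versions.lowerEntry(knowledgeCutoff);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Long, Double> versions = new TreeMap<>();
        versions.put(100L, 455.00); // first published value (hypothetical)
        versions.put(200L, 455.25); // later correction (hypothetical)
        System.out.println(latestKnownBefore(versions, 150L)); // 455.0
        System.out.println(latestKnownBefore(versions, 300L)); // 455.25
        System.out.println(latestKnownBefore(versions, 100L)); // null
    }
}
```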
Data Overload!
All points between start and end
Even though we have a periodicity

All knowledge times
Even though we only want the latest
A Service, not an app

(Diagram: many Apps talk to a pool of Olympus service instances, which sit in front of C*)
Filtration
Filter everything by knowledge time
Filter time series by periodicity
200k points filtered down to 300

(Diagram: Cassandra reads, then the service filter)
Cassandra reads:
AAPL:lastPrice:2013-03-18:2013-03-19
AAPL:lastPrice:2013-03-19:2013-03-19
AAPL:lastPrice:2013-03-19:2013-03-20
AAPL:lastPrice:2013-03-20:2013-03-20
AAPL:lastPrice:2013-03-20:2013-03-21
After the service filter:
AAPL:lastPrice:2013-03-18:2013-03-19
AAPL:lastPrice:2013-03-19:2013-03-20
AAPL:lastPrice:2013-03-20:2013-03-21
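The knowledge-time filter in the filtration example keeps, for each as-of date, only the point with the latest knowledge date. A minimal sketch that reproduces the five-to-three reduction above; `KnowledgeFilter` and `Point` are hypothetical names, and the real service may rely on the DESC clustering order instead of comparing dates.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical service-side filter: keep only the latest knowledge date for each as-of date.
final class KnowledgeFilter {
    static final class Point {
        final LocalDate asof;
        final LocalDate knowledge;

        Point(LocalDate asof, LocalDate knowledge) {
            this.asof = asof;
            this.knowledge = knowledge;
        }

        @Override
        public String toString() {
            return "AAPL:lastPrice:" + asof + ":" + knowledge;
        }
    }

    static List<Point> latestKnowledge(List<Point> raw) {
        Map<LocalDate, Point> latest = new TreeMap<>(); // as-of dates in ascending order
        for (Point p : raw) {
            Point seen = latest.get(p.asof);
            if (seen == null || p.knowledge.isAfter(seen.knowledge))
                latest.put(p.asof, p);
        }
        return new ArrayList<>(latest.values());
    }

    static Point pt(String asof, String knowledge) {
        return new Point(LocalDate.parse(asof), LocalDate.parse(knowledge));
    }

    public static void main(String[] args) {
        List<Point> fromCassandra = List.of(
                pt("2013-03-18", "2013-03-19"),
                pt("2013-03-19", "2013-03-19"),
                pt("2013-03-19", "2013-03-20"),
                pt("2013-03-20", "2013-03-20"),
                pt("2013-03-20", "2013-03-21"));
        // Prints the three points with the latest knowledge per as-of date.
        latestKnowledge(fromCassandra).forEach(System.out::println);
    }
}
```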
Pushdown Filters

● To provide periodicity on raw data, downsample
  on write
● There are still cases where we don't know how
  to sample
● This filtering should be pushed to C*
● The coordinator node should apply a filter to the
  result set
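Cassandra 1.2 offers no hook for this, so what follows is only a hypothetical shape for a filter a coordinator could run over a result set before replying. It downsamples raw points to one per period by keeping the last point in each fixed-width bucket. The `ResultFilter` interface, the `Sample` type and the "last value in the period" rule are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Keeps the last raw point in each fixed-width period. Rows are assumed to arrive in
// ascending as-of order, so a later row in the same period replaces an earlier one.
final class PeriodicityFilter implements ResultFilter {
    private final long periodTicks;

    PeriodicityFilter(long periodTicks) {
        if (periodTicks <= 0) throw new IllegalArgumentException("period must be positive");
        this.periodTicks = periodTicks;
    }

    @Override
    public List<Sample> apply(List<Sample> rows) {
        // LinkedHashMap keeps periods in first-seen order; re-putting a key keeps its position.
        Map<Long, Sample> lastInPeriod = new LinkedHashMap<>();
        for (Sample s : rows)
            lastInPeriod.put(Math.floorDiv(s.asofTicks, periodTicks), s);
        return new ArrayList<>(lastInPeriod.values());
    }

    public static void main(String[] args) {
        List<Sample> raw = List.of(
                new Sample(0, 10.0), new Sample(30, 10.5), new Sample(59, 11.0),
                new Sample(60, 11.5), new Sample(119, 12.0), new Sample(125, 12.5));
        ResultFilter filter = new PeriodicityFilter(60);
        filter.apply(raw).forEach(System.out::println); // 59 -> 11.0, 119 -> 12.0, 125 -> 12.5
    }
}

// Hypothetical hook a coordinator could apply to a result set; not a Cassandra 1.2 API.
interface ResultFilter {
    List<Sample> apply(List<Sample> rows);
}

// One raw point: as-of time in ticks plus its value.
final class Sample {
    final long asofTicks;
    final double value;

    Sample(long asofTicks, double value) {
        this.asofTicks = asofTicks;
        this.value = value;
    }

    @Override
    public String toString() {
        return asofTicks + " -> " + value;
    }
}
```

Running the filter next to the data means only the downsampled rows cross the network, which is the point of pushing it down instead of filtering in the service.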
Complex Value Types
Not every value is a double
Some values belong together
Bid and Ask should come back together
Thrift
Thrift structures as values
Typed, extensible schema
Union types give us a way to deserialize any type
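Thrift generates union classes from an IDL definition; without pulling in Thrift, the idea can be sketched in plain Java as a value with a tag saying which typed field is set. `PriceValue`, `BidAsk` as a pair, and the prices are hypothetical and not the authors' schema.

```java
// Plain-Java analogue of a Thrift union: exactly one typed field is set, and a tag says which.
final class PriceValue {
    enum Kind { DOUBLE, BID_ASK }

    private final Kind kind;
    private final double scalar; // set when kind == DOUBLE
    private final double bid;    // bid and ask are set together when kind == BID_ASK
    private final double ask;

    private PriceValue(Kind kind, double scalar, double bid, double ask) {
        this.kind = kind;
        this.scalar = scalar;
        this.bid = bid;
        this.ask = ask;
    }

    static PriceValue ofDouble(double v) { return new PriceValue(Kind.DOUBLE, v, Double.NaN, Double.NaN); }

    static PriceValue ofBidAsk(double bid, double ask) { return new PriceValue(Kind.BID_ASK, Double.NaN, bid, ask); }

    Kind kind() { return kind; }

    double asDouble() {
        if (kind != Kind.DOUBLE) throw new IllegalStateException("value is " + kind);
        return scalar;
    }

    // Bid and ask come back together as one value.
    double[] asBidAsk() {
        if (kind != Kind.BID_ASK) throw new IllegalStateException("value is " + kind);
        return new double[] {bid, ask};
    }

    public static void main(String[] args) {
        PriceValue[] values = {ofDouble(455.72), ofBidAsk(455.70, 455.74)};
        for (PriceValue v : values) {
            // A reader checks the tag before touching the payload.
            switch (v.kind()) {
                case DOUBLE:
                    System.out.println("last = " + v.asDouble());
                    break;
                case BID_ASK:
                    double[] q = v.asBidAsk();
                    System.out.println("bid = " + q[0] + ", ask = " + q[1]);
                    break;
            }
        }
    }
}
```

The tag is what lets one blob column hold any value type: the reader learns the type from the value itself rather than from the column.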
Thrift: Union Types
(Code: https://gist.github.com/carlyeks/5199559)
But that's the easy part...
Scaling...
The first rule of scaling is you do not just turn
everything to 11.
Scaling...
Step 1 - Fast Machines for your workload
Step 2 - Avoid Java GC for your workload
Step 3 - Tune Cassandra for your workload
Step 4 - Prefetch and cache for your workload
Can't fix what you can't measure

Riemann (http://riemann.io)
Easily push application and system metrics into a single system
We push 4k metrics per second to a single Riemann instance
Metrics: Riemann

Yammer Metrics with Riemann
(Code: https://gist.github.com/carlyeks/5199090)
Metrics: Riemann

Push-stream-based metrics library
Riemann Dash for "Why is it slow?"
Graphite for "Why was it slow?"
VisualVM: the greatest tool EVER

Many useful plugins...
Just start jstatd on each server and go!
Scaling Reads: Machines
SSDs for hot data
JBOD config
As many cores as possible (> 16)
10GbE network
Bonded network cards
Jumbo frames
JBOD is a lifesaver

SSDs are great until they aren't anymore
JBOD allowed passive recovery in the face of
simultaneous disk failures (the SSDs had bad
firmware)
Scaling Reads: JVM
JVM Magic!

-Xmx12G
-Xmn1600M
-XX:SurvivorRatio=16
-XX:+UseCompressedOops

-XX:+UseTLAB yields a ~15% boost!
(Thread-local allocation buffers, good for SEDA
architectures)
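A worked example of the heap layout these flags imply, assuming standard HotSpot semantics: -XX:SurvivorRatio is the ratio of eden to one survivor space, and the young generation (-Xmn) holds eden plus two survivor spaces. `HeapLayout` is a hypothetical helper, and the JVM rounds the real sizes to its alignment.

```java
// Rough HotSpot heap layout implied by -Xmx12G -Xmn1600M -XX:SurvivorRatio=16.
// young = eden + 2 * survivor, and eden = SurvivorRatio * survivor.
final class HeapLayout {
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static double edenMb(double youngMb, int survivorRatio) {
        return youngMb * survivorRatio / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        double heapMb = 12 * 1024; // -Xmx12G
        double youngMb = 1600;     // -Xmn1600M
        int ratio = 16;            // -XX:SurvivorRatio=16
        System.out.printf("eden %.0f MB, each survivor %.1f MB, old gen %.0f MB%n",
                edenMb(youngMb, ratio), survivorMb(youngMb, ratio), heapMb - youngMb);
    }
}
```

That works out to roughly 1422 MB of eden, two survivor spaces of about 89 MB each, and 10688 MB of old generation.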
Scaling Reads: Cassandra

Changes we've made:
● Configuration
● Compaction
● Compression
● Pushdown Filters
Scaling Cassandra:
Configuration
Hinted Handoff
HHO: single-threaded, 100 KB throttle
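A byte-rate throttle like this one means a delivery thread may not push hints faster than a fixed number of KB per second. As far as we know, the Cassandra 1.2 cassandra.yaml knobs are max_hints_delivery_threads and hinted_handoff_throttle_in_kb, but check your version. The sketch below is a hypothetical `ByteThrottle`, not Cassandra's implementation: it computes how long a sender must pause to stay under the rate.

```java
// Minimal byte-rate throttle sketch (not Cassandra's code): after each send, compute the
// earliest time the sender may continue without exceeding the configured rate.
final class ByteThrottle {
    private final long bytesPerSecond;
    private final long startNanos;
    private long bytesSent;

    ByteThrottle(long kbPerSecond, long startNanos) {
        if (kbPerSecond <= 0) throw new IllegalArgumentException("rate must be positive");
        this.bytesPerSecond = kbPerSecond * 1024;
        this.startNanos = startNanos;
    }

    // Records a send and returns how many milliseconds to sleep before the next one.
    long recordAndGetPauseMillis(long bytes, long nowNanos) {
        bytesSent += bytes;
        long earliestNanos = startNanos + bytesSent * 1_000_000_000L / bytesPerSecond;
        return Math.max(0, (earliestNanos - nowNanos) / 1_000_000);
    }

    public static void main(String[] args) throws InterruptedException {
        ByteThrottle throttle = new ByteThrottle(100, System.nanoTime()); // 100 KB/s
        for (int i = 0; i < 3; i++) {
            // One hypothetical 50 KB batch of hints per iteration.
            long pause = throttle.recordAndGetPauseMillis(50 * 1024, System.nanoTime());
            System.out.println("sent batch " + i + ", pausing " + pause + " ms");
            Thread.sleep(pause);
        }
    }
}
```

At 100 KB/s a node replaying a large hint backlog trickles it out slowly, trading recovery time for less load on the node receiving the hints.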
Scaling Cassandra:
Configuration
Memtable size
2048 MB, instead of the default 1/3 of the heap

We're using a 12 GB heap; this leaves enough room for memtables
while the majority is left for reads and compaction.
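The arithmetic behind that choice, as a small sketch: with a 12 GB heap the default one-third budget would be 4096 MB, twice the 2048 MB used here. In Cassandra 1.2 we believe the setting is memtable_total_space_in_mb in cassandra.yaml; `MemtableBudget` is a hypothetical helper.

```java
// Back-of-the-envelope split of a 12 GB heap: default memtable budget (1/3 of the heap)
// versus the fixed 2048 MB chosen here.
final class MemtableBudget {
    static long defaultMemtableMb(long heapMb) {
        return heapMb / 3;
    }

    public static void main(String[] args) {
        long heapMb = 12 * 1024;
        long chosenMb = 2048;
        System.out.println("default memtable budget: " + defaultMemtableMb(heapMb) + " MB");
        System.out.println("chosen memtable budget:  " + chosenMb + " MB");
        System.out.println("heap left for reads and compaction: " + (heapMb - chosenMb) + " MB");
    }
}
```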
Scaling Cassandra:
Configuration
Half-Sync Half-Async server
No thread dedicated to an idle connection
We have a lot of idle connections
Scaling Cassandra:
Configuration
Multithreaded compaction, 4 cores
More compaction threads means faster compaction
Too many threads means resource contention
Scaling Cassandra:
Configuration
Disabled internode compression
It caused too much GC and latency
On a 10GbE network, who needs compression?
Leveled Compaction
Wide rows mean data can be spread across a
huge number of SSTables
Leveled Compaction puts a bound on the worst
case (*)
Fewer SSTables to read means lower latency; in the
diagram, the orange SSTables are the ones that get read
(Diagram: levels L0 through L5)
 * In theory
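Why the bound holds "in theory", sketched under the usual leveled-compaction assumption of a fan-out of 10: level n (n ≥ 1) holds up to 10^n fixed-size SSTables that never overlap each other, so one key touches at most one SSTable per level, plus every SSTable in L0. `LeveledBound` and the table size are hypothetical.

```java
// Leveled compaction read bound, assuming fan-out 10: Ln (n >= 1) holds up to 10^n
// non-overlapping SSTables, while L0 SSTables may all overlap the key being read.
final class LeveledBound {
    // Number of levels L1..Ln needed to hold totalSSTables at fan-out 10.
    static int levelsNeeded(long totalSSTables) {
        int levels = 0;
        long capacity = 0;
        long levelSize = 1;
        while (capacity < totalSSTables) {
            levelSize *= 10;
            capacity += levelSize;
            levels++;
        }
        return levels;
    }

    // Worst case SSTables touched by one read: all of L0 plus one per level.
    static long worstCaseReads(int l0SSTables, long totalSSTables) {
        return l0SSTables + levelsNeeded(totalSSTables);
    }

    public static void main(String[] args) {
        long sstables = 20_000; // hypothetical table size in SSTables
        System.out.println("levels: " + levelsNeeded(sstables));                       // 5
        System.out.println("quiet, 4 in L0: " + worstCaseReads(4, sstables));          // 9
        System.out.println("write burst, 500 in L0: " + worstCaseReads(500, sstables)); // 505
    }
}
```

The L0 term is the weak spot: the bound only stays small while compaction keeps L0 nearly empty, which is exactly what breaks under a heavy write load.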
Leveled Compaction
Breaking Bad

Under high write load, we are forced to read all of the L0 files
(Diagram: levels L0 through L5)
Hybrid Compaction
Breaking Better

Size Tiering Level 0
(Diagram: Hybrid Compaction = size-tiered compaction within L0, leveled compaction for L1 through L5)
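The hybrid idea can be sketched as follows: while L0 is small, leave it to leveled compaction; once it piles up, group L0 SSTables of similar size and compact each group together, as size-tiered compaction does. The L0 threshold of 32, the 0.5 to 1.5 similarity band and the minimum group of 4 are assumptions loosely modelled on size-tiered defaults, not values from the talk, and `HybridL0` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hybrid compaction sketch: size-tier L0 once it grows past a threshold.
final class HybridL0 {
    static final int L0_THRESHOLD = 32;   // assumption, not from the talk
    static final double BUCKET_LOW = 0.5; // assumed "similar size" band around a bucket average
    static final double BUCKET_HIGH = 1.5;
    static final int MIN_BUCKET = 4;      // assumed minimum group worth compacting

    // Returns groups of L0 SSTable sizes (MB) to size-tier, or an empty list if L0 is small.
    static List<List<Long>> sizeTierL0(List<Long> l0SizesMb) {
        List<List<Long>> result = new ArrayList<>();
        if (l0SizesMb.size() <= L0_THRESHOLD) return result; // leveled compaction handles it
        List<Long> sorted = new ArrayList<>(l0SizesMb);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        List<Double> averages = new ArrayList<>();
        for (long size : sorted) {
            boolean placed = false;
            for (int i = 0; i < buckets.size() && !placed; i++) {
                double avg = averages.get(i);
                if (size >= avg * BUCKET_LOW && size <= avg * BUCKET_HIGH) {
                    List<Long> b = buckets.get(i);
                    b.add(size);
                    averages.set(i, (avg * (b.size() - 1) + size) / b.size()); // running average
                    placed = true;
                }
            }
            if (!placed) { // no similar bucket: start a new one
                List<Long> b = new ArrayList<>();
                b.add(size);
                buckets.add(b);
                averages.add((double) size);
            }
        }
        for (List<Long> b : buckets)
            if (b.size() >= MIN_BUCKET) result.add(b);
        return result;
    }

    public static void main(String[] args) {
        List<Long> l0 = new ArrayList<>(Collections.nCopies(30, 5L)); // fresh flushes, ~5 MB
        l0.addAll(Collections.nCopies(6, 40L));                        // earlier L0 merges
        List<List<Long>> groups = sizeTierL0(l0);
        System.out.println(groups.size() + " groups, sizes " + groups.get(0).size()
                + " and " + groups.get(1).size());
    }
}
```

Merging many small, overlapping L0 files into a few larger ones cuts the number of L0 SSTables a read must touch, without waiting for them to be promoted level by level.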
Better Compression:
New LZ4Compressor

LZ4 compression is 40% faster than Google's
Snappy...
(Chart: LZ4 JNI, Snappy JNI, LZ4 Sun Unsafe)

Blocks in Cassandra are so small that we don't see the same gain in production, but the
95th percentile latency is improved and it works with Java 7
CRC Check Chance

Checking the CRC of each compressed block makes
reads 2x SLOWER.
We lowered crc_check_chance to 10% of reads.
Moving the CRC check to JNI would give a 30x boost
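A crc_check_chance of 10% means each compressed block's checksum is verified on only about one read in ten. A minimal sketch of that sampling with java.util.zip.CRC32; `SampledCrcCheck` is hypothetical and not Cassandra's reader.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.CRC32;

// Probabilistic checksum verification: verify a block against its stored CRC32 only with
// probability crcCheckChance, trading detection latency for read speed.
final class SampledCrcCheck {
    private final double crcCheckChance;
    private final Random random;

    SampledCrcCheck(double crcCheckChance, Random random) {
        this.crcCheckChance = crcCheckChance;
        this.random = random;
    }

    static long crcOf(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        return crc.getValue();
    }

    // Returns true if the block was verified; throws if a verified block is corrupt.
    boolean maybeVerify(byte[] block, long storedCrc) {
        if (random.nextDouble() >= crcCheckChance) return false; // skipped on this read
        if (crcOf(block) != storedCrc) throw new IllegalStateException("corrupt block");
        return true;
    }

    public static void main(String[] args) {
        byte[] block = "AAPL:lastPrice:2013-03-19".getBytes(StandardCharsets.UTF_8);
        long stored = crcOf(block);
        SampledCrcCheck checker = new SampledCrcCheck(0.10, new Random(42));
        int verified = 0;
        for (int i = 0; i < 10_000; i++)
            if (checker.maybeVerify(block, stored)) verified++;
        System.out.println("verified " + verified + " of 10000 reads"); // roughly 1000
    }
}
```

The trade-off: corruption in a block is still caught, just on average a few reads later instead of on the first one.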
Current Stats

●   12 nodes
●   2 DataCenters
●   RF=6
●   150k Writes/sec at EACH_QUORUM
●   100k Reads/sec at LOCAL_QUORUM
●   > 6 Billion points (without replication)
●   2TB on disk (compressed)
●   Read latency (50th/95th percentile): 1 ms/10 ms
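Some replica arithmetic behind these numbers, assuming RF=6 is placed as 3 replicas in each of the 2 datacenters (the slide does not state the split) and that load spreads evenly across the 12 nodes. `QuorumMath` is a hypothetical helper.

```java
// Replica arithmetic for the stats above, assuming 3 replicas per datacenter and an even
// spread of load over the nodes (both assumptions).
final class QuorumMath {
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    public static void main(String[] args) {
        int dcs = 2;
        int replicasPerDc = 3;
        int nodes = 12;
        long clientWritesPerSec = 150_000;

        // EACH_QUORUM: a quorum of the replicas in every datacenter must acknowledge.
        int eachQuorumAcks = dcs * quorum(replicasPerDc);
        // LOCAL_QUORUM: a quorum of the replicas in the coordinator's datacenter only.
        int localQuorumAcks = quorum(replicasPerDc);
        // Every client write is eventually applied on all replicas.
        long replicaWritesPerNode = clientWritesPerSec * dcs * replicasPerDc / nodes;

        System.out.println("EACH_QUORUM write: " + eachQuorumAcks + " of " + dcs * replicasPerDc + " replicas ack");
        System.out.println("LOCAL_QUORUM read: " + localQuorumAcks + " of " + replicasPerDc + " local replicas answer");
        System.out.println("write and read quorums overlap in each DC: "
                + (quorum(replicasPerDc) + localQuorumAcks > replicasPerDc));
        System.out.println("replica writes per node per second: ~" + replicaWritesPerNode);
    }
}
```

Under these assumptions every acknowledged write is on 2 of the 3 replicas in each datacenter, and every LOCAL_QUORUM read consults 2 of those 3, so the two sets always overlap: that is how reads in either datacenter stay consistent while only waiting on local replicas. Each node would absorb roughly 75k replica writes per second.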
Questions?




Thank you!

@tjake and @carlyeks

NYC* Big Tech Day 2013: Financial Time Series
