© 2016 GridPoint, Inc. 1
Getting to a no-compromises Cassandra write path
Mitch Gitman
senior software engineer
GridPoint, Inc.
9/13/2016

Format

Our timeseries schema
• highly denormalized
− cumulative tables
− latest tables
− ancillary tables: audit, secondary lookups

DataStax Java Driver
datastax.github.io/java-driver/
working directly and indirectly

DataStax Java Driver Version 3.0
• released in January
• native protocol version 4
• custom codecs
− cassandra-driver-extras
• java.time.Instant <--> CQL timestamp
• Java array <--> CQL list
• Java enum <--> CQL varchar or int
• ... and more
• retry policy enhancements

The anti-Cassandra Cassandra persistence approach

Going async
• the Cassandra binary protocol--take advantage of it
• Netty--take advantage of it
• should the async API be applied more broadly?
− datastax.github.io/java-driver/manual/pooling/
− 32,768 > 1 or 327,680 > 10

Asynchronous queries
• The denormalized write-path building blocks
• datastax.github.io/java-driver/manual/async/
• DataStax Java Driver ResultSetFuture -->
Guava ListenableFuture<ResultSet>

Cassandra persistence: take 2

Corralling the asynchronous query results
• Guava FutureCallback

Corralling the asynchronous query results (cont.)

Corralling the async query results (cont.)
• limited, isolated use of state
− ingestion state
− completion state
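The "limited, isolated use of state" idea can be sketched as a small tracker that owns the only mutable state: how many async writes were ingested (sent) and how many completed (callbacks returned). This is a sketch under assumptions: `WriteTracker` is a hypothetical name, and the JDK's `CompletableFuture` stands in for the driver's `ResultSetFuture` plus a Guava `FutureCallback`, so it runs without either library.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical tracker: all shared mutable state for one unit of work lives here.
final class WriteTracker {
    private final AtomicInteger ingested = new AtomicInteger();   // ingestion state
    private final AtomicInteger completed = new AtomicInteger();  // completion state
    private final AtomicReference<Throwable> firstFailure = new AtomicReference<>();
    private final CompletableFuture<Void> done = new CompletableFuture<>();
    private volatile boolean ingestionClosed = false;

    // Register one in-flight async write; its callback updates the completion state.
    void track(CompletableFuture<?> write) {
        ingested.incrementAndGet();
        write.whenComplete((result, error) -> {
            if (error != null) {
                firstFailure.compareAndSet(null, error); // remember the first failure only
            }
            completed.incrementAndGet();
            maybeFinish();
        });
    }

    // Call once every write of the unit of work has been sent (no track() afterwards).
    CompletableFuture<Void> closeIngestion() {
        ingestionClosed = true;
        maybeFinish();
        return done;
    }

    private void maybeFinish() {
        if (ingestionClosed && completed.get() == ingested.get()) {
            Throwable failure = firstFailure.get();
            if (failure == null) {
                done.complete(null);
            } else {
                done.completeExceptionally(failure);
            }
        }
    }

    int ingestedCount() { return ingested.get(); }
    int completedCount() { return completed.get(); }
}
```

Keeping the counters inside one object means the callbacks never touch caller state, and the caller only ever sees the single `done` future.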

The Cassandra write component
four component-level semantic contracts:
• fire and forget
• block on callbacks
• return a future
• streaming consumer
component-level logical write:
• unit of work
• idempotent
all fronting an inherently asynchronous implementation
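To make "all fronting an inherently asynchronous implementation" concrete, here is a minimal sketch in which one async core is exposed through three of the four contracts (fire and forget, block on callbacks, return a future); the streaming-consumer contract needs a different shape and is not covered here. `WriteComponent` and its method names are hypothetical, and the core is modeled as a plain `Function` returning a JDK `CompletableFuture` rather than actual driver calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical write component: one asynchronous core, several caller-facing contracts.
final class WriteComponent<T> {
    private static final Logger LOG = Logger.getLogger(WriteComponent.class.getName());
    // The inherently asynchronous implementation (e.g. a wrapper around executeAsync calls).
    private final Function<T, CompletableFuture<Void>> asyncWrite;

    WriteComponent(Function<T, CompletableFuture<Void>> asyncWrite) {
        this.asyncWrite = asyncWrite;
    }

    // Contract: return a future. The caller decides how and when to resolve it.
    CompletableFuture<Void> writeAsync(T unitOfWork) {
        return asyncWrite.apply(unitOfWork);
    }

    // Contract: fire and forget. Failures are logged, never surfaced to the caller.
    void fireAndForget(T unitOfWork) {
        writeAsync(unitOfWork).whenComplete((ok, error) -> {
            if (error != null) {
                LOG.warning("write failed: " + error);
            }
        });
    }

    // Contract: block on callbacks. The caller thread waits, bounded by a timeout.
    void writeBlocking(T unitOfWork, long timeout, TimeUnit unit) {
        try {
            writeAsync(unitOfWork).get(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for write", e);
        } catch (ExecutionException | TimeoutException e) {
            // Both write failures and timeouts propagate to the caller.
            throw new IllegalStateException("write did not succeed", e);
        }
    }
}
```

The point of the design is that only `writeAsync` talks to the async core; the other contracts are thin adapters, so their semantics differ only in who waits and who hears about failures.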

What all these semantic contracts have in common
• they all call:
• single result:
• success: all succeed
• failure: any fail
• metrics
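The "single result" rule (success only if all writes succeed, failure if any fails) maps naturally onto `CompletableFuture.allOf`. The sketch below assumes JDK futures in place of the driver's, and `LogicalWrite`/`Outcome` are hypothetical names; the elapsed time is a stand-in for the metrics bullet. Note that `allOf` waits for every write to finish before reporting a failure; a fail-fast variant would have to complete the combined future on the first error instead.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper: fold the per-table async writes of one logical write into one result.
final class LogicalWrite {
    // Success payload carrying a simple metric alongside the outcome.
    static final class Outcome {
        final int writes;
        final long elapsedNanos;

        Outcome(int writes, long elapsedNanos) {
            this.writes = writes;
            this.elapsedNanos = elapsedNanos;
        }
    }

    static CompletableFuture<Outcome> combine(List<? extends CompletableFuture<?>> writes) {
        long start = System.nanoTime(); // measured from when the writes are combined
        CompletableFuture<?>[] array = writes.toArray(new CompletableFuture<?>[0]);
        // allOf succeeds only if every write succeeds; if any write fails, it
        // completes exceptionally once all of the writes have finished.
        return CompletableFuture.allOf(array)
                .thenApply(ignored -> new Outcome(writes.size(), System.nanoTime() - start));
    }
}
```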

Fire and forget
considerations:
• what if the write fails?
hallmarks:
• provide a passive API to consumers
• not holding up the main caller thread
• caller has no direct feedback on the fate of the write
so what is it good for?
• logging-like data

Block on callbacks
considerations:
• scalability limited by synchronous caller thread
hallmarks:
• provide a passive API to consumers
• hold up the main caller thread for the callbacks to come back
• timeouts affect the caller
so what is it good for?
• preserving an existing synchronous contract
• batch-oriented bulk imports
• integration testing

Block on callbacks (cont.)

Return a future
considerations:
• opens up the ability to handle a greater workload
• trust the caller
hallmarks:
• provide a passive API to consumers
• not holding up the main caller thread
• caller manages resolution of the write calls
so what is it good for?
• internally managing throttling
• tapping into a streaming or workflow solution

Streaming consumer
considerations:
• way to maximize throughput and scalability
• making the write component a streaming consumer vs.
letting the streaming toolkit write directly to Cassandra
hallmarks:
• the write component notifies upstream producers when it's
(un)available
• ability to elastically respond to load
so what is it good for?
• natural fit for Kafka producer for near-real-time
streaming applications
• potential to plug into different streaming providers
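One simple way for a write component to tell upstream producers when it is (un)available is a bounded buffer whose `offer` result doubles as the availability signal. This is a hedged sketch, not a streaming-toolkit integration: `StreamingWriteConsumer` is a hypothetical name, and draining the buffer into actual async writes is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical streaming-consumer front end: a bounded buffer in front of the async writer.
final class StreamingWriteConsumer<T> {
    private final BlockingQueue<T> buffer;

    StreamingWriteConsumer(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Returns false when full: the producer should pause, buffer upstream, or reroute.
    boolean offer(T unitOfWork) {
        return buffer.offer(unitOfWork);
    }

    // Availability signal a producer can poll before sending.
    boolean isAvailable() {
        return buffer.remainingCapacity() > 0;
    }

    // Take up to maxBatch units of work (FIFO) for the writer, freeing capacity upstream.
    List<T> drain(int maxBatch) {
        List<T> batch = new ArrayList<>(maxBatch);
        buffer.drainTo(batch, maxBatch);
        return batch;
    }
}
```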

Choosing between the four component-level semantic contracts
a multiple-choice question:
• fire and forget
• block on callbacks
• return a future
• streaming consumer
questions:
• what are our usage patterns?
• what are our scalability requirements?
• should we rethink our usage patterns?
• what should we do sooner and what should we put off
until later?
back to the original question: which component-level
semantic contract to choose?

Avoiding overloading the cluster
• the Cassandra client specifically:
− queueing up write units of work
• in general:
− less sophisticated <--> more sophisticated
− load testing <--> elastic (de)provisioning

But what about throttling?
How to avoid Thread.sleep without getting ahead of yourself
considerations:
• sstables catch-up
• traffic
datastax.github.io/java-driver/manual/pooling/
back pressure under load
ingestion state & completion state
native_transport_max_concurrent_connections
native_transport_max_concurrent_connections_per_ip
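A common alternative to `Thread.sleep` for throttling is to cap the number of in-flight writes with a semaphore and release a permit in each write's completion callback, so back pressure comes from actual completions rather than guessed delays. This sketch assumes JDK futures; `InFlightLimiter` is a hypothetical name, and the limit would have to be tuned against the cluster and the driver's pooling settings.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical throttle: caps the number of in-flight async writes.
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Non-blocking: returns empty when at capacity so the caller can apply back pressure
    // (e.g. stop pulling from upstream) instead of sleeping.
    <T> Optional<CompletableFuture<T>> tryWrite(Supplier<CompletableFuture<T>> write) {
        if (!permits.tryAcquire()) {
            return Optional.empty();
        }
        CompletableFuture<T> inFlight;
        try {
            inFlight = write.get();
        } catch (RuntimeException e) {
            permits.release(); // don't leak the permit if the write could not be issued
            throw e;
        }
        // Release the permit when the write completes, successfully or not.
        return Optional.of(inFlight.whenComplete((result, error) -> permits.release()));
    }

    int available() {
        return permits.availablePermits();
    }
}
```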

RetryPolicy
datastax.github.io/java-driver/manual/retries/
avoid a death spiral
OverloadedException
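Avoiding a death spiral comes down to bounding retries: a fixed attempt budget, backoff between attempts, and no retry at all for errors that are not worth retrying. The driver's own `RetryPolicy` works per request inside the driver; the sketch below is only an application-level analogue under stated assumptions. `BoundedRetry` is hypothetical, and the `retryable` predicate is where an `OverloadedException` check would go in real code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical bounded retry with exponential backoff, scheduled rather than slept.
final class BoundedRetry {
    private final ScheduledExecutorService scheduler;
    private final int maxAttempts;
    private final long baseDelayMillis;
    private final Predicate<Throwable> retryable;

    BoundedRetry(ScheduledExecutorService scheduler, int maxAttempts,
                 long baseDelayMillis, Predicate<Throwable> retryable) {
        this.scheduler = scheduler;
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
        this.retryable = retryable;
    }

    <T> CompletableFuture<T> execute(Supplier<CompletableFuture<T>> attempt) {
        CompletableFuture<T> result = new CompletableFuture<>();
        run(attempt, 1, result);
        return result;
    }

    private <T> void run(Supplier<CompletableFuture<T>> attempt, int n, CompletableFuture<T> result) {
        attempt.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
                return;
            }
            Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                    ? error.getCause() : error;
            if (n >= maxAttempts || !retryable.test(cause)) {
                result.completeExceptionally(cause); // give up and surface the failure
                return;
            }
            // Exponential backoff: base, 2x base, 4x base, ... (shift capped at 10).
            long delay = baseDelayMillis << Math.min(n - 1, 10);
            scheduler.schedule(() -> run(attempt, n + 1, result), delay, TimeUnit.MILLISECONDS);
        });
    }
}
```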

Correlation IDs & version UUIDs
uuid & timeuuid
• client-side timeuuid:
• server-side timeuuid:
incoming UUID
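A client-side timeuuid is a version-1 UUID whose 60-bit timestamp counts 100 ns intervals since 1582-10-15. The DataStax driver already provides `UUIDs.timeBased()` (in `com.datastax.driver.core.utils`); the JDK-only sketch below just shows the bit layout, assuming a random clock sequence and a random node with the multicast bit set. `TimeUuids` is a hypothetical name, and unlike the driver's utility this sketch does not guarantee uniqueness for two calls with the same millisecond.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical version-1 (time-based) UUID builder, the format CQL's timeuuid expects.
final class TimeUuids {
    // 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private static final SecureRandom RANDOM = new SecureRandom();
    // Random clock sequence and node, fixed for the lifetime of this JVM in this sketch.
    private static final long CLOCK_SEQ_AND_NODE = randomClockSeqAndNode();

    static UUID fromUnixMillis(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;  // 60-bit UUID timestamp
        long timeLow = ts & 0xFFFFFFFFL;
        long timeMid = (ts >>> 32) & 0xFFFFL;
        long timeHi = (ts >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | (1L << 12) | timeHi; // version 1
        return new UUID(msb, CLOCK_SEQ_AND_NODE);
    }

    private static long randomClockSeqAndNode() {
        long clockSeq = RANDOM.nextInt(1 << 14);          // 14-bit clock sequence
        long node = RANDOM.nextLong() & 0xFFFFFFFFFFFFL;   // 48-bit node
        node |= 0x010000000000L;                           // multicast bit: not a real MAC address
        return ((0x8000L | clockSeq) << 48) | node;        // IETF variant bits "10"
    }
}
```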

Reading in the course of the write path
Execution sequence:
1. Read
2. Conditional write
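The read-then-conditional-write sequence can be chained without blocking: in Guava this is `Futures.transformAsync` with an `AsyncFunction`, and in the JDK the equivalent is `thenCompose`. The sketch assumes hypothetical `readLatestEndTs` and `write` functions returning JDK futures, and a "write only if newer" condition chosen for illustration. The check happens on the client, so it is not atomic; the IF keyword (lightweight transactions) is the server-side alternative.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical read-then-conditional-write chain (thenCompose ~ Guava AsyncFunction).
final class ReadThenWrite {
    static CompletableFuture<Boolean> writeIfNewer(
            Function<String, CompletableFuture<Optional<Long>>> readLatestEndTs,
            BiFunction<String, Long, CompletableFuture<Void>> write,
            String key, long incomingEndTs) {
        // The write is issued only after the read's future completes; no thread blocks in between.
        return readLatestEndTs.apply(key).thenCompose(latest -> {
            if (latest.isPresent() && latest.get() >= incomingEndTs) {
                return CompletableFuture.completedFuture(false); // not newer: skip the write
            }
            return write.apply(key, incomingEndTs).thenApply(ignored -> true);
        });
    }
}
```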

Reading in the course of the write path: AsyncFunction

Reading in the course of the write path: Other ideas
• IF keyword: lightweight transactions, a.k.a. CAS (compare and set)
• caching
ALTER TABLE channel_interval_history WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : '1' };

Bucketing & partitioning
• partitioning of cumulative tables:
• definition of latest tables
channel_1_minute_latest_185
PRIMARY KEY ((channel_uuid, time_bucket), end_ts, version_uuid)
concept of buckets
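With a partition key like `((channel_uuid, time_bucket), end_ts, version_uuid)`, the client has to derive `time_bucket` from `end_ts` on writes and enumerate the buckets a time range spans on reads. The sketch assumes fixed-width buckets identified by whole widths since the Unix epoch; the real bucket definition is not given here, and `TimeBuckets` is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical fixed-width time bucketing for a (channel_uuid, time_bucket) partition key.
final class TimeBuckets {
    private final long widthMillis;

    TimeBuckets(Duration width) {
        if (width.isZero() || width.isNegative()) {
            throw new IllegalArgumentException("bucket width must be positive");
        }
        this.widthMillis = width.toMillis();
    }

    // Bucket id = number of whole bucket widths since the Unix epoch.
    long bucketOf(Instant endTs) {
        return Math.floorDiv(endTs.toEpochMilli(), widthMillis);
    }

    // Every bucket (partition) that a query over [from, to] has to touch.
    List<Long> bucketsBetween(Instant from, Instant to) {
        List<Long> out = new ArrayList<>();
        for (long b = bucketOf(from); b <= bucketOf(to); b++) {
            out.add(b);
        }
        return out;
    }
}
```

Wider buckets mean fewer partitions to read per range query but larger partitions; that trade-off is what the bucket width has to balance.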

Bucketing & partitioning (cont.)

Bucketing & partitioning (cont.)

Bulk inserts & batching
illustrations from DataStax

Pushing the limits of batching

Batching (& non-batching) semantics
sorting before batching/inserting:
• cumulative tables: ascending
• latest tables: descending
the component-level logical write (the unit of work):
• single insert per table
• single batch per table
• multiple batches per table
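Sorting before batching and the "multiple batches per table" unit of work can be sketched as: order the rows (ascending for cumulative tables, descending for latest tables), then cut them into batches of a bounded size. Rows are reduced to their `end_ts` values for brevity, and `BatchPlanner` and `maxBatchSize` are hypothetical; a sensible batch size depends on row size and cluster settings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical batch planning: sort per the table's convention, then chunk.
final class BatchPlanner {
    static List<List<Long>> plan(List<Long> endTimestamps, boolean descending, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<Long> sorted = new ArrayList<>(endTimestamps);
        // Cumulative tables: ascending; latest tables: descending.
        Comparator<Long> order = descending ? Comparator.<Long>reverseOrder() : Comparator.<Long>naturalOrder();
        sorted.sort(order);
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += maxBatchSize) {
            batches.add(new ArrayList<>(sorted.subList(i, Math.min(i + maxBatchSize, sorted.size()))));
        }
        return batches; // one inner list per batch statement
    }
}
```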

Pushing the limits of batching, revisited
right-sizing the component-level write unit of work:
• idempotency
• fail-fast failure handling
• making the caller retry
• "duplicate bulk import barrier"

"Pseudo-transactions"
1. For batch 1, the async insert request(s) are sent out.
2. The callback(s) for batch 1's async requests are called.
3. For batch 2, the async insert request(s) are sent out.
4. The callback(s) for batch 2's async requests are called.
5. For batch 3, the async insert request(s) are sent out.
6. The callback(s) for batch 3's async requests are called.
special case of serialized execution rather than parallel execution
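The serialized sequence above (send batch n, wait for its callbacks, then send batch n+1) can be expressed as a `thenCompose` chain, where a failure short-circuits everything after it. This is a sketch with a hypothetical `sendBatch` function returning JDK futures; it is not a real transaction, since batches that already succeeded stay written, which is why the unit of work needs to be idempotent.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical "pseudo-transaction": batches are sent strictly one after another.
final class SerialBatches {
    static <B> CompletableFuture<Integer> runInOrder(
            List<B> batches, Function<B, CompletableFuture<Void>> sendBatch) {
        CompletableFuture<Integer> chain = CompletableFuture.completedFuture(0);
        for (B batch : batches) {
            // Batch n+1 is sent only after batch n's future has completed successfully;
            // if any batch fails, the remaining stages are skipped.
            chain = chain.thenCompose(done -> sendBatch.apply(batch).thenApply(ignored -> done + 1));
        }
        return chain; // completes with the number of batches written
    }
}
```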

Metrics & monitoring
• Session.getState
• com.datastax.driver.core.Metrics
• ingestion state & completion state:
Multi-measurement inserts performance metrics:
Took 51797 ms with the inserts themselves collectively taking 13443 ms.
Table variation          | sum of durations | average duration | max duration
all-intervals cumulative | 4108 ms          | 3 ms             | 155 ms
all-intervals latest     | 3656 ms          | 3 ms             | 284 ms
per-interval cumulative  | 3177 ms          | 3 ms             | 211 ms
per-interval latest      | 2501 ms          | 2 ms             | 84 ms
History table query and inserts, if any, took 1 ms.
Instantiation of MultiMeasurementChannelData object took 450 ms.
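Per-table figures like the sum, average, and maximum duration can be collected with `LongSummaryStatistics`, one instance per table variation. The sketch assumes durations are recorded in milliseconds from write callbacks (hence the synchronization); `InsertMetrics` is a hypothetical name, and the report format only imitates the column layout of the log shown in this slide.

```java
import java.util.LinkedHashMap;
import java.util.LongSummaryStatistics;
import java.util.Map;

// Hypothetical per-table duration metrics (sum, average, max), fed from write callbacks.
final class InsertMetrics {
    private final Map<String, LongSummaryStatistics> byTable = new LinkedHashMap<>();

    synchronized void record(String tableVariation, long durationMs) {
        byTable.computeIfAbsent(tableVariation, k -> new LongSummaryStatistics()).accept(durationMs);
    }

    synchronized LongSummaryStatistics stats(String tableVariation) {
        return byTable.get(tableVariation);
    }

    synchronized String report() {
        StringBuilder sb = new StringBuilder(
                "Table variation | sum of durations | average duration | max duration\n");
        byTable.forEach((table, s) -> sb.append(String.format("%s | %d ms | %d ms | %d ms\n",
                table, s.getSum(), Math.round(s.getAverage()), s.getMax())));
        return sb.toString();
    }
}
```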

Auditing
bulk-import tables:
• bulk_import_audit_sort_by_high_end_ts
• bulk_import_audit_sort_by_low_end_ts
• bulk_import_audit_exit
CREATE TABLE timeseries.bulk_import_audit_exit (
    channel_uuid uuid,
    root_version_uuid timeuuid,
    duration_ms int,
    exit_call_count int,
    exit_ts timestamp,
    failed boolean,
    failed_reason text,
    high_end_ts timestamp,
    low_end_ts timestamp,
    skipped boolean,
    skipped_reason text,
    PRIMARY KEY (channel_uuid, root_version_uuid)
) WITH CLUSTERING ORDER BY (root_version_uuid DESC);
auditing the live-feed writes

Component integration testing
• target cluster:
− local or remote vanilla install
− Cassandra Cluster Manager (CCM)
− cassandra-unit
− container(s)
• automated tests pointed at a live cluster
• integration test contract:
− migrations vs. clearing the data
− smoke test replays
• what we’re not testing
• foundation for improvements

Materialized views in Cassandra 3.0
Cassandra 3.0 = DataStax Enterprise 5.0
www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views

Higher-level write abstractions
DataStax Spark Cassandra Connector: spark-cassandra-connector
• github.com/datastax/spark-cassandra-connector
• saveToCassandra
• spark-cassandra-connector-java

Higher-level write abstractions
phantom
• github.com/outworkers/phantom
CAUTION

Lower-level improvements
Guava ListenableFuture -> Java 8 CompletableFuture
• why?
• how? make a custom CompletableFuture type do what a FutureCallback
implementation would do:
• onSuccess -> complete
• onFailure -> completeExceptionally
• instantiate the custom CompletableFuture, and then:
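A minimal sketch of the adapter described here: a `CompletableFuture` subclass that also acts as the callback, mapping `onSuccess` to `complete` and `onFailure` to `completeExceptionally`. To keep it runnable without Guava, `Callback` is a hypothetical stand-in declared with the same two methods as Guava's `FutureCallback`; with Guava on the classpath the class would implement `FutureCallback<V>` and be registered via `Futures.addCallback(listenableFuture, adapter, MoreExecutors.directExecutor())`.

```java
import java.util.concurrent.CompletableFuture;

// Stand-in for Guava's FutureCallback<V>, declared only so this sketch compiles without Guava.
interface Callback<V> {
    void onSuccess(V result);

    void onFailure(Throwable t);
}

// A CompletableFuture that is also the callback:
// onSuccess -> complete, onFailure -> completeExceptionally.
final class CallbackCompletableFuture<V> extends CompletableFuture<V> implements Callback<V> {
    @Override
    public void onSuccess(V result) {
        complete(result);
    }

    @Override
    public void onFailure(Throwable t) {
        completeExceptionally(t);
    }
}
```

Callers then compose on the returned `CompletableFuture` (`thenApply`, `thenCompose`, `allOf`) without seeing Guava types.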

Lower-level improvements
Guava ListenableFuture -> Scala Future
• why?
• how?

Guava ListenableFuture -> Scala Future
how--take 2

Lower-level improvements
Reactive Streams integration
• www.reactive-streams.org/
• www.reactivemanifesto.org/
• implementations: RxJava, Akka Streams, Vert.x, Flow class in
Java 9 (JEP-266)
streaming consumer contract
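On Java 9's `java.util.concurrent.Flow` (the JDK's copy of the Reactive Streams interfaces), the streaming-consumer contract becomes a `Subscriber` whose demand signals availability: it requests one unit of work and asks for the next only after the previous write completes. This is a hedged sketch: `WriteSubscriber` is hypothetical, the `write` function stands in for the async Cassandra write, and one-at-a-time demand is the simplest choice rather than the fastest.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Flow;
import java.util.function.Function;

// Hypothetical Flow subscriber: demand is only renewed once the previous write completes.
final class WriteSubscriber<T> implements Flow.Subscriber<T> {
    private final Function<T, CompletableFuture<Void>> write;
    final List<T> written = new CopyOnWriteArrayList<>();
    final CompletableFuture<Void> finished = new CompletableFuture<>();
    private volatile Flow.Subscription subscription;
    private volatile CompletableFuture<Void> lastWrite = CompletableFuture.completedFuture(null);

    WriteSubscriber(Function<T, CompletableFuture<Void>> write) {
        this.write = write;
    }

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1); // initial demand: exactly one unit of work
    }

    @Override
    public void onNext(T item) {
        lastWrite = write.apply(item).whenComplete((ok, error) -> {
            if (error != null) {
                subscription.cancel(); // stop the upstream flow on a failed write
                finished.completeExceptionally(error);
                return;
            }
            written.add(item);
            subscription.request(1); // available again: ask for the next unit of work
        });
    }

    @Override
    public void onError(Throwable t) {
        finished.completeExceptionally(t);
    }

    @Override
    public void onComplete() {
        // Upstream is done; finish once the last in-flight write has settled.
        lastWrite.whenComplete((ok, error) -> finished.complete(null));
    }
}
```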

Lower-level improvements
Akka integration

No compromises?
= compromises
• maximize parallelism …
lightweight transactions, materialized views, serialization
• avoiding overloading the cluster
• an evolutionary component through integration testing
Thank you!
Mitch Gitman
mgitman@gridpoint.com
mgitman@nilistics.net
mgitman@gmail.com
skeletal presence @ LinkedIn

More Related Content

Recently uploaded

SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制vexqp
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjurptikerjasaptiker
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdftheeltifs
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 

Recently uploaded (20)

SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 

Featured

Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellSaba Software
 

Featured (20)

Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
 

Getting to a no-compromises Cassandra write path

  • 1. © 2016 GridPoint, Inc. 1 Getting to a no-compromises Cassandra write path Mitch Gitman senior software engineer GridPoint, Inc.
  • 2. © 2016 GridPoint, Inc. 2 9/13/2016
  • 3. © 2016 GridPoint, Inc. 3 9/13/2016
  • 4. © 2016 GridPoint, Inc. 4 Format 9/13/2016
  • 5. © 2016 GridPoint, Inc. 5 Our timeseries schema 9/13/2016 • highly denormalized − cumulative tables − latest tables − ancillary tables: audit, secondary lookups
  • 6. © 2016 GridPoint, Inc. 6 DataStax Java Driver 9/13/2016 datastax.github.io/java-driver/ working directly and indirectly
  • 7. © 2016 GridPoint, Inc. 7 DataStax Java Driver Version 3.0 9/13/2016 • released in January • native protocol version 4 • custom codecs − cassandra-driver-extras • java.time.Instant <--> CQL timestamp • Java array <--> CQL list • Java enum <--> CQL varchar or int • ... and more • retry policy enhancements
  • 8. © 2016 GridPoint, Inc. 8 The anti-Cassandra Cassandra persistence approach 9/13/2016
  • 9. © 2016 GridPoint, Inc. 9 Going async 9/13/2016 • the Cassandra binary protocol--take advantage of it • Netty--take advantage of it • should the async API be applied more broadly? − datastax.github.io/java-driver/manual/pooling/ − 32,768 > 1 or 327,680 > 10
  • 10. © 2016 GridPoint, Inc. 10 Asynchronous queries 9/13/2016 • The denormalized write-path building blocks • datastax.github.io/java-driver/manual/async/ • DataStax Java Driver ResultSetFuture --> Guava ListenableFuture<ResultSet>
  • 11. © 2016 GridPoint, Inc. 11 Cassandra persistence: take 2 9/13/2016
  • 12. © 2016 GridPoint, Inc. 12 Corralling the asynchronous query results 9/13/2016 • Guava FutureCallback
  • 13. © 2016 GridPoint, Inc. 13 Corralling the asynchronous query results (cont.) 9/13/2016
  • 14. © 2016 GridPoint, Inc. 14 Corralling the async query results (cont.) 9/13/2016 • limited, isolated use of state − ingestion state − completion state
  • 15. © 2016 GridPoint, Inc. 15 The Cassandra write component 9/13/2016 four component-level semantic contracts: • fire and forget • block on callbacks • return a future • streaming consumer component-level logical write: • unit of work • idempotent all fronting an inherently asynchronous implementation
  • 16. © 2016 GridPoint, Inc. 16 What all these semantic contracts have in common 9/13/2016 • they all call: • single result: • success: all succeed • failure: any fail • metrics
  • 17. © 2016 GridPoint, Inc. 17 Fire and forget 9/13/2016 considerations: • what if the write fails? hallmarks: • provide a passive API to consumers • not holding up the main caller thread • caller has no direct feedback on the fate of the write so what is it good for? • logging-like data
  • 18. © 2016 GridPoint, Inc. 18 Block on callbacks 9/13/2016 considerations: • scalability limited by synchronous caller thread hallmarks: • provide a passive API to consumers • hold up the main caller thread for the callbacks to come back • timeouts affect the caller so what is it good for? • preserving an existing synchronous contract • batch-oriented bulk imports • integration testing
  • 19. © 2016 GridPoint, Inc. 19 Block on callbacks (cont.) 9/13/2016
  • 20. © 2016 GridPoint, Inc. 20 Return a future 9/13/2016 considerations: • opens up the ability to handle a greater workload • trust the caller hallmarks: • provide a passive API to consumers • not holding up the main caller thread • caller manages resolution of the write calls so what is it good for? • internally managing throttling • tapping into a streaming or workflow solution
  • 21. © 2016 GridPoint, Inc. 21 Streaming consumer 9/13/2016 considerations: • way to maximize throughput and scalability • making the write component a streaming consumer vs. letting the streaming toolkit write directly to Cassandra hallmarks: • the write component notifies upstream producers when it's (un)available • ability to elastically respond to load what is it good for? • natural fit for Kafka producer for near-real-time streaming applications • potential to plug into different streaming providers
  • 22. Choosing between the four component-level semantic contracts
    a multiple-choice question:
    • fire and forget
    • block on callbacks
    • return a future
    • streaming consumer
    questions:
    • what are our usage patterns?
    • what are our scalability requirements?
    • should we rethink our usage patterns?
    • what should we do sooner and what should we put off until later?
    back to the original question: which component-level semantic contract to choose?
  • 23. Avoiding overloading the cluster
    • the Cassandra client specifically:
      − queueing up write units of work
    • in general:
      − less sophisticated <--> more sophisticated
      − load testing <--> elastic (de)provisioning
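A common way to queue write units of work on the client without letting the queue grow without limit is to run them on a fixed-size `ThreadPoolExecutor` backed by a bounded queue. The sketch below uses the JDK's `ThreadPoolExecutor.CallerRunsPolicy`: when the queue is full, the submitting thread runs the unit itself and slows down as a result. This is a standard JDK technique offered as an illustration, not necessarily what the talk used. `WriteUnitQueue` and the pool and queue sizes are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical client-side queue for write units of work.
final class WriteUnitQueue {
    static ThreadPoolExecutor create(int workers, int queueCapacity) {
        return new ThreadPoolExecutor(
                workers, workers,                         // fixed-size pool
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),  // bounded: no unbounded pile-up
                // When the queue is full, the submitting thread runs the unit of
                // work itself, slowing the producer down instead of dropping work.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```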
  • 24. But what about throttling?
    How to avoid Thread.sleep without getting ahead of yourself
    considerations:
    • sstables catch-up
    • traffic
    datastax.github.io/java-driver/manual/pooling/
    back pressure under load
    ingestion state & completion state
    cassandra.yaml settings:
    • native_transport_max_concurrent_connections
    • native_transport_max_concurrent_connections_per_ip
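One way to throttle without `Thread.sleep` is to cap the number of requests in flight and release a slot from each request's callback. That turns the completion state into back pressure on the ingestion side. Here is a minimal sketch built on `java.util.concurrent.Semaphore`. `InFlightLimiter` and the cap value are hypothetical, and in practice the cap would be sized against the driver's connection-pool limits described on the pooling page.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical throttle: at most maxInFlight async requests outstanding at once.
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller only while maxInFlight requests are outstanding; each
    // request's callback (success or failure) gives its permit back.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> asyncRequest) {
        try {
            permits.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return CompletableFuture.failedFuture(e);
        }
        CompletableFuture<T> request;
        try {
            request = asyncRequest.get();
        } catch (RuntimeException e) {
            permits.release(); // the request never went out
            throw e;
        }
        request.whenComplete((result, error) -> permits.release());
        return request;
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```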
  • 25. RetryPolicy
    datastax.github.io/java-driver/manual/retries/
    avoid a death spiral
    OverloadedException
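The driver's own hook for this is its `RetryPolicy`, documented on the retries page. The idea behind avoiding a death spiral can also be sketched in plain Java, independent of that API. The sketch retries only errors judged retryable, caps the number of retries, and backs off exponentially rather than retrying immediately against a cluster that is already returning `OverloadedException`. `BoundedRetry` and its parameters are hypothetical, and the code is not the driver's `RetryPolicy` interface.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical bounded retry with exponential backoff for an async write.
final class BoundedRetry {
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> attempt,
                                              Predicate<Throwable> retryable,
                                              int retriesLeft, long delayMs) {
        return attempt.get().<CompletableFuture<T>>handle((value, error) -> {
            if (error == null) {
                return CompletableFuture.completedFuture(value);
            }
            // Dependent stages wrap failures in CompletionException; test the real cause.
            Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                    ? error.getCause() : error;
            if (retriesLeft <= 0 || !retryable.test(cause)) {
                return CompletableFuture.failedFuture(cause); // give up: no death spiral
            }
            // Wait before retrying, and double the wait for the next round.
            return CompletableFuture
                    .runAsync(() -> { }, CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS))
                    .thenCompose(ignored -> withRetry(attempt, retryable, retriesLeft - 1, delayMs * 2));
        }).thenCompose(next -> next);
    }
}
```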
  • 26. Correlation IDs & version UUIDs
    uuid & timeuuid
    • client-side timeuuid:
    • server-side timeuuid:
    incoming UUID
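A client-side timeuuid is an RFC 4122 version 1 UUID that is built on the client rather than on the server. The DataStax driver ships a helper for this (`com.datastax.driver.core.utils.UUIDs.timeBased()`), and that is what production code would normally call. The sketch below only shows what goes into such a value. `ClientTimeUuid` is a hypothetical name, and unlike a real generator it does not guarantee uniqueness for two calls within the same millisecond.

```java
import java.security.SecureRandom;
import java.time.Instant;
import java.util.UUID;

// Hypothetical illustration of a version 1 (time-based) UUID, i.e. a timeuuid.
final class ClientTimeUuid {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final long CLOCK_SEQ = RANDOM.nextInt(1 << 14);
    // Random 48-bit node id with the multicast bit set, as RFC 4122 allows when no MAC is used.
    private static final long NODE = (RANDOM.nextLong() & 0xFFFFFFFFFFFFL) | 0x010000000000L;

    static UUID fromInstant(Instant t) {
        long ts = t.toEpochMilli() * 10_000L + UUID_EPOCH_OFFSET;
        long msb = ((ts & 0xFFFFFFFFL) << 32)      // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16) // time_mid
                 | 0x1000L                         // version 1
                 | ((ts >>> 48) & 0x0FFFL);        // time_hi
        long lsb = 0x8000000000000000L             // IETF variant (binary 10)
                 | (CLOCK_SEQ << 48)               // 14-bit clock sequence
                 | NODE;                           // 48-bit node
        return new UUID(msb, lsb);
    }
}
```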
  • 27. Reading in the course of the write path
    Execution sequence:
    1. Read
    2. Conditional write
  • 28. Reading in the course of the write path: AsyncFunction
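The read-then-conditional-write sequence chains two async calls without blocking a thread in between. With Guava, that chaining is done by passing an `AsyncFunction` to `Futures.transformAsync` (in older Guava releases, `Futures.transform`). The `CompletableFuture` counterpart is `thenCompose`. The sketch below uses a hypothetical rule (write only if the candidate end_ts is newer than the latest one read) along with hypothetical read and write functions.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical read-then-conditional-write chain.
final class ReadThenWrite {
    static CompletableFuture<Boolean> writeIfNewer(
            CompletableFuture<Optional<Long>> readLatestEndTs, // hypothetical async read
            long candidateEndTs,
            Function<Long, CompletableFuture<Void>> write) {   // hypothetical async insert
        // thenCompose plays the role of Guava's AsyncFunction: step 2 starts
        // only once step 1's result is available.
        return readLatestEndTs.thenCompose(latest -> {
            if (latest.isPresent() && latest.get() >= candidateEndTs) {
                return CompletableFuture.completedFuture(false); // not newer: skip the write
            }
            return write.apply(candidateEndTs).thenApply(ignored -> true);
        });
    }
}
```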
  • 29. Reading in the course of the write path: Other ideas
    • IF keyword: lightweight transactions, a.k.a. CAS (compare and set)
    • caching:
        ALTER TABLE channel_interval_history
            WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : '1' };
  • 30. Bucketing & partitioning
    • partitioning of cumulative tables:
    • definition of latest tables:
        channel_1_minute_latest_185
        PRIMARY KEY ((channel_uuid, time_bucket), end_ts, version_uuid)
    concept of buckets
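With a partition key of `(channel_uuid, time_bucket)`, every write has to derive `time_bucket` from its timestamp, and every time-range read has to work out which buckets it spans. The sketch below shows one simple scheme, fixed-width buckets numbered from the Unix epoch. The scheme, the bucket width, and `TimeBuckets` are all assumptions, not the bucketing the talk describes.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical fixed-width bucketing: all rows whose end_ts falls in the same
// window land in the same (channel_uuid, time_bucket) partition.
final class TimeBuckets {
    static long bucketOf(Instant endTs, Duration bucketWidth) {
        // floorDiv keeps pre-1970 timestamps in the correct (negative) bucket.
        return Math.floorDiv(endTs.toEpochMilli(), bucketWidth.toMillis());
    }

    // Start of a bucket, handy for working out which partitions a range read must touch.
    static Instant bucketStart(long bucket, Duration bucketWidth) {
        return Instant.ofEpochMilli(bucket * bucketWidth.toMillis());
    }
}
```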
  • 31. Bucketing & partitioning (cont.)
  • 32. Bucketing & partitioning (cont.)
  • 33. Bulk inserts & batching
    illustrations from DataStax
  • 34. Pushing the limits of batching
  • 35. Batching (& non-batching) semantics
    sorting before batching/inserting:
    • cumulative tables: ascending
    • latest tables: descending
    the component-level logical write (the unit of work):
    • single insert per table
    • single batch per table
    • multiple batches per table
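The sort-then-batch step can be sketched generically: sort the rows by end timestamp (ascending for cumulative tables, descending for latest tables, following the slide), then cut them into batches of a bounded size. One batch per table falls out as the special case where the cap is at least the row count. `BatchPlanner` and `maxBatchSize` are hypothetical, and choosing an appropriate batch size is left open.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical planner: sort rows, then cut them into batches.
final class BatchPlanner {
    // Cumulative tables: descending = false; latest tables: descending = true.
    static <T> List<List<T>> plan(List<T> rows, Comparator<T> byEndTs,
                                  boolean descending, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<T> sorted = new ArrayList<>(rows); // leave the caller's list untouched
        sorted.sort(descending ? byEndTs.reversed() : byEndTs);
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < sorted.size(); from += maxBatchSize) {
            int to = Math.min(from + maxBatchSize, sorted.size());
            batches.add(new ArrayList<>(sorted.subList(from, to)));
        }
        return batches;
    }
}
```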
  • 36. Pushing the limits of batching, revisited
    right-sizing the component-level write unit of work:
    • idempotency
    • fail-fast failure handling
    • making the caller retry
    • "duplicate bulk import barrier"
  • 37. "Pseudo-transactions"
    1. For batch 1, the async insert request(s) are sent out.
    2. The callback(s) for batch 1's async requests are called.
    3. For batch 2, the async insert request(s) are sent out.
    4. The callback(s) for batch 2's async requests are called.
    5. For batch 3, the async insert request(s) are sent out.
    6. The callback(s) for batch 3's async requests are called.
    a special case of serialized execution rather than parallel execution
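The six steps above amount to chaining one batch's send onto the previous batch's completion. A minimal sketch with `thenCompose` follows, assuming each batch is sent by a hypothetical `sendBatch` function that returns a `CompletableFuture`. Because of the chaining, the first failed batch short-circuits everything after it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical serialized execution of batches ("pseudo-transactions").
final class SerializedBatches {
    static <B> CompletableFuture<Void> runInOrder(List<B> batches,
                                                  Function<B, CompletableFuture<Void>> sendBatch) {
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (B batch : batches) {
            // Batch N+1 goes out only after every callback for batch N has come back.
            chain = chain.thenCompose(ignored -> sendBatch.apply(batch));
        }
        return chain;
    }
}
```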
  • 38. Metrics & monitoring
    • Session.getState
    • com.datastax.driver.core.Metrics
    • ingestion state & completion state:
    Multi-measurement inserts performance metrics:
    Took 51797 ms with the inserts themselves collectively taking 13443 ms.
    Table variation          | sum of durations | average duration | max duration
    all-intervals cumulative | 4108 ms          | 3 ms             | 155 ms
    all-intervals latest     | 3656 ms          | 3 ms             | 284 ms
    per-interval cumulative  | 3177 ms          | 3 ms             | 211 ms
    per-interval latest      | 2501 ms          | 2 ms             | 84 ms
    History table query and inserts, if any, took 1 ms.
    Instantiation of MultiMeasurementChannelData object took 450 ms.
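The metrics output breaks insert timings down per table variation into sum, average, and max, which are exactly what `java.util.LongSummaryStatistics` tracks. A minimal, hypothetical collector fed from each insert's callback could look like this. `InsertMetrics` and the table-variation keys are illustrative, and the report line only mimics the shape of the output shown on the slide.

```java
import java.util.LongSummaryStatistics;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-table-variation timing collector.
final class InsertMetrics {
    private final Map<String, LongSummaryStatistics> byTable = new ConcurrentHashMap<>();

    // Called from each insert's callback with the measured duration.
    void record(String tableVariation, long durationMs) {
        LongSummaryStatistics stats = byTable.computeIfAbsent(tableVariation, k -> new LongSummaryStatistics());
        synchronized (stats) { // LongSummaryStatistics itself is not thread-safe
            stats.accept(durationMs);
        }
    }

    // One row: variation | sum of durations | average duration | max duration
    String report(String tableVariation) {
        LongSummaryStatistics s = byTable.get(tableVariation);
        if (s == null) {
            return tableVariation + " | no data";
        }
        synchronized (s) {
            return String.format(Locale.ROOT, "%s | %d ms | %.0f ms | %d ms",
                    tableVariation, s.getSum(), s.getAverage(), s.getMax());
        }
    }
}
```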
  • 39. Auditing
    bulk-import tables:
    • bulk_import_audit_sort_by_high_end_ts
    • bulk_import_audit_sort_by_low_end_ts
    • bulk_import_audit_exit
        CREATE TABLE timeseries.bulk_import_audit_exit (
            channel_uuid uuid,
            root_version_uuid timeuuid,
            duration_ms int,
            exit_call_count int,
            exit_ts timestamp,
            failed boolean,
            failed_reason text,
            high_end_ts timestamp,
            low_end_ts timestamp,
            skipped boolean,
            skipped_reason text,
            PRIMARY KEY (channel_uuid, root_version_uuid)
        ) WITH CLUSTERING ORDER BY (root_version_uuid DESC);
    auditing the live-feed writes
  • 40. Component integration testing
    • target cluster:
      − local or remote vanilla install
      − Cassandra Cluster Manager (CCM)
      − cassandra-unit
      − container(s)
    • automated tests pointed at a live cluster
    • integration test contract:
      − migrations vs. clearing the data
      − smoke test replays
    • what we’re not testing
    • foundation for improvements
  • 41. Materialized views in Cassandra 3.0
    Cassandra 3.0 = DataStax Enterprise 5.0
    www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
  • 42. Higher-level write abstractions
    DataStax Spark Cassandra Connector: spark-cassandra-connector
    • github.com/datastax/spark-cassandra-connector
    • saveToCassandra
    • spark-cassandra-connector-java
  • 43. Higher-level write abstractions
    phantom
    • github.com/outworkers/phantom
    CAUTION
  • 44. Lower-level improvements
    Guava ListenableFuture -> Java 8 CompletableFuture
    • why?
    • how? Make a custom CompletableFuture type do what the FutureCallback implementation would do:
      − onSuccess -> complete
      − onFailure -> completeExceptionally
    • instantiate the custom CompletableFuture, and then:
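The conversion described on this slide can be sketched as a `CompletableFuture` subclass that also implements the callback interface, so that `onSuccess` maps to `complete` and `onFailure` maps to `completeExceptionally`. So that the sketch compiles without Guava, it declares a stand-in interface with the same two methods as Guava's `com.google.common.util.concurrent.FutureCallback`. With Guava on the classpath, the stand-in would be dropped and the real interface imported. `CallbackCompletableFuture` is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;

// Stand-in with the same two methods as Guava's FutureCallback, so this
// sketch compiles on its own; with Guava present, import the real interface.
interface FutureCallback<V> {
    void onSuccess(V result);
    void onFailure(Throwable t);
}

// A CompletableFuture that can be registered directly as a callback.
final class CallbackCompletableFuture<V> extends CompletableFuture<V> implements FutureCallback<V> {
    @Override
    public void onSuccess(V result) {
        complete(result);
    }

    @Override
    public void onFailure(Throwable t) {
        completeExceptionally(t);
    }
}
```

With the driver and Guava on the classpath, an instance would be registered through `Futures.addCallback(session.executeAsync(statement), future)`. Recent Guava releases offer only the overload that also takes an `Executor`, such as `MoreExecutors.directExecutor()`.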
  • 45. Lower-level improvements
    Guava ListenableFuture -> Scala Future
    • why?
    • how?
  • 46. Guava ListenableFuture -> Scala Future
    how--take 2
  • 47. Lower-level improvements
    Reactive Streams integration
    • www.reactive-streams.org/
    • www.reactivemanifesto.org/
    • implementations: RxJava, Akka Streams, Vert.x, Flow class in Java 9 (JEP 266)
    streaming consumer contract
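In Reactive Streams terms, the streaming consumer contract means the write component signals how much it can take by calling `request(n)` on its subscription, and asks for more only as earlier writes complete. Here is a minimal sketch against Java 9's `java.util.concurrent.Flow`. It assumes a hypothetical async `write` function, and `WriteSubscriber` and `maxInFlight` are illustrative names. The sketch stops the stream on the first failed write, which is one possible policy among several.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical streaming consumer: demand signalled upstream via request(n)
// tells the producer how many more items the write component can take.
final class WriteSubscriber<T> implements Flow.Subscriber<T> {
    private final int maxInFlight;
    private final Function<T, CompletableFuture<Void>> write; // hypothetical async write
    private final AtomicInteger inFlight = new AtomicInteger();
    private volatile boolean upstreamDone;
    private volatile Flow.Subscription subscription;
    final CompletableFuture<Void> done = new CompletableFuture<>();

    WriteSubscriber(int maxInFlight, Function<T, CompletableFuture<Void>> write) {
        this.maxInFlight = maxInFlight;
        this.write = write;
    }

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(maxInFlight); // initial demand = in-flight capacity
    }

    @Override
    public void onNext(T item) {
        inFlight.incrementAndGet();
        write.apply(item).whenComplete((ignored, error) -> {
            if (error != null) {
                subscription.cancel(); // stop the stream on the first failed write
                done.completeExceptionally(error);
            } else if (inFlight.decrementAndGet() == 0 && upstreamDone) {
                done.complete(null);   // last outstanding write after upstream finished
            } else {
                subscription.request(1); // one slot freed: ask for one more item
            }
        });
    }

    @Override
    public void onError(Throwable t) {
        done.completeExceptionally(t);
    }

    @Override
    public void onComplete() {
        upstreamDone = true;
        if (inFlight.get() == 0) {
            done.complete(null);
        }
    }
}
```

For example, a `java.util.concurrent.SubmissionPublisher` could stand in for the upstream producer in a test, and a Kafka-backed publisher could take its place in a real deployment.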
  • 48. Lower-level improvements
    Akka integration
  • 49. No compromises?
    [diagram labels: "= compromises", "= compromises", "compromises"]
    • maximize parallelism … lightweight transactions, materialized views, serialization
    • avoiding overloading the cluster
    • an evolutionary component through integration testing
  • 50. Thank you!
    Mitch Gitman
    mgitman@gridpoint.com
    mgitman@nilistics.net
    mgitman@gmail.com
    skeletal presence @ LinkedIn