Monal Daxini @ monaldax
11/11/2019 ApacheCon, Las Vegas, 2019
https://www.linkedin.com/in/monaldaxini
Declarative Benchmarking of
Cassandra and Its Data Models
● Cloud Data Engineering @ Netflix, work on many data stores
● Help engineers build scalable solutions
● Built scalable data platforms using Apache Flink / Kafka / Docker
● Working with distributed systems for 18+ years
Profile
@monaldax
• 100s of applications using Cassandra
• (several unique data models / configs)
• 10s of thousands of instances
• 100s of global C* clusters
• > 6 PB of data
• Millions of requests / second
Netflix Cassandra Footprint
• Challenges developing a scalable data model (Cassandra)
• Declarative Cassandra benchmarking tool in action
• Tool’s philosophy, how it works, & how it can apply to other data stores
Structure Of The Talk
Developing a Scalable Cassandra Data Model
For each application:
1. Design data model & schema
2. Design application queries
3. Identify application load & query distribution
4. Prepare test data
5. Prepare query parameter values to run queries efficiently
6. Code an app to execute queries, and instrument to capture metrics
7. Generate load against application to run queries with desired distribution
8. Analyze results (build dashboard)
9. If results unsatisfactory, iterate from step 1
In addition,
we may need to test the application workload on different
versions of Cassandra and/or data models.
That’s a lot of steps, duplicate effort, and it’s cumbersome!
We want it to be easy, quick, and ergonomic!
Developing a Scalable Cassandra Data Model
With tooling for each application:
1. Design data model & schema
2. Design application queries
3. Identify the application load & query distribution
4. Prepare test data (generate)
9. Config tool, run test, if results unsatisfactory, iterate from step 1
Heavy Lifting in a Tool:
5. Prepare query parameter values to run queries efficiently
6. Code an app to execute queries, and instrument to capture metrics
7. Generate load against application to run queries with desired distribution
8. Analyze results (build dashboard)
● Generic benchmarking tool
● Support different data stores via plugin (available plugins)
● Dynamically tunable RPS and configuration
● Load patterns - random, time window, zipfian
What is NDBench?
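To make the difference between the random and zipfian load patterns concrete, here is a minimal sketch. It is not NDBench's own implementation: the function names, the exponent of 1.0, and the use of key ranks as keys are all assumptions for illustration.

```python
import random
from collections import Counter


def zipfian_weights(n_keys, exponent=1.0):
    """Weight of the key at rank r is 1 / r**exponent (rank 1 is the hottest key)."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_keys + 1)]


def pick_keys(n_keys, n_picks, pattern, rng):
    """Pick key indexes under a hypothetical 'random' or 'zipfian' load pattern."""
    if pattern == "random":
        # Every key is equally likely.
        return [rng.randrange(n_keys) for _ in range(n_picks)]
    if pattern == "zipfian":
        # A few hot keys receive most of the traffic.
        return rng.choices(range(n_keys), weights=zipfian_weights(n_keys), k=n_picks)
    raise ValueError(f"unknown load pattern: {pattern}")


rng = random.Random(1)
uniform_counts = Counter(pick_keys(100, 10_000, "random", rng))
zipf_counts = Counter(pick_keys(100, 10_000, "zipfian", rng))
print("uniform, 3 most common keys:", uniform_counts.most_common(3))
print("zipfian, 3 most common keys:", zipf_counts.most_common(3))
```

A skewed pattern like this tends to exercise hot partitions and caches, which a uniform pattern would not.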
NDBench In Action
[Diagram: NDBench APP UI; NDBench nodes (EC2 instances); test Cassandra cluster holding schema & test data; reads / writes; record metrics]
• Emulates application query logic; runs against real or generated data
• Specify the traffic % distribution
• Basic data type coercion for using a query result in another query
• Run any CQL statement (Select, Update, Insert, Delete) & support all CQL types
• Support any Cassandra version with CQL support
Cassandra NDBench CQL plugin
• Validate scalability of data model and application query workload
• Compare the performance of data model for Cassandra version 3.x & 2.x
• Help certify Cassandra updates / upgrades - test different data models and
application workloads
• Use for data generation for given schema before running queries
What Do We Use It For / Plan To Use It For
Walkthrough of NDBench
CQL Plugin In Action
Steps 1-4, 9
Cassandra Schema Of Sample Application (step 1)
Application CQL Queries For API 1 (steps 2, 3)
Query Group 1: 70%
SELECT user_id, profile_id FROM user WHERE user_id = ?;
SELECT foreign_keys FROM user_index WHERE type = 'profile_id' AND value = ?;
Application CQL Queries For API 2 (steps 2, 3)
Query Group 2: 30%
SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?;
BEGIN BATCH
INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?, [ ?, ? ], 'profile_id', ?);
INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?, [ ? ], 'acc_guid', ?);
APPLY BATCH;
INSERT INTO map_test (id, uid_pid) VALUES ('1', {user_id : ?, profile_id: ?});
INSERT INTO set_test (id, uid_pid) VALUES ('2', {?});
NDBench CQL Plugin Overview
[Diagram: NDBench APP UI; NDBench nodes with CQL plugin (EC2 instances); perf test profile (ndb_perf_queries); test Cassandra cluster holding schema & test data; run queries; record metrics]
NDBench CQL Plugin Perf-Test-Profile Schema (step 9)
Annotations:
var_* columns point to different sources for query parameter values. Only one is used.
Ordered CQL in group (id).
Modified App Query With Parameter Reference - Group 1 (70%)
SELECT user_id, profile_id FROM user WHERE user_id = ?user_id?;
SELECT foreign_keys FROM user_index WHERE type = 'profile_id' AND value = ?profile_id?;
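A minimal sketch of how the ?name? references in these queries could be turned into ordinary bind markers plus an ordered value list. The plugin's real parser is not shown in the slides, so the regex, the function name bind_param_refs, and the dict standing in for a parameter row are all assumptions.

```python
import re

# Matches a scalar reference such as ?user_id? (grammar inferred from the
# slide text; the real plugin may accept more forms).
PARAM_REF = re.compile(r"\?([A-Za-z_][A-Za-z0-9_]*)\?")


def bind_param_refs(cql, param_row):
    """Replace each ?name? with a plain ? marker and collect values in order.

    `param_row` stands in for one row read from the parameter source.
    """
    values = []

    def _swap(match):
        values.append(param_row[match.group(1)])
        return "?"

    return PARAM_REF.sub(_swap, cql), values


statement, values = bind_param_refs(
    "SELECT foreign_keys FROM user_index "
    "WHERE type = 'profile_id' AND value = ?profile_id?;",
    {"user_id": "u-1", "profile_id": "p-9"},
)
print(statement)
print(values)
```

The rewritten statement can then be prepared once and executed many times with different value lists.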
Modified App Query With Parameter Reference - Group 2 (30%)
SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?user_id?;
BEGIN BATCH
INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?:TS?, ?[user_id, profile_id]?, 'profile_id', ?profile_id?);
INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?:TS?, ?[user_id]?, 'acc_guid', ?acc_guid?);
APPLY BATCH;
INSERT INTO map_test (id, uid_pid) VALUES ('1', ?{user_id : user_id, profile_id: profile_id}?);
INSERT INTO set_test (id, uid_pid) VALUES ('2', ?s{user_id}s?);
Type Coercion
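The collection references in these statements (?[...]? for a list, ?{...}? for a map, ?s{...}s? for a set) suggest a small coercion step from scalar parameter values to collection values. The sketch below is one way that step might look. The function name resolve_reference and its parsing rules are assumptions inferred from the slide, not the plugin's code.

```python
def resolve_reference(ref, row):
    """Turn the text between the outer ?...? markers into a Python value.

    `row` stands in for one parameter row; names inside the reference are
    looked up in it.
    """
    ref = ref.strip()
    if ref.startswith("s{") and ref.endswith("}s"):
        # ?s{a, b}s? becomes a set of the referenced values.
        return {row[name.strip()] for name in ref[2:-2].split(",")}
    if ref.startswith("[") and ref.endswith("]"):
        # ?[a, b]? becomes a list of the referenced values.
        return [row[name.strip()] for name in ref[1:-1].split(",")]
    if ref.startswith("{") and ref.endswith("}"):
        # ?{key: a, ...}? becomes a map from literal key to referenced value.
        pairs = (item.split(":") for item in ref[1:-1].split(","))
        return {key.strip(): row[name.strip()] for key, name in pairs}
    # ?a? is a plain scalar lookup.
    return row[ref]


row = {"user_id": 7, "profile_id": 42}
print(resolve_reference("[user_id, profile_id]", row))
print(resolve_reference("{user_id : user_id, profile_id: profile_id}", row))
print(resolve_reference("s{user_id}s", row))
```

Python lists, dicts, and sets are used here because they are the natural counterparts of CQL list, map, and set values.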
Elapsed time: 00:00 (mm:ss)
NDBench CQL Plugin Perf Test Profile - 2 Query Groups
NDBench CQL Plugin Perf Test Profile - Select source
NDBench CQL Plugin Perf Test Profile - Source Precedence
• Total traffic % of query groups must add up to 100
• Support different consistency level for each statement
• Columns in CQL statement are inferred, and available from the parameter source
• Parameter source - Table, Previous query results, SELECT statement
• Support large number of parameters to perf test CQL queries
Summary - Ergonomic Perf Test Profile, & Comprehensive Validation
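Two of the rules above (traffic % must add up to 100, and each statement carries its own consistency level) lend themselves to a quick validation pass. The following is a sketch only. The ProfileRow fields are hypothetical, since the real ndb_perf_queries columns appear only in a slide image. The consistency level names are the standard Cassandra ones.

```python
from collections import defaultdict
from dataclasses import dataclass

# Consistency levels defined by Cassandra.
CONSISTENCY_LEVELS = {
    "ANY", "ONE", "TWO", "THREE", "QUORUM", "ALL",
    "LOCAL_QUORUM", "EACH_QUORUM", "SERIAL", "LOCAL_SERIAL", "LOCAL_ONE",
}


@dataclass
class ProfileRow:
    """Hypothetical shape of one perf-test-profile entry."""
    group_id: int
    statement_id: int  # order of the statement within its group
    traffic_pct: int   # traffic % of the group this statement belongs to
    consistency: str
    cql: str


def validate_profile(rows):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    pct_by_group = defaultdict(set)
    for row in rows:
        pct_by_group[row.group_id].add(row.traffic_pct)
        if row.consistency not in CONSISTENCY_LEVELS:
            problems.append(
                f"group {row.group_id} statement {row.statement_id}: "
                f"unknown consistency level {row.consistency!r}"
            )
    for group_id, pcts in sorted(pct_by_group.items()):
        if len(pcts) > 1:
            problems.append(f"group {group_id}: statements disagree on traffic %")
    total = sum(max(pcts) for pcts in pct_by_group.values())
    if total != 100:
        problems.append(f"traffic % adds up to {total}, expected 100")
    return problems


profile = [
    ProfileRow(1, 1, 70, "LOCAL_ONE",
               "SELECT user_id, profile_id FROM user WHERE user_id = ?user_id?;"),
    ProfileRow(2, 1, 30, "LOCAL_QUORUM",
               "SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?user_id?;"),
]
print(validate_profile(profile))
```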
Run Load Test
Spinnaker Pipeline
Run Load Test
Spinnaker Pipeline
Manual Judgement
Test Specific Link
NDBenchUI-CQLPlugin
[Screenshots: NDBench UI, CassCQLPlugin]
Elapsed time: 30:00 (mm:ss), of which 25 min went to perf test profile table entry and 5 min to running the test
Run Load Test
Spinnaker Pipeline
Manual Judgement
Test Specific Link
Dashboard
Dashboard - CQL Plugin Specific
Dashboard - Query Execution Latency Per Group
• Test scale up to 1.2 million ops / second (1.2 billion parameter rows)
• 96 nodes i3.8xl, LCS (compaction), LZ4, mostly read heavy
• Found data model bug, slowly leading to wide rows
• Client wrapper bugs - slow memory leak, metrics, prepared statement
caching not working
Testing C* Data Model For A Critical Service On 2.x & 3.x
We Would Like To Use Plugin To Test Cassandra @ Netflix
Use restores from prod data backups and definitions of
CQL Perf Test Profiles, exercised by the NDBench
CQL plugin, and triggered by Cassandra builds
Under The Hood Of
The CQL Plugin
NDBench CQL Plugin Architecture
[Diagram: NDBench APP UI; NDBench nodes with CQL plugin (EC2 instances); ndb_perf_queries; test Cassandra cluster holding schema & test data; run queries; record metrics]
Perf Test Profile
High-level Architecture
Components: NDBench UI (REST); NDBench nodes, each with a SQLite param store; Cassandra cluster holding ndb_perf_queries and schema & test data. Metadata could live on any Cassandra cluster.
0. NDBench UI: /init/ all nodes
1. Parse metadata
2. Load from user & store on node in SQLite
3. NDBench UI: /start/ all nodes
4. Run queries with param values from SQLite & record metrics
Randomize start
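As a rough illustration of the node-local SQLite param store, the sketch below loads parameter rows into SQLite and then pulls a random row to feed a query. The table name params, its columns, and both function names are assumptions. The slides do not show the plugin's actual storage layout.

```python
import random
import sqlite3

COLUMNS = ("user_id", "profile_id", "acc_guid")


def build_param_store(path, rows):
    """Create a (hypothetical) params table and load parameter rows into it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS params "
        "(user_id TEXT, profile_id TEXT, acc_guid TEXT)"
    )
    conn.executemany("INSERT INTO params VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def random_param_row(conn, rng):
    """Fetch one random row by rowid.

    Assumes rows were only ever inserted, so rowids run from 1 to COUNT(*).
    """
    (count,) = conn.execute("SELECT COUNT(*) FROM params").fetchone()
    rowid = rng.randint(1, count)
    row = conn.execute(
        "SELECT user_id, profile_id, acc_guid FROM params WHERE rowid = ?",
        (rowid,),
    ).fetchone()
    return dict(zip(COLUMNS, row))


conn = build_param_store(":memory:", [
    ("u-1", "p-1", "a-1"),
    ("u-2", "p-2", "a-2"),
    ("u-3", "p-3", "a-3"),
])
print(random_param_row(conn, random.Random(0)))
```

Looking rows up by rowid keeps each read cheap, which matters when parameter lookups sit on the hot path of the load test.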
High-level Architecture (optimized)
Components: NDBench UI (REST); NDBench nodes, each with a SQLite param store; Cassandra cluster holding ndb_perf_queries and schema & test data. Metadata could live on any Cassandra cluster.
0. NDBench UI: /init/ a node
1. Parse metadata
2. If user params are not on S3: load from user & store on 1 node in SQLite
3. Upload SQLite file
4. NDBench UI: /init/ all nodes
5. Download SQLite file from each node
6. NDBench UI: /start/ all nodes
7. Run queries with param values from SQLite & record metrics
Randomize start
Dashboard - Parameter Values Uploaded and Shared
Lock-free Randomized Deterministic % Query Distribution On Each Node
Query Group ID 1: 70%, Query Group ID 2: 30% (1)
100-element array: 70 1s for Query Group 1, 30 2s for Query Group 2
1 1 1 1 1 1 1 2 2 2 2
↓ one-time Fisher-Yates shuffle
1 2 1 1 2 1 2 1 2 1 1
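The array construction and one-time shuffle can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: the function name build_distribution and the seeded random.Random are choices made here for a reproducible example.

```python
import random


def build_distribution(group_pcts, rng):
    """Build a 100-slot array of group ids and shuffle it once (Fisher-Yates).

    `group_pcts` maps query group id to traffic %, e.g. {1: 70, 2: 30}.
    """
    if sum(group_pcts.values()) != 100:
        raise ValueError("traffic % of query groups must add up to 100")
    # 70 slots of group 1 followed by 30 slots of group 2, and so on.
    slots = [gid for gid, pct in group_pcts.items() for _ in range(pct)]
    # Fisher-Yates: walk from the last slot down, swapping each slot with a
    # uniformly chosen slot at or before it.
    for i in range(len(slots) - 1, 0, -1):
        j = rng.randint(0, i)
        slots[i], slots[j] = slots[j], slots[i]
    return slots


slots = build_distribution({1: 70, 2: 30}, random.Random(42))
print(slots[:11])
print(slots.count(1), slots.count(2))
```

Because the shuffle only permutes the slots, any full pass over the array yields exactly the configured 70/30 split, while the order within a pass is randomized.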
Lock-free Randomized Deterministic % Query Distribution On Each Node
Query Group ID 1: 70%, Query Group ID 2: 30% (2)
1 2 1 1 2 1 2 1 2 1 1
Lock-free Randomized Deterministic % Query Distribution On Each Node
Query Group ID 1: 70%, Query Group ID 2: 30% (3)
Thread 1: ThreadLocal array index
...
Thread n: ThreadLocal array index
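The lock-free part comes from sharing the array read-only while every thread keeps its own index. Below is a minimal sketch of that idea. It uses Python's threading.local as a stand-in for a Java ThreadLocal. The class name GroupPicker is hypothetical, and a 10-slot array stands in for the 100-element one.

```python
import threading


class GroupPicker:
    """Hands out query group ids from a shared, read-only slot array.

    Each thread advances its own index, so no lock or shared counter is needed.
    """

    def __init__(self, slots):
        self._slots = tuple(slots)  # never mutated after construction
        self._local = threading.local()

    def next_group(self):
        index = getattr(self._local, "index", 0)
        self._local.index = (index + 1) % len(self._slots)
        return self._slots[index]


# A 10-slot stand-in for the shuffled 100-element array (70% / 30%).
picker = GroupPicker([1, 2, 1, 1, 2, 1, 2, 1, 1, 1])
results = {}


def worker(name):
    # One full pass over the array per thread.
    results[name] = [picker.next_group() for _ in range(10)]


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name, picks in sorted(results.items()):
    print(name, picks.count(1), picks.count(2))
```

Each thread sees the exact configured split over a full pass, regardless of how the threads interleave.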
Data Generators And Generating Test Data
• ?:TS? - This is replaced by a timestamp.
• Add more generators (future): generation of non-collection (bigint, text, uuid, etc.) and collection types
• Use generators in INSERT to generate data for new schema
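A generator reference such as ?:TS? could be expanded with a small registry keyed by generator name. The sketch below assumes epoch milliseconds for TS and adds a hypothetical UUID generator to show how more generators might plug in. Neither detail is confirmed by the slides.

```python
import re
import time
import uuid

# Registry of generator name -> zero-argument factory.
GENERATORS = {
    "TS": lambda: int(time.time() * 1000),  # epoch millis is an assumption
    "UUID": lambda: uuid.uuid4(),           # hypothetical extra generator
}

# Matches generator references such as ?:TS? but not ?user_id?.
GENERATOR_REF = re.compile(r"\?:([A-Z_]+)\?")


def expand_generators(cql):
    """Replace each ?:NAME? with a ? bind marker and a freshly generated value."""
    values = []

    def _swap(match):
        values.append(GENERATORS[match.group(1)]())
        return "?"

    return GENERATOR_REF.sub(_swap, cql), values


statement, values = expand_generators(
    "INSERT INTO user_index (create_time, value) VALUES (?:TS?, ?profile_id?);"
)
print(statement)
print(values)
```

Parameter references such as ?profile_id? are left untouched here, so they can be resolved separately against the parameter source.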
Wrap Up
• Declarative benchmarking significantly reduces overhead in iterating over
schema and Cassandra config to achieve scale
• Used to test and benchmark against curated data sets and perf-test-profiles
• Supports all data types; LWT support (beta)
• Randomized deterministic percentage distribution of queries
Summary
• Open source NDBench CQL plugin (WIP)
• Add more generators
• Load sharded query parameter data on each NDBench node
• UDT Support in dynamic collections
• Build support for other data stores - leverage same philosophy & reuse code
Future Enhancements (Lazily)
End of Season 1
Q & A

Adele Miller
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 

Recently uploaded (20)

Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 

Declarative Benchmarking of Cassandra and Its Data Models

  • 1. Monal Daxini @ monaldax 11/11/2019 ApacheCon, Las Vegas, 2019 https://www.linkedin.com/in/monaldaxini Declarative Benchmarking of Cassandra and Its Data Models
  • 2. ● Cloud Data Engineering @ Netflix, work on many data stores ● Help engineers build scalable solutions ● Built scalable data platforms using Apache Flink / Kafka / Docker ● Working with distributed systems for 18+ years Profile @monaldax
  • 3. • 100s of applications using Cassandra • (several unique data models / config) • Tens of thousands of instances • 100s of global C* clusters • > 6 PB of data • Millions of requests / second Netflix Cassandra Footprint @monaldax
  • 4. • Challenges developing a scalable data model (Cassandra) • Declarative Cassandra benchmarking tool in action • Tool’s philosophy, how it works, & how it can apply to other data stores Structure Of The Talk @monaldax
  • 5. 1. Design data model & schema 2. Design application queries 3. Identify application load & query distribution 4. Prepare test data 5. Prepare query parameter values to run queries efficiently Developing a Scalable Cassandra Data Model For each application: 6. Code an app to execute queries, and instrument to capture metrics 7. Generate load against application to run queries with desired distribution 8. Analyze results (build dashboard) 9. If results unsatisfactory, iterate from step 1 @monaldax
  • 6. In addition, we may need to test the application workload on different versions of Cassandra and/or different data models. @monaldax
  • 7. That’s a lot of steps, duplicate effort, and it’s cumbersome! @monaldax We want it to be easy, quick, and ergonomic!
  • 8. Developing a Scalable Cassandra Data Model. With tooling, for each application: 1. Design data model & schema 2. Design application queries 3. Identify the application load & query distribution 4. Prepare test data (generate) 9. Configure the tool and run the test; if results are unsatisfactory, iterate from step 1. Heavy Lifting in a Tool: 5. Prepare query parameter values to run queries efficiently 6. Code an app to execute queries, and instrument to capture metrics 7. Generate load against application to run queries with desired distribution 8. Analyze results (build dashboard) @monaldax
  • 9. ● Generic benchmarking tool ● Support different data stores via plugin (available plugins) ● Dynamically tunable RPS and configuration ● Load patterns - random, time window, zipfian What is NDBench? @monaldax
  • 10. NDBench In Action (diagram). Components shown: NDBench app UI; NDBench nodes (EC2 instances); test Cassandra cluster (schema & test data). Labels: reads / writes, record metrics. @monaldax
  • 11. • Emulate application query logic runs against real or generated data • Specify the traffic % distribution • Basic data type coalescing for using query result in another query • Run any CQL statement (Select, Update, Insert, Delete) & support all CQL types • Support any Cassandra version with CQL support Cassandra NDBench CQL plugin @monaldax
  • 12. • Validate scalability of data model and application query workload • Compare the performance of data model for Cassandra version 3.x & 2.x • Help certify Cassandra updates / upgrades - test different data models and application workloads • Use for data generation for given schema before running queries What Do We Use It For / Plan To Use It For @monaldax
  • 13. Walkthrough of NDBench CQL Plugin In Action Steps 1-4, 9 @monaldax
  • 14. Cassandra Schema Of Sample Application (step 1) @monaldax
  • 15. Application CQL Queries For API 1 (steps 2, 3) Query Group 1: 70% SELECT user_id, profile_id FROM user WHERE user_id = ?; SELECT foreign_keys FROM user_index WHERE type = 'profile_id' AND value = ?; @monaldax
  • 16. Application CQL Queries For API 2 (steps 2, 3) Query Group 2: 30% SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?; BEGIN BATCH INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?, [ ?, ? ], ''profile_id'', ?); INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?, [ ? ], ''acc_guid'', ?); APPLY BATCH; INSERT INTO map_test (id, uid_pid) VALUES (''1'', {user_id : ?, profile_id: ?}); INSERT INTO set_test(id, uid_pid) VALUES (''2'', {?}); @monaldax
  • 17. NDBench CQL Plugin Overview (diagram). Components shown: NDBench app UI; NDBench nodes with CQL plugin (EC2 instances); test Cassandra cluster (schema & test data); perf test profile (ndb_perf_queries). Labels: run queries, record metrics. @monaldax
  • 18. NDBench CQL Plugin Perf-Test-Profile Schema (step 9). Callouts: var_* columns point to different sources for query parameter values, and only one is used; CQL statements are ordered within a group (id). @monaldax
  • 19. Modified App Query With Parameter Reference - Group 1 (70%) SELECT user_id, profile_id FROM user WHERE user_id = ?user_id?; SELECT foreign_keys FROM user_index WHERE type = 'profile_id' AND value = ?profile_id?; @monaldax
  • 20. Modified App Query With Reference - 2 (30%) SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?user_id?; BEGIN BATCH INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?:TS?, ?[user_id, profile_id]?, ''profile_id'', ?profile_id?); INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?:TS?, ?[user_id]?, ''acc_guid'', ?acc_guid?); APPLY BATCH; INSERT INTO map_test (id, uid_pid) VALUES (''1'', ?{user_id : user_id, profile_id: profile_id}?); INSERT INTO set_test(id, uid_pid) VALUES (''2'', ?s{user_id}s?); Type Coercion @monaldax
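Slides 19 and 20 annotate each bind position with a named reference between question marks (`?user_id?`, `?:TS?`, `?[user_id, profile_id]?`, `?{...}?`, `?s{...}s?`). The slides do not show how the plugin parses these, so the following is only a minimal Python sketch of one way such references could be separated from the statement. The function names `extract_references` and `classify` are hypothetical, and turning each reference back into a single positional `?` is an assumption, not the plugin's confirmed behavior.

```python
import re

# A reference is any run of non-'?' characters enclosed by a pair of '?'.
# Assumption: statements contain no bare '?' once references are added.
REFERENCE = re.compile(r"\?([^?]+)\?")


def extract_references(cql):
    """Return (statement with positional '?' markers, list of reference texts)."""
    refs = []

    def _swap(match):
        refs.append(match.group(1).strip())
        return "?"

    return REFERENCE.sub(_swap, cql), refs


def classify(ref):
    """Guess the kind of a reference from its leading characters (hypothetical rules)."""
    if ref.startswith(":"):
        return "generator"  # e.g. :TS, replaced by a timestamp
    if ref.startswith("["):
        return "list"       # e.g. [user_id, profile_id]
    if ref.startswith("s{"):
        return "set"        # e.g. s{user_id}s
    if ref.startswith("{"):
        return "map"        # e.g. {user_id : user_id, profile_id: profile_id}
    return "column"         # e.g. user_id


if __name__ == "__main__":
    stmt, refs = extract_references(
        "SELECT foreign_keys FROM user_index "
        "WHERE type = 'profile_id' AND value = ?profile_id?;"
    )
    print(stmt)
    print([(r, classify(r)) for r in refs])
```

Keeping the reference text separate from the statement means the same statement could be prepared once and bound repeatedly with values looked up by reference name.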
  • 22. NDBench CQL Plugin Perf Test Profile - 2 Query Groups @monaldax
  • 23. NDBench CQL Plugin Perf Test Profile - Select source @monaldax
  • 24. NDBench CQL Plugin Perf Test Profile - Source Precedence
  • 25. • Total traffic % of query groups must add up to 100 • Support different consistency level for each statement • Columns in CQL statement inferred, and available from the parameter source • Parameter source - Table, Previous query results, SELECT statement • Support large number of parameters to perf test CQL queries Summary - Ergonomic Perf Test Profile, & Comprehensive Validation @monaldax
  • 26. Run Load Test Spinnaker Pipeline @monaldax
  • 27. Run Load Test Spinnaker Pipeline @monaldax
  • 28. Run Load Test Spinnaker Pipeline Manual Judgement @monaldax Test Specific Link
  • 32. 30:00 (mm: ss) 25 min perf test profile table entry, 5 min run test @monaldax
  • 33. Run Load Test Spinnaker Pipeline Manual Judgement @monaldax Test Specific Link
  • 35. Dashboard - CQL Plugin Specific @monaldax
  • 36. Dashboard - Query Execution Latency Per Group @monaldax
  • 37. • Test scale up to 1.2 million ops / second (1.2 billion parameter rows) • 96 nodes i3.8xl, LCS (compaction), LZ4, mostly read heavy • Found data model bug, slowly leading to wide rows • Client wrapper bugs - slow memory leak, metrics, prepared statement caching not working Testing C* Data Model For A Critical Service On 2.x & 3.x @monaldax
  • 38. We Would Like To Use Plugin To Test Cassandra @ Netflix Use restores from prod data backups and definitions of CQL Perf Test Profiles, exercised by the NDBench CQL plugin, and triggered by Cassandra builds @monaldax
  • 39. Under The Hood Of The CQL Plugin @monaldax
  • 40. NDBench CQL Plugin Architecture (diagram). Components shown: NDBench app UI; NDBench nodes with CQL plugin (EC2 instances); test Cassandra cluster (schema & test data); perf test profile (ndb_perf_queries). Labels: run queries, record metrics. @monaldax
  • 41. High-level Architecture (diagram). Components shown: NDBench UI; NDBench nodes, each with a SQLite param store; Cassandra cluster (schema & test data, ndb_perf_queries). Metadata could live on any Cassandra cluster. Steps: 0. NDBench UI: /init/ all nodes 1. Parse metadata 2. Load from user & store on node in SQLite 3. REST /start/ all nodes 4. Run queries with param values from SQLite & record metrics. Randomize start. @monaldax
  • 42. High-level Architecture (optimized) (diagram). Components shown: NDBench UI; NDBench nodes, each with a SQLite param store; Cassandra cluster (schema & test data, ndb_perf_queries). Metadata could live on any Cassandra cluster. Steps: 0. /init/ a node 1. Parse metadata 2. If the params are not on S3, load from user & store on 1 node in SQLite 3. Upload SQLite file 4. NDBench UI: /init/ all nodes 5. Download SQLite file from each node 6. REST /start/ all nodes 7. Run queries with param values from SQLite & record metrics. Randomize start. @monaldax
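The two architecture slides say that query parameter values are stored on each NDBench node in SQLite and read back while queries run. The slides do not show the table layout or the lookup strategy, so the following Python sketch rests on assumptions: the `params` table, its columns (borrowed from the sample queries), and picking a row uniformly at random are all illustrative choices, not the plugin's actual design.

```python
import random
import sqlite3

# Hypothetical parameter columns, taken from the sample queries on slides 19-20.
COLUMNS = ("user_id", "profile_id", "acc_guid")


def open_param_store(path=":memory:"):
    """Open (or create) a node-local parameter store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS params ("
        "row_id INTEGER PRIMARY KEY, user_id TEXT, profile_id TEXT, acc_guid TEXT)"
    )
    return conn


def load_params(conn, rows):
    """Bulk-load parameter rows, e.g. rows read from a source table."""
    conn.executemany(
        "INSERT INTO params (user_id, profile_id, acc_guid) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def random_param_row(conn, rng):
    """Pick one parameter row; row_id values are 1..count on a freshly loaded table."""
    (count,) = conn.execute("SELECT COUNT(*) FROM params").fetchone()
    if count == 0:
        raise LookupError("parameter store is empty")
    row = conn.execute(
        "SELECT user_id, profile_id, acc_guid FROM params WHERE row_id = ?",
        (rng.randint(1, count),),
    ).fetchone()
    return dict(zip(COLUMNS, row))


if __name__ == "__main__":
    store = open_param_store()
    load_params(store, [("u1", "p1", "a1"), ("u2", "p2", "a2")])
    print(random_param_row(store, random.Random()))
```

Because the store is a single file, it can be built once and copied to other nodes, which matches the upload and download steps in the optimized flow.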
  • 43. Dashboard - Parameters Values Uploaded and Shared @monaldax
  • 44. Lock-free Randomized Deterministic % Query Distribution On Each Node (1). Query Group ID 1: 70%, Query Group ID 2: 30%. 100-element array: 70 1s for Query Group 1, 30 2s for Query Group 2 (illustrated as 1 1 1 1 1 1 1 2 2 2 2). @monaldax
  • 45. Lock-free Randomized Deterministic % Query Distribution On Each Node (2). Query Group ID 1: 70%, Query Group ID 2: 30%. Fisher-Yates shuffle of the array, read in order over time (illustrated as 1 2 1 1 2 1 2 1 2 1 1 1). @monaldax
  • 46. Lock-free Randomized Deterministic % Query Distribution On Each Node (3). Query Group ID 1: 70%, Query Group ID 2: 30%. Shuffled array (illustrated as 1 2 1 1 2 1 2 1 2 1 1); Thread 1 through Thread n each keep a ThreadLocal array index. @monaldax
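Slides 44 to 46 describe the distribution scheme in three steps: fill a 100-element array with group IDs in proportion to their traffic percentage, shuffle it with Fisher-Yates, then let every thread walk the array with its own thread-local index so no lock is needed. The following is a minimal Python sketch of that idea, not the plugin's code. The names `build_schedule` and `GroupPicker` are hypothetical, and the optional `seed` argument is an assumption added so the example is repeatable.

```python
import random
import threading


def build_schedule(weights, seed=None):
    """Build a shuffled 100-slot schedule from {group_id: integer percent}."""
    if sum(weights.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    # One slot per percentage point, e.g. 70 ones followed by 30 twos.
    slots = [gid for gid, pct in sorted(weights.items()) for _ in range(pct)]
    rng = random.Random(seed)
    # Fisher-Yates shuffle: swap each slot with a random slot at or before it.
    for i in range(len(slots) - 1, 0, -1):
        j = rng.randint(0, i)
        slots[i], slots[j] = slots[j], slots[i]
    return tuple(slots)  # immutable, so threads can share it without locking


class GroupPicker:
    """Hands out group IDs; each thread advances its own private index."""

    def __init__(self, schedule):
        self._schedule = schedule
        self._local = threading.local()

    def next_group(self):
        i = getattr(self._local, "index", 0)
        self._local.index = (i + 1) % len(self._schedule)
        return self._schedule[i]


if __name__ == "__main__":
    picker = GroupPicker(build_schedule({1: 70, 2: 30}))
    picks = [picker.next_group() for _ in range(100)]
    print(picks.count(1), picks.count(2))
```

Since the schedule is read-only and the only mutable state is per-thread, every full pass of 100 picks on any thread yields exactly the configured split, while the shuffle keeps the groups interleaved rather than running in long blocks.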
  • 47. Data Generators And Generating Test Data • ?:TS? - This is replaced by a timestamp. • Add more generators (future) • generation of non-collection (bigint, text, uuid, etc.) and collection types • Use generators in INSERT to generate data for new schema @monaldax
  • 49. • Declarative benchmarking significantly reduces the overhead of iterating over schema and Cassandra config to achieve scale • Used to test and benchmark against curated data sets and perf-test-profiles • Support all data types & LWT Support (beta) • Randomized deterministic percentage distribution of queries Summary @monaldax
  • 50. • Open source NDBench CQL plugin (WIP) • Add more generators • Load sharded query parameter data on each NDBench node • UDT Support in dynamic collections • Build support for other data stores - leverage same philosophy & reuse code Future Enhancements (Lazily) @monaldax
  • 51. End of Season 1 Q & A @monaldax