Apache Cassandra:
building a production app on an
eventually-consistent DB
Oliver Lockwood
Prague, 20-21 October 2016
Agenda
• Brief introduction to Cassandra
• Gotchas when using an eventually-consistent DB
• Performing DB schema and data evolution in Cassandra for a production app
Introduction to Cassandra
What it is, and what it’s good for
• NoSQL database
• Distributed architecture with no “master” – highly scalable and resilient
• Write-optimised
• Eventual consistency
http://www.datastax.com/dbas-guide-to-nosql
Introduction to Cassandra
How storage, reads, writes and conflict resolution work
• Replication factor = how many copies
• Replication strategy determines storage location
• Contact points used initially
• Client connection is to cluster
• Co-ordinator could be any node (based on load balancing policy)
• Storage is independent of co-ordinator
• Last Write Wins for conflicts
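The bullets above can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's actual implementation: the `Ring` class, the MD5-based `token` function and the node names are all assumptions made for the example (Cassandra's default partitioner uses a different hash, and real replication strategies also take racks and data centres into account). It shows how a replication factor picks several owners for a key independently of which node co-ordinates the request, and how Last Write Wins picks the version with the highest timestamp.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (illustrative hash, an assumption)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical token ring: each node owns one position on the ring."""

    def __init__(self, nodes, replication_factor):
        self.rf = replication_factor
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token and take `rf` distinct nodes."""
        size = len(self.entries)
        start = bisect.bisect_right(self.tokens, token(partition_key)) % size
        count = min(self.rf, size)
        return [self.entries[(start + i) % size][1] for i in range(count)]


def last_write_wins(versions):
    """Pick the (timestamp_micros, value) pair with the highest timestamp.

    Tie-breaking between equal timestamps is not modelled here.
    """
    return max(versions, key=lambda version: version[0])


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
    # The same three replicas are chosen whichever node co-ordinates the request.
    print(ring.replicas("user:42"))
    print(last_write_wins([(100, "a"), (105, "b"), (95, "c")]))
```

Because `replicas` depends only on the key and the ring, any node can act as co-ordinator and forward the request to the same set of owners.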
http://www.slideshare.net/DataStax/understanding-data-consistency-in-apache-cassandra
[Diagram: Client and Client 2 connected to the cluster]
Introduction to Cassandra
What it’s not good for
http://planetcassandra.org/blog/flite-breaking-down-the-cql-where-clause/
Gotchas
Lessons we learned the hard way
• Cassandra’s distributed design depends on synchronized clocks
• What happens if clocks drift?
• INSERT, DELETE, READ from a single client.
• What if Node 3’s clock is slow?
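To make the slow-clock question concrete, here is a small simulation. It is a sketch under stated assumptions, not a model of Cassandra internals: it assumes the co-ordinator stamps each write with its own clock (as happens when the client supplies no timestamp), treats the replicas as already converged, and uses hypothetical names (`Node`, `Cluster`, `run_scenario`). A DELETE is modelled as a write of a tombstone (`None`).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    clock_offset_micros: int = 0  # negative means the node's clock runs behind

    def now(self, true_time_micros: int) -> int:
        return true_time_micros + self.clock_offset_micros


@dataclass
class Cell:
    timestamp: int
    value: object  # None represents a tombstone left by a DELETE


class Cluster:
    """Simplified store: one converged copy per key, Last Write Wins."""

    def __init__(self):
        self.cells = {}

    def write(self, coordinator: Node, true_time: int, key: str, value):
        # Assumption: the co-ordinator assigns the write timestamp.
        ts = coordinator.now(true_time)
        current = self.cells.get(key)
        if current is None or ts > current.timestamp:
            self.cells[key] = Cell(ts, value)

    def delete(self, coordinator: Node, true_time: int, key: str):
        self.write(coordinator, true_time, key, None)

    def read(self, key: str):
        cell = self.cells.get(key)
        return None if cell is None else cell.value


def run_scenario(node3_offset_micros: int):
    """INSERT via node 1, DELETE 100 ms later via node 3, then READ."""
    node1 = Node("node1")
    node3 = Node("node3", clock_offset_micros=node3_offset_micros)
    cluster = Cluster()
    cluster.write(node1, 1_000_000_000, "row", "hello")   # (1) INSERT
    cluster.delete(node3, 1_000_100_000, "row")           # (2) DELETE
    return cluster.read("row")                            # (3) READ


if __name__ == "__main__":
    print(run_scenario(0))            # clocks agree: the row is gone
    print(run_scenario(-5_000_000))   # node 3 is 5 s slow: the row is still there
```

With node 3 five seconds behind, the tombstone carries an older timestamp than the INSERT it was meant to remove, so the INSERT wins and the "deleted" row is still returned.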
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
http://datascale.io/how-to-create-a-cassandra-cluster-in-aws/
[Diagram: Client sends (1) INSERT, then (2) DELETE, to the cluster]
Gotchas
Lessons we learned the hard way
Demo!
http://stackoverflow.com/questions/17474830/configuring-cassandra-with-private-ip-for-internode-communications
https://github.com/oliverlockwood/aws-ansible-cassandra
Gotchas
Lessons we learned the hard way - resolution
• Node 3’s clock is slow
• Use client-side timestamps? The CQL native protocol v3 supports this.
• Avoid time-sensitive query patterns
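One way to picture client-side timestamps is a generator that never hands out the same or a smaller value twice, so a DELETE issued after an INSERT by the same client always carries the larger timestamp. The sketch below is illustrative only: `MonotonicTimestampGenerator` and the `demo.items` table are hypothetical names, and the timestamp is embedded with CQL's `USING TIMESTAMP` clause purely to keep the example self-contained (with native protocol v3 a driver can instead send the timestamp alongside the query).

```python
import threading
import time


class MonotonicTimestampGenerator:
    """Hands out strictly increasing microsecond timestamps for one client."""

    def __init__(self, clock=None):
        # `clock` is injectable so the behaviour can be tested deterministically.
        self._clock = clock or (lambda: time.time_ns() // 1000)
        self._last = 0
        self._lock = threading.Lock()

    def next(self) -> int:
        with self._lock:
            now = self._clock()
            # If the clock stalls or steps backwards, bump the last value by one.
            self._last = now if now > self._last else self._last + 1
            return self._last


def insert_cql(timestamp_micros: int) -> str:
    # `demo.items` is a hypothetical table used only for illustration.
    return (
        "INSERT INTO demo.items (id, value) VALUES (?, ?) "
        f"USING TIMESTAMP {timestamp_micros}"
    )


def delete_cql(timestamp_micros: int) -> str:
    return f"DELETE FROM demo.items USING TIMESTAMP {timestamp_micros} WHERE id = ?"


if __name__ == "__main__":
    timestamps = MonotonicTimestampGenerator()
    print(insert_cql(timestamps.next()))
    print(delete_cql(timestamps.next()))
```

This only orders writes issued by a single client process. Several clients writing to the same rows would still need their own clocks to agree, which is one reason to also avoid time-sensitive query patterns.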
http://www.datastax.com/dev/blog/java-driver-2-1-2-native-protocol-v3
[Diagram: Client sends (1) INSERT, then (2) DELETE, to the cluster]
Schema evolution in Cassandra
Introduction
• DB schemas evolve – accept it!
• Automation is better than manual processes
• For RDBMS: Flyway, Liquibase etc.
• For Cassandra…
… cqlmigrate!
https://flywaydb.org/
http://www.liquibase.org/
Schema evolution in Cassandra
Introducing cqlmigrate
https://github.com/sky-uk/cqlmigrate
http://developers.sky.com/internal/ovp/cassandra/schema/evolution/2016/07/05/cqlmigrate/
Schema evolution in Cassandra
Diving deeper into cqlmigrate
• Schema update operations are recorded, so each CQL file is applied only once
• Locking mechanism uses LWT to avoid race conditions
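The two ideas above (apply each file once, and guard the run with a lock) can be sketched with an in-memory stand-in. This is not cqlmigrate's code or API: `FakeCluster`, `migrate`, the `schema_lock` name and the checksum comparison are assumptions made for illustration. The `insert_if_not_exists` method mimics the compare-and-set behaviour of a lightweight transaction such as `INSERT ... IF NOT EXISTS`, which in a real cluster is made atomic by Paxos.

```python
import hashlib


class FakeCluster:
    """In-memory stand-in for the bookkeeping a migration tool might keep."""

    def __init__(self):
        self.locks = {}      # lock name -> owner
        self.applied = {}    # file name -> checksum of the applied contents
        self.executed = []   # CQL text that has been "run" against the schema

    def insert_if_not_exists(self, name: str, owner: str) -> bool:
        """Mimics INSERT ... IF NOT EXISTS: succeeds only if the row is absent."""
        if name in self.locks:
            return False
        self.locks[name] = owner
        return True

    def delete_if_owner(self, name: str, owner: str) -> bool:
        """Mimics a conditional DELETE that only the lock holder can apply."""
        if self.locks.get(name) == owner:
            del self.locks[name]
            return True
        return False


def migrate(cluster: FakeCluster, owner: str, files: dict) -> None:
    """Apply each CQL file at most once while holding the schema lock."""
    if not cluster.insert_if_not_exists("schema_lock", owner):
        raise RuntimeError("another migration holds the lock")
    try:
        for name in sorted(files):
            checksum = hashlib.sha256(files[name].encode()).hexdigest()
            recorded = cluster.applied.get(name)
            if recorded is None:
                cluster.executed.append(files[name])
                cluster.applied[name] = checksum
            elif recorded != checksum:
                # Assumption: an edited, already-applied file is treated as an error.
                raise ValueError(f"{name} changed after it was applied")
    finally:
        cluster.delete_if_owner("schema_lock", owner)


if __name__ == "__main__":
    cluster = FakeCluster()
    files = {
        "001_create.cql": "CREATE TABLE demo.items (id text PRIMARY KEY, value text)",
        "002_alter.cql": "ALTER TABLE demo.items ADD note text",
    }
    migrate(cluster, "app-instance-1", files)
    migrate(cluster, "app-instance-2", files)  # second run applies nothing new
    print(len(cluster.executed))
```

Running `migrate` from every application instance at start-up is then safe in this model: only the lock holder proceeds, and a later run finds every file already recorded.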
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf
Schema evolution in Cassandra
Diving deeper into cqlmigrate
Demo!
https://github.com/oliverlockwood/cqlmigrate-example-app
In conclusion
Takeaway menu

Apache Cassandra: building a production app on an eventually-consistent DB

  • 1. Apache Cassandra: building a production app on an eventually-consistent DB Oliver Lockwood Prague, 20-21 October 2016
  • 2. Agenda • Brief introduction to Cassandra • Gotchas when using an eventually-consistent DB • Performing DB schema and data evolution in Cassandra for a production app
  • 3. Introduction to Cassandra: What it is, and what it’s good for • NoSQL database • Distributed architecture with no “master” – highly scalable and resilient • Write-optimised • Eventual consistency http://www.datastax.com/dbas-guide-to-nosql
  • 4. Introduction to Cassandra: How storage, reads, writes and conflict resolution work • Replication factor = how many copies • Replication strategy determines storage location • Contact points used initially • Client connection is to cluster • Co-ordinator could be any node (based on load balancing policy) • Storage is independent of co-ordinator • Last Write Wins for conflicts http://www.slideshare.net/DataStax/understanding-data-consistency-in-apache-cassandra [Diagram labels: Client, Client 2]
  • 5. Introduction to Cassandra: What it’s not good for http://planetcassandra.org/blog/flite-breaking-down-the-cql-where-clause/
  • 6. Gotchas: Lessons we learned the hard way • Distributable nature of Cassandra depends on synchronized clocks • What happens if clocks drift? • INSERT, DELETE, READ from a single client. • What if Node 3’s clock is slow? https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/ http://datascale.io/how-to-create-a-cassandra-cluster-in-aws/ [Diagram labels: Client, (1) INSERT, (2) DELETE]
  • 7. Gotchas: Lessons we learned the hard way. Demo! http://stackoverflow.com/questions/17474830/configuring-cassandra-with-private-ip-for-internode-communications https://github.com/oliverlockwood/aws-ansible-cassandra
  • 8. Gotchas: Lessons we learned the hard way - resolution • Node 3’s clock is slow • Use client-side timestamps? CQL protocol v3 supports this. • Avoid time-sensitive query patterns http://www.datastax.com/dev/blog/java-driver-2-1-2-native-protocol-v3 [Diagram labels: Client, (1) INSERT, (2) DELETE]
  • 9. Schema evolution in Cassandra: Introduction • DB schemas evolve – accept it! • Automation is better than manual processes • For RDBMS: Flyway, Liquibase etc. • For Cassandra… cqlmigrate! https://flywaydb.org/ http://www.liquibase.org/
  • 10. Schema evolution in Cassandra: Introducing cqlmigrate https://github.com/sky-uk/cqlmigrate http://developers.sky.com/internal/ovp/cassandra/schema/evolution/2016/07/05/cqlmigrate/
  • 11. Schema evolution in Cassandra: Diving deeper into cqlmigrate • Schema update operations are recorded, so each CQL file is applied only once • Locking mechanism uses LWT to avoid race conditions http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0 http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf
  • 12. Schema evolution in Cassandra: Diving deeper into cqlmigrate. Demo! https://github.com/oliverlockwood/cqlmigrate-example-app
  • 13. In conclusion: Takeaway menu http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0 http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf
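
The clock-drift gotcha on slides 6 to 8 can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cassandra code: the `Replica` class, the `run` helper, the key `user:1` and the timestamp values are all hypothetical, and tie-breaking between equal timestamps is not modelled. It shows a Last Write Wins store receiving an INSERT stamped by a healthy co-ordinator and a DELETE stamped by a co-ordinator whose clock is 5 seconds slow, and then the same pair of operations stamped by a single client clock.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: Optional[str]  # None stands in for a tombstone (deleted row)
    timestamp: int        # write timestamp in microseconds


class Replica:
    """Hypothetical toy replica that resolves conflicts with Last Write Wins."""

    def __init__(self) -> None:
        self.rows: Dict[str, Cell] = {}

    def apply(self, key: str, value: Optional[str], timestamp: int) -> None:
        current = self.rows.get(key)
        # The write with the highest timestamp wins, regardless of arrival order.
        if current is None or timestamp > current.timestamp:
            self.rows[key] = Cell(value, timestamp)

    def read(self, key: str) -> Optional[str]:
        cell = self.rows.get(key)
        return None if cell is None else cell.value


def run(insert_ts: int, delete_ts: int) -> Optional[str]:
    """INSERT, then DELETE, then READ, as issued by a single client."""
    replica = Replica()
    replica.apply("user:1", "alice", insert_ts)
    replica.apply("user:1", None, delete_ts)
    return replica.read("user:1")


TRUE_TIME = 1_000_000_000  # arbitrary "real" time in microseconds (assumption)
GAP = 1_000                # the DELETE is issued 1 ms after the INSERT
SKEW = 5_000_000           # the slow node's clock is 5 seconds behind

# Co-ordinator-assigned timestamps: the slow node stamps the DELETE in the past,
# so the earlier INSERT still "wins" and the row survives.
server_side = run(TRUE_TIME, TRUE_TIME + GAP - SKEW)

# Client-assigned timestamps: one clock stamps both operations, so order is preserved.
client_side = run(TRUE_TIME, TRUE_TIME + GAP)

if __name__ == "__main__":
    print("co-ordinator timestamps:", server_side)  # alice (the DELETE was lost)
    print("client timestamps:", client_side)        # None (the DELETE took effect)
```

In this model the only thing that changes between the two runs is who assigns the timestamp, which is the point of the "use client-side timestamps" resolution on slide 8.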

Editor's Notes

  1. *****Ask staff – can I invite any questions at the midpoint??***** Before we get started – show of hands: how many people are familiar with Cassandra? How many people are actively using Cassandra? Setup: 1) Get AWS console logged in 2) Mirror displays hotkey (Cmd-F1) 3) Warm up ansible cache
  2. Brief introduction to Cassandra – also covering what it’s good for, what it’s not good for and why. Gotchas in an eventually-consistent DB – lessons we learned the hard way. Performing DB schema and data evolution in Cassandra for a production app.
  3. NoSQL – data modelled in a non-relational manner! Eventual consistency – consistency is actually tunable for each type of operation, but stronger consistency levels impact performance.
  4. Contact points. Show how different coordinator nodes work – completely separate from storage nodes for a given row. Example of multiple consecutive updates to a particular row – explain Last Write Wins (LWW).
  5. What makes Cassandra so highly distributable also makes it vulnerable – the whole deployment must run on synchronized clocks. Clock drift can easily occur – even with NTP installed – and can expose problems. Let’s take the example of an INSERT, DELETE, READ query pattern from a single client. Although it’s not necessarily the most common pattern, intuitively you’d think it should work – after all, as we covered earlier, we’re in a “Last Write Wins” environment, and the operation order is clearly defined. Unfortunately, this is not necessarily the case. Let’s take a look at how this query pattern would progress.
  6. Demo checklist:
     - Can everyone see?
     - Show AWS
     - Set clock back for single node
     - Show cluster state (explain nodetool if needed)
     - Show test
     - Run test
     Commands and hotkeys:
     i2cssh -m `ansible -vvvv -i ec2.py eu-west-1 --list-hosts | grep -v hosts | grep -v "config" | awk '{print $1}' | paste -s -d, -` -p LargeFont -b
     date +"%Y-%m-%d %H:%M:%S.%3N"
     Cmd-Alt-I for broadcast
     Cmd-F1 for mirror display toggle!
     date +"%Y-%m-%d %H:%M:%S.%3N"; sudo date --set `date -d '-5 second' "+%H:%M:%S.%3N"`; date +"%Y-%m-%d %H:%M:%S.%3N"
     nodetool status
     curl http://169.254.169.254/latest/meta-data/public-ipv4
     - Cassandra query tracing for details – can look at the `system_traces` keyspace
     Cmd + or Cmd – for font size in IntelliJ
  7. Version 3 of the native protocol (and the Java driver for the past couple of years!) supports client-specified timestamps. Takeaways: Avoid time-sensitive query patterns! If a single client will be performing multiple consecutive Cassandra operations, use client-specified timestamps. PAUSE – any questions at this point before we move on?
  8. Now for a slight segue. Sometimes we have to make changes to our schema (e.g. adding a new table) or provisioned data (as distinct from user-generated data). Sometimes we have to spin up a new deployment from scratch (e.g. creating a new data center / environment). In both cases we need a reliable way to create and update our DB schema and data. I don’t know how you feel, but as a developer, I don’t like: doing manual changes; having to ask Operations, DevOps or anyone else to make manual changes; complex application deployments – I want to install the new version of my app, and have it “just work”. If you’re using a relational DB, then there are a number of tools that you can use to aid your schema evolution. You may have heard of Liquibase or Flyway (if you haven’t, then do look them up). What about Cassandra? When the team responsible for user authentication and entitlements on Sky’s online video platform came to tackle this problem, there didn’t seem to be any such tooling available for Cassandra. So they created one, and called it cqlmigrate.
  9. To introduce cqlmigrate, let’s start with the concepts behind its founding. 1) Versioning the evolution of your schema into discrete steps. Open-closed - don’t change past steps, but can add steps. Fairly standard practice. 2) Including this evolution in the same VCS as your app itself - so that every version of the app has the full DB setup that’s needed for that version of the app to run. 3) Handling deltas (including full bootstrap if necessary!) as part of application startup, to minimise external dependencies. Although cqlmigrate can be run in a standalone manner, running it as part of app startup reduces the complexity of your application deployment, as no extra steps are necessary. We’ll take a look at a demo in a bit, but it’s really simple to invoke cqlmigrate – all you do is pass it a collection of Java Paths containing the CQL files which you want to run, and they are run in alphanumeric order.
  10. (Cassandra uses CQL, similar to SQL – Structured Query Language.) For each CQL file it applies, cqlmigrate creates a row in a “schema_updates” table in your keyspace, containing both the name of the file and its SHA1 checksum. If the row for a given CQL file already exists then cqlmigrate will skip applying it at runtime. It’s important to reiterate how the open-closed principle applies here – if you change a previously-applied CQL file (even just changing whitespace!), it’ll get run again, which may cause problems. On the one hand, you don’t want multiple nodes trying to change your DB schema at the same time – recipe for pain. On the other hand, I don’t want my schema evolution to have any dependency on how the application is started up. cqlmigrate allows concurrent startup of multiple application nodes, by making use of a `locks` table. Cassandra’s lightweight transactions, based on the Paxos consensus algorithm, allow us to do an atomic test-and-set to ensure that only one instance can take the lock at a given time. The instance that first gets the lock will perform the schema evolution; all others will block until that’s complete, and then each in turn will get the lock, realise there’s nothing further to be done, and release the lock again.
  11. To demonstrate:
     date +"%Y-%m-%d %H:%M:%S.%3N"; sudo date --set `date -d '+5 second' "+%H:%M:%S.%3N"`; date +"%Y-%m-%d %H:%M:%S.%3N"
     - Tour of cqlmigrate-example-app
     - Simple DropWizard Application
     - Configuration including Cassandra stuff (show yaml)
     - MigrateSchemaBundle
     - Show Cassandra cluster (same one we had earlier?)
     - cqlsh in to it (explain cqlsh if necessary!)
     - ensure `example` keyspace is absent
     - show `cqlmigrate.locks` table
     - Start up application – show log lines detailing which scripts have been run
     - Rename “notyet” file, rebuild and rerun application
     - Demo what happens if it fails during cqlmigrate – interrupt and re-run
     If needed:
     - DELETE FROM cqlmigrate.locks WHERE name = 'example.schema_migration'
  12. As mentioned previously – time-sensitive query patterns should generally be avoided. If you have to have them in a single-client context, then specifying timestamps on the client side can help you get out of trouble. I’d really recommend trying out cqlmigrate for your schema evolution – and I’d also invite you to contribute to its development. It’s already in use by multiple production apps within Sky; we’ve made it open source and I hope that the broader development community – that’s you lot! - will find it useful and help it to grow.
     Answers:
     - No Joins etc - “not enough functionality in NoSQL world” - it’s against Cassandra principles to use JOINs - store the data in denormalised form; whatever form you’d want to query it?
     - Proved the theory by fixing the co-ordinator - not generally good practice for production, but useful for debugging.
     - Cassandra query tracing for details – can look at the `system_traces` keyspace
     - Cassandra versions 2.1.x and 3.0.x tested. Latter defaults to using client-side generation of timestamps.
     - Time-sensitive timestamps mainly in testing - verifying that records have been deleted
     - Standalone nature of cqlmigrate - how?
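
The bookkeeping described in notes 9 and 10 (files run in alphanumeric order, one `schema_updates` row per applied file with its SHA1 checksum, and a `locks` table guarded by an atomic test-and-set) can be sketched in memory as follows. This is an illustrative model of the behaviour as the notes describe it, not cqlmigrate's actual implementation or API: the `Keyspace` class, `migrate` function and file names are hypothetical, plain dictionaries stand in for the Cassandra tables, and where the notes say other instances block until the lock is free, this sketch simply raises an error.

```python
import hashlib
from typing import Dict, List

LOCK_NAME = "example.schema_migration"  # lock name taken from the demo notes


def sha1_of(contents: str) -> str:
    """SHA1 checksum of a CQL file's contents."""
    return hashlib.sha1(contents.encode("utf-8")).hexdigest()


class Keyspace:
    """Hypothetical in-memory stand-in for the two bookkeeping tables."""

    def __init__(self) -> None:
        self.schema_updates: Dict[str, str] = {}  # filename -> checksum
        self.locks: Dict[str, str] = {}           # lock name -> owning instance

    def try_lock(self, name: str, owner: str) -> bool:
        # Emulates an atomic test-and-set: succeeds only if nobody holds the lock.
        if name in self.locks:
            return False
        self.locks[name] = owner
        return True

    def unlock(self, name: str, owner: str) -> None:
        # Only the current owner may release the lock.
        if self.locks.get(name) == owner:
            del self.locks[name]


def migrate(keyspace: Keyspace, owner: str, files: Dict[str, str]) -> List[str]:
    """Apply pending files in sorted order; return the names that were applied."""
    if not keyspace.try_lock(LOCK_NAME, owner):
        # Simplification: the notes describe blocking until the lock is released.
        raise RuntimeError("lock is held by another instance")
    try:
        applied = []
        for name in sorted(files):
            checksum = sha1_of(files[name])
            if keyspace.schema_updates.get(name) == checksum:
                continue  # already recorded with the same checksum, so skip it
            # A real tool would execute the CQL statements here.
            keyspace.schema_updates[name] = checksum
            applied.append(name)
        return applied
    finally:
        keyspace.unlock(LOCK_NAME, owner)


if __name__ == "__main__":
    ks = Keyspace()
    cql_files = {
        "002_add_table.cql": "CREATE TABLE example.users (id uuid PRIMARY KEY);",
        "001_bootstrap.cql": "CREATE KEYSPACE example WITH replication = {};",
    }
    print(migrate(ks, "instance-a", cql_files))  # both files, in sorted order
    print(migrate(ks, "instance-b", cql_files))  # nothing left to do
```

A second instance that starts after the first has finished takes the lock, finds every file already recorded, and releases the lock again, which matches the startup sequence described in note 10. Editing a previously applied file changes its checksum, so in this model it would be picked up again, which is why the notes stress the open-closed principle.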