Apache Cassandra: building a production app on an eventually-consistent DB
Oliver Lockwood
Prague, 20-21 October 2016
Agenda
• Brief introduction to Cassandra
• Gotchas when using an eventually-consistent DB
• Performing DB schema and data evolution in Cassandra for a production app
Introduction to Cassandra
What it is, and what it’s good for
• NoSQL database
• Distributed architecture with no “master” – highly scalable and resilient
• Write-optimised
• Eventual consistency
http://www.datastax.com/dbas-guide-to-nosql
Introduction to Cassandra
How storage, reads, writes and conflict resolution work
• Replication factor = how many copies
• Replication strategy determines storage location
• Contact points used initially
• Client connection is to cluster
• Co-ordinator could be any node (based on load balancing policy)
• Storage is independent of co-ordinator
• Last Write Wins for conflicts
http://www.slideshare.net/DataStax/understanding-data-consistency-in-apache-cassandra
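The replication bullets on this slide can be illustrated with a toy token ring. This is a minimal sketch under stated assumptions, not Cassandra's implementation: the node names, the MD5-based `token` function, the 32-bit ring size and the "walk clockwise" placement are all hypothetical simplifications. A real cluster uses its configured partitioner and replication strategy.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on a hypothetical 0..2**32-1 ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % 2**32


class ToyRing:
    """Toy cluster: every node owns one token on the ring."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.replication_factor = replication_factor
        # Sort by token so the ring can be walked clockwise.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str):
        """Nodes storing this key: the first node at or after the key's
        token, then the next RF-1 nodes clockwise (wrapping around)."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]


ring = ToyRing(["node1", "node2", "node3", "node4"], replication_factor=3)
owners = ring.replicas("user:42")  # three distinct nodes hold copies of this key
```

Note that `replicas` takes only the partition key. Whichever node ends up co-ordinating the request, the same set of owners is computed, which is the sense in which storage is independent of the co-ordinator.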
[Diagram labels: Client, Client 2]
Introduction to Cassandra
What it’s not good for
http://planetcassandra.org/blog/flite-breaking-down-the-cql-where-clause/
Gotchas
Lessons we learned the hard way
• Distributed nature of Cassandra depends on synchronized clocks
• What happens if clocks drift?
• INSERT, DELETE, READ from a single client.
• What if Node 3’s clock is slow?
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
http://datascale.io/how-to-create-a-cassandra-cluster-in-aws/
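The slow-clock scenario can be reproduced in a few lines. This is a simplified in-memory sketch, assuming each co-ordinator stamps a write with its own clock and that conflicts are resolved purely by Last Write Wins. `ToyCoordinator`, `ToyTable`, the 5-tick offset and the tie-breaking rule are hypothetical, not Cassandra internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: object   # None marks a tombstone, i.e. a delete
    timestamp: int  # arbitrary ticks in this sketch


def merge(existing, incoming):
    """Last Write Wins: keep the cell with the higher timestamp.
    On a tie this sketch keeps the incoming cell; real rules differ."""
    if existing is None:
        return incoming
    return existing if existing.timestamp > incoming.timestamp else incoming


class ToyCoordinator:
    def __init__(self, name, clock_offset=0):
        self.name = name
        self.clock_offset = clock_offset  # negative means the clock runs behind

    def stamp(self, true_time):
        return true_time + self.clock_offset


class ToyTable:
    def __init__(self):
        self.cells = {}

    def write(self, coordinator, key, value, true_time):
        cell = Cell(value, coordinator.stamp(true_time))
        self.cells[key] = merge(self.cells.get(key), cell)

    def read(self, key):
        cell = self.cells.get(key)
        return None if cell is None else cell.value


node1 = ToyCoordinator("node1")
node3 = ToyCoordinator("node3", clock_offset=-5)  # Node 3's clock is slow

table = ToyTable()
table.write(node1, "k", "hello", true_time=100)  # (1) INSERT, stamped 100
table.write(node3, "k", None, true_time=101)     # (2) DELETE, stamped 96
result = table.read("k")                         # the "deleted" row is still there
```

The delete really happened after the insert, but its timestamp (96) is older than the insert's (100), so the insert wins and the read still returns the row.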
[Diagram: Client sends (1) INSERT, then (2) DELETE]
Gotchas
Lessons we learned the hard way
Demo!
http://stackoverflow.com/questions/17474830/configuring-cassandra-with-private-ip-for-internode-communications
https://github.com/oliverlockwood/aws-ansible-cassandra
Gotchas
Lessons we learned the hard way - resolution
• Node 3’s clock is slow
• Use client-side timestamps? Native protocol v3 (the CQL binary protocol) supports this.
• Avoid time-sensitive query patterns
http://www.datastax.com/dev/blog/java-driver-2-1-2-native-protocol-v3
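One way to picture client-side timestamps is a generator that never hands out the same or a smaller value twice. This is a minimal sketch: the class name `MonotonicTimestamps` and the table `demo.items` are hypothetical, and it is not the driver's own generator. The `USING TIMESTAMP` clause is standard CQL, shown here only as statement text.

```python
import threading
import time


class MonotonicTimestamps:
    """Hand out strictly increasing microsecond timestamps for one client."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._last = 0
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            now = int(self._clock() * 1_000_000)
            # If the clock stalls or steps backwards, still advance by one.
            self._last = max(now, self._last + 1)
            return self._last


def insert_cql(ts):
    # Hypothetical table; the timestamp is chosen by the client, not a node.
    return f"INSERT INTO demo.items (id, name) VALUES (?, ?) USING TIMESTAMP {ts}"


def delete_cql(ts):
    return f"DELETE FROM demo.items USING TIMESTAMP {ts} WHERE id = ?"


# A frozen clock stands in for the worst case: time does not advance at all.
frozen = MonotonicTimestamps(clock=lambda: 1476950400.0)
first, second = frozen.next(), frozen.next()
statements = [insert_cql(first), delete_cql(second)]
```

Because the DELETE always carries a higher timestamp than the INSERT that preceded it, it no longer matters which node co-ordinates each request. This only orders writes issued by a single client; writes from several clients still depend on those clients' clocks agreeing.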
[Diagram: Client sends (1) INSERT, then (2) DELETE]
Schema evolution in Cassandra
Introduction
• DB schemas evolve – accept it!
• Automation is better than manual processes
• For RDBMS: Flyway, Liquibase etc.
• For Cassandra… cqlmigrate!
https://flywaydb.org/
http://www.liquibase.org/
Schema evolution in Cassandra
Introducing cqlmigrate
https://github.com/sky-uk/cqlmigrate
http://developers.sky.com/internal/ovp/cassandra/schema/evolution/2016/07/05/cqlmigrate/
Schema evolution in Cassandra
Diving deeper into cqlmigrate
• Schema update operations are recorded, so each CQL file is applied only once
• Locking mechanism uses lightweight transactions (LWT) to avoid race conditions
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf
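The two ideas on this slide, "apply each file only once" and "take a lock with a conditional write", can be sketched with an in-memory stand-in for the cluster. This is not cqlmigrate's code: `ToyCluster`, `migrate`, the lock name and the file names are hypothetical, the conditional methods only mimic the behaviour of `INSERT ... IF NOT EXISTS` and `DELETE ... IF`, and the checksum comparison is an addition of this sketch rather than something the slides describe.

```python
import hashlib


class ToyCluster:
    """In-memory stand-in for the conditional operations the sketch needs."""

    def __init__(self):
        self.locks = {}     # lock name -> owner
        self.applied = {}   # filename -> checksum of the CQL that was applied
        self.executed = []  # log of statements "run" against the cluster

    def insert_if_not_exists(self, name, owner):
        """Mimic INSERT ... IF NOT EXISTS: succeed only if no row exists."""
        if name in self.locks:
            return False
        self.locks[name] = owner
        return True

    def delete_if_owner(self, name, owner):
        """Mimic DELETE ... IF owner = ?: only the holder may release."""
        if self.locks.get(name) == owner:
            del self.locks[name]
            return True
        return False


def migrate(cluster, client_id, files):
    """Apply each (filename, cql) pair at most once, under a cluster-wide lock."""
    if not cluster.insert_if_not_exists("schema_lock", client_id):
        raise RuntimeError("another migration holds the lock")
    try:
        for filename, cql in sorted(files.items()):
            checksum = hashlib.sha1(cql.encode("utf-8")).hexdigest()
            recorded = cluster.applied.get(filename)
            if recorded is None:
                cluster.executed.append(cql)           # first time: run it
                cluster.applied[filename] = checksum   # and record that we did
            elif recorded != checksum:
                raise RuntimeError(f"{filename} changed after it was applied")
    finally:
        cluster.delete_if_owner("schema_lock", client_id)


cluster = ToyCluster()
files = {
    "001_create_table.cql": "CREATE TABLE demo.items (id int PRIMARY KEY, name text)",
    "002_add_column.cql": "ALTER TABLE demo.items ADD price decimal",
}
migrate(cluster, "app-1", files)  # applies both files
migrate(cluster, "app-2", files)  # finds both recorded, applies nothing
```

If two application instances start at the same moment, only one conditional insert succeeds, so only one of them runs the migration. The other fails fast here; a real tool could equally wait and retry.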
Schema evolution in Cassandra
Diving deeper into cqlmigrate
Demo!
https://github.com/oliverlockwood/cqlmigrate-example-app
In conclusion
Takeaway menu
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf

Apache Cassandra: building a production app on an eventually-consistent DB

  • 1. Apache Cassandra: building a production app on an eventually-consistent DB Oliver Lockwood Prague, 20-21 October 2016
  • 2. Agenda • Brief introduction to Cassandra • Gotchas when using an eventually-consistent DB • Performing DB schema and data evolution in Cassandra for a production app
  • 3. Introduction to Cassandra: What it is, and what it’s good for • NoSQL database • Distributed architecture with no “master” – highly scalable and resilient • Write-optimised • Eventual consistency http://www.datastax.com/dbas-guide-to-nosql
  • 4. Introduction to Cassandra: How storage, reads, writes and conflict resolution work • Replication factor = how many copies • Replication strategy determines storage location • Contact points used initially • Client connection is to cluster • Co-ordinator could be any node (based on load balancing policy) • Storage is independent of co-ordinator • Last Write Wins for conflicts http://www.slideshare.net/DataStax/understanding-data-consistency-in-apache-cassandra [Diagram: Client, Client 2]
  • 5. Introduction to Cassandra: What it’s not good for http://planetcassandra.org/blog/flite-breaking-down-the-cql-where-clause/
  • 6. Gotchas: Lessons we learned the hard way • Distributable nature of Cassandra depends on synchronized clocks • What happens if clocks drift? • INSERT, DELETE, READ from a single client. • What if Node 3’s clock is slow? https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/ http://datascale.io/how-to-create-a-cassandra-cluster-in-aws/ [Diagram: Client, (1) INSERT, (2) DELETE]
  • 7. Gotchas: Lessons we learned the hard way. Demo! http://stackoverflow.com/questions/17474830/configuring-cassandra-with-private-ip-for-internode-communications https://github.com/oliverlockwood/aws-ansible-cassandra
  • 8. Gotchas: Lessons we learned the hard way - resolution • Node 3’s clock is slow • Use client-side timestamps? CQL protocol v3 supports this. • Avoid time-sensitive query patterns http://www.datastax.com/dev/blog/java-driver-2-1-2-native-protocol-v3 [Diagram: Client, (1) INSERT, (2) DELETE]
  • 9. Schema evolution in Cassandra: Introduction • DB schemas evolve – accept it! • Automation is better than manual processes • For RDBMS: Flyway, Liquibase etc. • For Cassandra… cqlmigrate! https://flywaydb.org/ http://www.liquibase.org/
  • 10. Schema evolution in Cassandra: Introducing cqlmigrate https://github.com/sky-uk/cqlmigrate http://developers.sky.com/internal/ovp/cassandra/schema/evolution/2016/07/05/cqlmigrate/
  • 11. Schema evolution in Cassandra: Diving deeper into cqlmigrate • Schema update operations are recorded, so each CQL file is applied only once • Locking mechanism uses LWT to avoid race conditions http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0 http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf
  • 12. Schema evolution in Cassandra: Diving deeper into cqlmigrate. Demo! https://github.com/oliverlockwood/cqlmigrate-example-app
  • 13. In conclusion: Takeaway menu http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0 http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf
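
The clock-drift gotcha on slides 6 to 8 can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Cassandra's implementation: `ToyCluster`, `run_scenario`, the node names and the 5-second skew are all hypothetical, and replication is collapsed into a single shared cell store. The only rule modelled is Last Write Wins: the mutation with the highest timestamp survives, and a write is stamped by the co-ordinator's clock unless the client supplies its own timestamp.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    """Latest surviving write for one key; value None stands for a delete."""
    value: Optional[str]
    timestamp_us: int


class ToyCluster:
    """Toy Last-Write-Wins store (hypothetical, NOT Cassandra's code).

    clock_offsets_us maps each co-ordinator name to its clock error in
    microseconds; a negative offset means that node's clock runs slow.
    """

    def __init__(self, clock_offsets_us: Dict[str, int]) -> None:
        self.clock_offsets_us = clock_offsets_us
        self.cells: Dict[str, Cell] = {}

    def write(self, key, value, coordinator, true_time_us, client_ts_us=None):
        # Without a client timestamp, the co-ordinator stamps the write
        # using its own (possibly drifted) clock.
        if client_ts_us is not None:
            ts = client_ts_us
        else:
            ts = true_time_us + self.clock_offsets_us[coordinator]
        current = self.cells.get(key)
        # Last Write Wins: only a strictly newer timestamp replaces the cell
        # (tie-breaking is ignored in this sketch).
        if current is None or ts > current.timestamp_us:
            self.cells[key] = Cell(value, ts)

    def read(self, key):
        cell = self.cells.get(key)
        return None if cell is None else cell.value


def run_scenario(use_client_timestamps: bool) -> Optional[str]:
    """INSERT via node1, DELETE 1 ms later via slow node3, then READ."""
    cluster = ToyCluster({"node1": 0, "node2": 0, "node3": -5_000_000})
    t_insert, t_delete = 10_000_000, 10_001_000
    cluster.write("user:1", "alice", "node1", t_insert,
                  t_insert if use_client_timestamps else None)
    cluster.write("user:1", None, "node3", t_delete,
                  t_delete if use_client_timestamps else None)
    return cluster.read("user:1")
```

With co-ordinator-assigned timestamps, the DELETE handled by the slow node gets an older timestamp than the INSERT, so the row is still readable. With client-supplied timestamps, the single client's own clock orders both operations and the DELETE wins.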

Editor's Notes

  1. Ask staff: can I invite any questions at the midpoint? Before we get started, show of hands: how many people are familiar with Cassandra? How many people are actively using Cassandra? 1) Get AWS console logged in 2) Mirror displays hotkey (Cmd-F1) 3) Warm up ansible cache
  2. Brief introduction to Cassandra – also covering what it’s good for, what it’s not good for and why Gotchas in an eventually-consistent DB – lessons we learned the hard way Performing DB schema and data evolution in Cassandra for a production app
  3. NoSQL – data modelled in a non-relational manner! Eventual consistency – consistency is actually tunable for each type of operation, but stronger consistency levels impact performance.
  4. Contact points. Show how different co-ordinator nodes work: for a given row, the co-ordinator is completely separate from the storage nodes. Give an example of multiple consecutive updates to a particular row, and use it to explain Last Write Wins (LWW).
  5. What makes Cassandra so highly distributable also makes it vulnerable – the whole deployment must run on synchronized clocks. Clock drift can easily occur – even with NTP installed – and can expose problems. Let’s take the example of an INSERT, DELETE, READ query pattern from a single client. Although it’s not necessarily the most common pattern, intuitively you’d think it should work – after all, as we covered earlier, we’re in a “Last Write Wins” environment, and the operation order is clearly defined. Unfortunately, this is not necessarily the case. Let’s take a look at how this query pattern would progress.
  6. - Can everyone see?
     - Show AWS
     - Set clock back for single node
     - Show cluster state (explain nodetool if needed)
     - Show test
     - Run test
     i2cssh -m `ansible -vvvv -i ec2.py eu-west-1 --list-hosts | grep -v hosts | grep -v "config" | awk '{print $1}' | paste -s -d, -` -p LargeFont -b
     date +"%Y-%m-%d %H:%M:%S.%3N"
     Cmd-Alt-I for broadcast; Cmd-F1 for mirror display toggle!
     date +"%Y-%m-%d %H:%M:%S.%3N"; sudo date --set `date -d '-5 second' "+%H:%M:%S.%3N"`; date +"%Y-%m-%d %H:%M:%S.%3N"
     nodetool status
     curl http://169.254.169.254/latest/meta-data/public-ipv4
     - Cassandra query tracing for details: can look at the `system_traces` keyspace
     - Cmd + or Cmd - for font size in IntelliJ
  7. Version 3 of the native protocol (and the Java driver for the past couple of years!) supports client-specified timestamps. Takeaways: Avoid time-sensitive query patterns! If a single client will be performing multiple consecutive Cassandra operations, use client-specified timestamps. PAUSE – any questions at this point before we move on?
  8. Now for a slight segue. Sometimes we have to make changes to our schema (e.g. adding a new table) or provisioned data (as distinct from user-generated data). Sometimes we have to spin up a new deployment from scratch (e.g. creating a new data center or environment). In both cases we need a reliable way to create and update our DB schema and data. I don’t know how you feel, but as a developer, I don’t like: doing manual changes; having to ask Operations, DevOps or anyone else to make manual changes; or complex application deployments. I want to install the new version of my app, and have it “just work”. If you’re using a relational DB, then there are a number of tools that you can use to aid your schema evolution. You may have heard of Liquibase or Flyway (if you haven’t, then do look them up). What about Cassandra? When the team responsible for user authentication and entitlements on Sky’s online video platform came to tackle this problem, there didn’t seem to be any such tooling available for Cassandra. So they created one, and called it cqlmigrate.
  9. To introduce cqlmigrate, let’s start with the concepts behind its founding. 1) Versioning the evolution of your schema into discrete steps. Open-closed: don’t change past steps, but you can add steps. Fairly standard practice. 2) Including this evolution in the same VCS as your app itself, so that every version of the app has the full DB setup that’s needed for that version of the app to run. 3) Handling deltas (including full bootstrap if necessary!) as part of application startup, to minimise external dependencies. Although cqlmigrate can be run in a standalone manner, running it as part of app startup reduces the complexity of your application deployment, as no extra steps are necessary. We’ll take a look at a demo in a bit, but it’s really simple to invoke cqlmigrate: all you do is pass it a collection of Java Paths containing the CQL files which you want to run, and they are run in alphanumeric order.
  10. (Cassandra uses CQL, similar to SQL – Structured Query Language.) For each CQL file it applies, cqlmigrate creates a row in a “schema_updates” table in your keyspace, containing both the name of the file and its SHA1 checksum. If the row for a given CQL file already exists, cqlmigrate will skip applying it at runtime. It’s important to reiterate how the open-closed principle applies here: if you change a previously-applied CQL file (even just changing whitespace!), it’ll get run again, which may cause problems. On the one hand, you don’t want multiple nodes trying to change your DB schema at the same time: that’s a recipe for pain. On the other hand, I don’t want my schema evolution to have any dependency on how the application is started up. cqlmigrate allows concurrent startup of multiple application nodes by making use of a `locks` table. Cassandra’s lightweight transactions, based on the Paxos consensus algorithm, allow us to do an atomic test-and-set to ensure that only one instance can take the lock at a given time. The instance that first gets the lock will perform the schema evolution; all others will block until that’s complete, and then each in turn will get the lock, realise there’s nothing further to be done, and release the lock again.
  11. To demonstrate:
     date +"%Y-%m-%d %H:%M:%S.%3N"; sudo date --set `date -d '+5 second' "+%H:%M:%S.%3N"`; date +"%Y-%m-%d %H:%M:%S.%3N"
     - Tour of cqlmigrate-example-app
     - Simple DropWizard Application
     - Configuration including Cassandra stuff (show yaml)
     - MigrateSchemaBundle
     - Show Cassandra cluster (same one we had earlier?)
     - cqlsh in to it (explain cqlsh if necessary!)
     - ensure `example` keyspace is absent
     - show `cqlmigrate.locks` table
     - Start up application: show log lines detailing which scripts have been run
     - Rename “notyet” file, rebuild and rerun application
     - Demo what happens if it fails during cqlmigrate: interrupt and re-run
     If needed:
     DELETE FROM cqlmigrate.locks WHERE name = 'example.schema_migration'
  12. As mentioned previously, time-sensitive query patterns should generally be avoided. If you have to have them in a single-client context, then specifying timestamps on the client side can help you get out of trouble. I’d really recommend trying out cqlmigrate for your schema evolution, and I’d also invite you to contribute to its development. It’s already in use by multiple production apps within Sky; we’ve made it open source and I hope that the broader development community (that’s you lot!) will find it useful and help it to grow.
     Answers:
     - No joins etc: “not enough functionality in NoSQL world”. It’s against Cassandra principles to use JOINs; store the data in denormalised form, in whatever form you’d want to query it.
     - Proved the theory by fixing the co-ordinator: not generally good practice for production, but useful for debugging.
     - Cassandra query tracing for details: can look at the `system_traces` keyspace.
     - Cassandra versions 2.1.x and 3.0.x tested. Latter defaults to using client-side generation of timestamps.
     - Time-sensitive timestamps mainly in testing: verifying that records have been deleted.
     - Standalone nature of cqlmigrate: how?
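
The two cqlmigrate ideas described in notes 9 and 10 (apply each file once, tracked by name and SHA1 checksum, and let only one instance migrate at a time) can be sketched in memory. This is a minimal illustration and not cqlmigrate's code or API: `ToyMigrator`, `demo` and the file names are hypothetical, a dict stands in for the `schema_updates` table, a `threading.Lock` stands in for the LWT-guarded row in the `locks` table, and Python's `sorted` stands in for alphanumeric ordering. Following the notes' description, a file whose checksum has changed is applied again.

```python
import hashlib
import threading
from typing import Dict, List


class ToyMigrator:
    """In-memory stand-in for the ideas in the notes; NOT cqlmigrate itself."""

    def __init__(self) -> None:
        self.schema_updates: Dict[str, str] = {}  # file name -> SHA-1 checksum
        self.applied_log: List[str] = []          # order in which files "ran"
        self._lock = threading.Lock()             # plays the role of the locks table

    def migrate(self, scripts: Dict[str, str]) -> None:
        """Apply scripts (file name -> CQL text) in sorted-name order."""
        with self._lock:  # only one application instance migrates at a time
            for name in sorted(scripts):
                checksum = hashlib.sha1(scripts[name].encode("utf-8")).hexdigest()
                if self.schema_updates.get(name) == checksum:
                    continue  # same name and content already recorded: skip
                self.applied_log.append(name)  # stand-in for executing the CQL
                self.schema_updates[name] = checksum


def demo() -> ToyMigrator:
    """Start three 'application instances' concurrently against one migrator."""
    scripts = {
        "2016-02-create-users.cql": "CREATE TABLE example.users (id uuid PRIMARY KEY);",
        "2016-01-create-keyspace.cql": (
            "CREATE KEYSPACE example WITH replication = "
            "{'class': 'SimpleStrategy', 'replication_factor': 1};"
        ),
    }
    migrator = ToyMigrator()
    threads = [threading.Thread(target=migrator.migrate, args=(scripts,))
               for _ in range(3)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return migrator
```

Whichever thread takes the lock first applies both files in name order; the other two then take the lock in turn, find matching checksums, and do nothing. Calling `migrate` again with even a whitespace change to one file produces a different checksum, so that file is applied a second time, which is why previously applied files should be treated as closed.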