At Sky, we use Cassandra for database persistence in our Online Video Platform - the system which delivers all OTT video content to both Sky and NOW TV customers - and yes, that includes handling huge spikes in traffic both when there's a big Premier League football match and when a new Game of Thrones season comes online!
This talk covers the following topics:
- A brief introduction to Cassandra, including what it’s good for, what it’s not good for, and why. We'll dig into how storage, reads, writes and conflict resolution work.
- Gotchas in an eventually-consistent DB - some interesting problems we encountered and the lessons we learned the hard way.
- Performing database schema and data evolution in Cassandra for a production app.
- Why this is important, and what we did at Sky to ensure consistency of our database schema.
Presented at Geecon Prague on 20th October 2016.
Apache Cassandra: building a production app on an eventually-consistent DB
1. Apache Cassandra:
building a production app on an
eventually-consistent DB
Oliver Lockwood
Prague, 20-21 October 2016
2. Agenda
• Brief introduction to Cassandra
• Gotchas when using an eventually-consistent DB
• Performing DB schema and data evolution in Cassandra for a production app
3. Introduction to Cassandra
What it is, and what it’s good for
• NoSQL database
• Distributed architecture with no “master” – highly scalable and resilient
• Write-optimised
• Eventual consistency
http://www.datastax.com/dbas-guide-to-nosql
4. Introduction to Cassandra
How storage, reads, writes and conflict resolution work
• Replication factor = how many copies
• Replication strategy determines storage location
• Contact points used initially
• Client connection is to cluster
• Co-ordinator could be any node (based on load balancing policy)
• Storage is independent of co-ordinator
• Last Write Wins for conflicts
http://www.slideshare.net/DataStax/understanding-data-consistency-in-apache-cassandra
[Diagram: two clients connecting to the cluster, each via a different co-ordinator node, separate from the storage nodes for a given row]
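The "Last Write Wins" bullet can be illustrated with a small sketch (Python, purely illustrative — real Cassandra reconciles per-cell write timestamps internally, this is not its code):

```python
# Illustrative sketch of Last Write Wins (LWW): when two replicas hold
# different values for the same cell, the higher write timestamp wins.
def reconcile(cell_a, cell_b):
    """Each cell is a (value, timestamp_micros) pair; highest timestamp wins."""
    return cell_a if cell_a[1] >= cell_b[1] else cell_b

replica_1 = ("v1", 1_000_000)  # older write
replica_2 = ("v2", 1_000_500)  # newer write
assert reconcile(replica_1, replica_2) == ("v2", 1_000_500)
```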
5. Introduction to Cassandra
What it’s not good for
http://planetcassandra.org/blog/flite-breaking-down-the-cql-where-clause/
6. Gotchas
Lessons we learned the hard way
• Distributable nature of Cassandra depends on synchronized clocks
• What happens if clocks drift?
• INSERT, DELETE, READ from a single client.
• What if Node 3’s clock is slow?
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
http://datascale.io/how-to-create-a-cassandra-cluster-in-aws/
[Diagram: a single client issuing (1) INSERT then (2) DELETE, each operation co-ordinated by a different node]
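The failure mode on this slide can be sketched as follows (a Python simulation of Last Write Wins under clock drift, not real Cassandra code; the node roles and the 4.5-second skew are illustrative):

```python
# Simulate the INSERT -> DELETE -> READ anomaly when one node's clock is slow.
# Under Last Write Wins, a DELETE stamped by a slow coordinator clock can carry
# an OLDER timestamp than the preceding INSERT, so the INSERT "wins" and the
# supposedly-deleted row remains readable.

def read(cell):
    value, ts_micros, is_tombstone = cell
    return None if is_tombstone else value

# Node 1 (correct clock) coordinates the INSERT at t = 10.0s
insert = ("some row", 10_000_000, False)   # (value, ts_micros, tombstone?)

# Node 3 (clock running slow) coordinates the DELETE; it happens later in
# wall-clock time, but gets stamped with an older timestamp.
delete = (None, 5_500_000, True)

# LWW reconciliation: higher timestamp wins -> the INSERT beats the DELETE
winner = insert if insert[1] >= delete[1] else delete
assert read(winner) == "some row"   # the "deleted" row is still visible!
```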
7. Gotchas
Lessons we learned the hard way
Demo!
http://stackoverflow.com/questions/17474830/configuring-cassandra-with-private-ip-for-internode-communications
https://github.com/oliverlockwood/aws-ansible-cassandra
8. Gotchas
Lessons we learned the hard way - resolution
• Node 3’s clock is slow
• Use client-side timestamps? CQL native protocol v3 supports this.
• Avoid time-sensitive query patterns
http://www.datastax.com/dev/blog/java-driver-2-1-2-native-protocol-v3
[Diagram: the same single-client (1) INSERT, (2) DELETE scenario, with Node 3's clock slow]
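With client-specified timestamps (e.g. `USING TIMESTAMP` in CQL, supported by native protocol v3), ordering is decided by the single client's clock rather than by each coordinator, so the anomaly disappears. A sketch under the same simulated assumptions as before:

```python
# Same INSERT -> DELETE scenario, but the client stamps both operations from
# its own monotonic clock, so coordinator clock drift no longer matters.
import itertools

client_clock = itertools.count(10_000_000)        # monotonic client timestamps

insert = ("some row", next(client_clock), False)  # (value, ts_micros, tombstone?)
delete = (None, next(client_clock), True)         # stamped later by the client

# LWW reconciliation: the DELETE's timestamp is now guaranteed to be higher
winner = insert if insert[1] >= delete[1] else delete
assert winner[2]   # the DELETE correctly wins, so a READ sees no row
```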
9. Schema evolution in Cassandra
Introduction
• DB schemas evolve – accept it!
• Automation is better than manual processes
• For RDBMS: Flyway, Liquibase etc.
• For Cassandra…
… cqlmigrate!
https://flywaydb.org/
http://www.liquibase.org/
10. Schema evolution in Cassandra
Introducing cqlmigrate
https://github.com/sky-uk/cqlmigrate
http://developers.sky.com/internal/ovp/cassandra/schema/evolution/2016/07/05/cqlmigrate/
11. Schema evolution in Cassandra
Diving deeper into cqlmigrate
• Schema update operations are recorded, so each CQL file is applied only once
• Locking mechanism uses LWT to avoid race conditions
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf
12. Schema evolution in Cassandra
Diving deeper into cqlmigrate
Demo!
https://github.com/oliverlockwood/cqlmigrate-example-app
13. In conclusion
Takeaway menu
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf
Editor's Notes
*****Ask staff – can I invite any questions at the midpoint??*****
Before we get started – show of hands:
- how many people are familiar with Cassandra?
- how many people are actively using Cassandra?
1) Get AWS console logged in
2) Mirror displays hotkey (Cmd-F1)
3) Warm up ansible cache
Brief introduction to Cassandra – also covering what it’s good for, what it’s not good for and why
Gotchas in an eventually-consistent DB – lessons we learned the hard way
Performing DB schema and data evolution in Cassandra for a production app
NoSQL – data modelled in a non-relational manner!
Eventual consistency – consistency is actually tunable for each type of operation, but stronger consistency levels impact performance.
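One way to make "tunable" concrete (my own illustration, not from the slides): with replication factor RF, write consistency level W and read consistency level R, reads are guaranteed to overlap the latest write whenever W + R > RF.

```python
# Strong-consistency condition in Cassandra: W + R > RF guarantees the read
# and write quorums overlap in at least one replica.
def is_strongly_consistent(rf, w, r):
    return w + r > rf

assert is_strongly_consistent(rf=3, w=2, r=2)       # QUORUM/QUORUM: strong
assert not is_strongly_consistent(rf=3, w=1, r=1)   # ONE/ONE: eventual only
```

The flip side is the performance cost mentioned above: higher W and R mean waiting for more replicas on every operation.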
Contact points
Show how different coordinator nodes work – completely separate from storage nodes for a given row
Example of multiple consecutive updates to a particular row – explain Last Write Wins (LWW)
What makes Cassandra so highly distributable also makes it vulnerable – the whole deployment must run on synchronized clocks. Clock drift can easily occur – even with NTP installed – and can expose problems.
Let’s take the example of an INSERT, DELETE, READ query pattern from a single client. Although it’s not necessarily the most common pattern, intuitively you’d think it should work – after all, as we covered earlier, we’re in a “Last Write Wins” environment, and the operation order is clearly defined.
Unfortunately, this is not necessarily the case. Let’s take a look at how this query pattern would progress.
- Can everyone see?
- Show AWS
- Set clock back for single node
- Show cluster state (explain nodetool if needed)
- Show test
- Run test
i2cssh -m `ansible -vvvv -i ec2.py eu-west-1 --list-hosts | grep -v hosts | grep -v "config" | awk '{print $1}' | paste -s -d, -` -p LargeFont -b
date +"%Y-%m-%d %H:%M:%S.%3N"
Cmd-Alt-I for broadcast
cmd-f1 for mirror display toggle!
date +"%Y-%m-%d %H:%M:%S.%3N"; sudo date --set `date -d '-5 second' "+%H:%M:%S.%3N"`; date +"%Y-%m-%d %H:%M:%S.%3N"
nodetool status
curl http://169.254.169.254/latest/meta-data/public-ipv4
- Cassandra query tracing for details – can look at the `system_traces` keyspace
Cmd + or Cmd – for font size in IntelliJ
Version 3 of the native protocol (and the Java driver for the past couple of years!) supports allowing client-specified timestamps.
Takeaways:
Avoid time-sensitive query patterns!
If a single client will be performing multiple consecutive Cassandra operations, use client-specified timestamps.
PAUSE – any questions at this point before we move on?
Now for a slight segue.
- sometimes we have to make changes to our schema (e.g. adding a new table) or provisioned data (as distinct from user-generated data)
- sometimes we have to spin up a new deployment from scratch (e.g. creating new data center / environment)
- In both cases we need a reliable way to create and update our DB schema and data.
I don’t know how you feel, but as a developer, I don’t like:
Doing manual changes
Having to ask Operations, DevOps or anyone else to make manual changes
Complex application deployments – I want to install the new version of my app, and have it “just work”.
If you’re using a relational DB, then there are a number of tools you can use to aid your schema evolution. You may have heard of Liquibase or Flyway (if you haven’t, do look them up).
What about Cassandra? When the team responsible for user authentication and entitlements on Sky’s online video platform came to tackle this problem, there didn’t seem to be any such tooling available for Cassandra. So they created one, and called it cqlmigrate.
To introduce cqlmigrate, let’s start with the concepts behind its design.
1) Versioning the evolution of your schema into discrete steps. Open-closed - don’t change past steps, but can add steps. Fairly standard practice.
2) Including this evolution into the same VCS as your app itself - so that every version of the app has the full DB setup that’s needed for that version of the app to run.
3) Handling deltas (including full bootstrap if necessary!) as part of application startup, to minimise external dependencies.
Although cqlmigrate can be run in a standalone manner, running it as part of app startup reduces the complexity of your application deployment, as no extra steps are necessary.
We’ll take a look at a demo in a bit, but it’s really simple to invoke cqlmigrate – all you do is pass it a collection of Java `Path`s containing the CQL files you want to run, and they are applied in alphanumeric order.
(Cassandra uses CQL – the Cassandra Query Language – which is similar to SQL)
-----
For each CQL file it applies, cqlmigrate creates a row in a “schema_updates” table in your keyspace, containing both the name of the file and the SHA1 checksum of it. If the row for a given CQL file already exists then cqlmigrate will skip applying it at runtime. It’s important to re-iterate how the open-closed principle applies here – if you change a previously-applied CQL file (even just changing whitespace!), it’ll get run again, which may cause problems.
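The behaviour described here can be sketched roughly as follows (a Python sketch of the idea, not cqlmigrate’s actual implementation; the file names are made up, and the `schema_updates` dict stands in for the table of the same name):

```python
# Sketch: apply CQL files in alphanumeric order, once each, tracked by the
# SHA1 checksum of their contents. A changed file (even whitespace) no longer
# matches its recorded checksum, so it would get run again.
import hashlib

def plan_migration(cql_files, schema_updates):
    """cql_files: {filename: contents}; schema_updates: {filename: sha1 hex}.
    Returns the list of files that would be (re-)applied, in order."""
    to_apply = []
    for name in sorted(cql_files):                       # alphanumeric order
        checksum = hashlib.sha1(cql_files[name].encode()).hexdigest()
        if schema_updates.get(name) == checksum:
            continue                                     # already applied: skip
        to_apply.append(name)                            # new OR changed file
    return to_apply

applied = {"001_create.cql":
           hashlib.sha1(b"CREATE TABLE t (id int PRIMARY KEY);").hexdigest()}
files = {
    "001_create.cql": "CREATE TABLE t (id int PRIMARY KEY);",
    "002_add_col.cql": "ALTER TABLE t ADD name text;",
}
assert plan_migration(files, applied) == ["002_add_col.cql"]
```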
------
On the one hand, you don’t want multiple nodes trying to change your DB schema at the same time – recipe for pain.
I don’t want my schema evolution to have any dependency on how the application is started up.
cqlmigrate allows concurrent startup of multiple application nodes by making use of a `locks` table. Cassandra’s lightweight transactions, based on the Paxos consensus algorithm, allow us to do an atomic test-and-set to ensure that only one instance can take the lock at a given time.
The instance that first gets the lock will perform the schema evolution; all others will block until that’s complete, and then each in turn will get the lock, realise there’s nothing further to be done, and release the lock again.
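This lock dance can be sketched as a compare-and-set loop (a Python simulation of the idea; in real cqlmigrate the test-and-set is a CQL lightweight transaction along the lines of `INSERT ... IF NOT EXISTS` against the `locks` table, and the instance names below are made up):

```python
# Simulate several app instances starting up; a dict stands in for the locks
# table, and insert-if-not-exists stands in for Cassandra's LWT.
locks = {}
migrated = []

def try_acquire(lock_name, client_id):
    if lock_name not in locks:       # "IF NOT EXISTS": atomic in real Cassandra
        locks[lock_name] = client_id
        return True
    return False

def startup(client_id):
    while not try_acquire("example.schema_migration", client_id):
        pass                         # real code would sleep and retry
    if not migrated:                 # first lock holder performs the evolution
        migrated.append(client_id)
    del locks["example.schema_migration"]    # release for the next instance

for instance in ("node-a", "node-b", "node-c"):
    startup(instance)                # sequential here; concurrent in reality

assert migrated == ["node-a"]        # exactly one instance ran the migration
```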
To demonstrate:
date +"%Y-%m-%d %H:%M:%S.%3N"; sudo date --set `date -d '+5 second' "+%H:%M:%S.%3N"`; date +"%Y-%m-%d %H:%M:%S.%3N"
- Tour of cqlmigrate-example-app
- Simple DropWizard Application
- Configuration including Cassandra stuff (show yaml)
- MigrateSchemaBundle
- Show Cassandra cluster (same one we had earlier?)
- cqlsh in to it (explain cqlsh if necessary!)
- ensure `example` keyspace is absent
- show `cqlmigrate.locks` table
- Start up application – show log lines detailing which scripts have been run
- Rename “notyet” file, rebuild and rerun application
- Demo what happens if it fails during cqlmigrate – interrupt and re-run
If needed:
- DELETE FROM cqlmigrate.locks WHERE name = 'example.schema_migration'
As mentioned previously – time-sensitive query patterns should generally be avoided. If you have to have them in a single-client context, then specifying timestamps on the client side can help you get out of trouble.
I’d really recommend trying out cqlmigrate for your schema evolution – and I’d also invite you to contribute to its development. It’s already in use by multiple production apps within Sky; we’ve made it open source and I hope that the broader development community – that’s you lot! - will find it useful and help it to grow.
Answers:
- No JOINs etc – “not enough functionality in the NoSQL world”? It’s against Cassandra principles to use JOINs – instead, store the data in denormalised form, in whatever shape you’d want to query it.
- Proved the theory by pinning the co-ordinator to a single node – not generally good practice for production, but useful for debugging.
- Cassandra query tracing for details – can look at the `system_traces` keyspace
- Cassandra versions 2.1.x and 3.0.x tested. Latter defaults to using client-side generation of timestamps.
- Time-sensitive query patterns arose mainly in testing – e.g. verifying that records have been deleted.
- Standalone nature of cqlmigrate - how?