This document summarizes a presentation about Cassandra's highly available distributed data model. The presentation covers Cassandra's key capabilities of scalability, fault tolerance, tunable consistency, and replication without single points of failure. It discusses Cassandra's use of consistent hashing to partition and place data across nodes, as well as its replication strategies and consistency levels that allow tuning availability versus consistency.
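As an illustration of consistent hashing and replica placement, here is a minimal sketch. It makes several assumptions that are not from the presentation. Tokens are the first 8 bytes of an MD5 digest, whereas Cassandra's default Murmur3Partitioner uses a different hash and a signed token range. Each node holds one token. The `Ring` class is a hypothetical name. Each node owns the range from the previous token (exclusive) up to its own token. Replicas follow a SimpleStrategy-style rule: start at the owner and take the next nodes clockwise.

```python
from __future__ import annotations

import bisect
import hashlib


def token(key: str) -> int:
    """Hash a partition key onto the ring (first 8 bytes of MD5, an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Hypothetical ring with one token per node; a node owns (previous token, its token]."""

    def __init__(self, node_tokens: dict[str, int]) -> None:
        self._entries = sorted((t, n) for n, t in node_tokens.items())
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, key: str, rf: int) -> list[str]:
        """SimpleStrategy-style placement: the owning node plus the next rf - 1 nodes clockwise."""
        # First node whose token is >= the key's token; wrap to index 0 past the end.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._entries)
        count = min(rf, len(self._entries))
        return [self._entries[(start + i) % len(self._entries)][1] for i in range(count)]
```

Adding a node only moves the keys between its token and its predecessor's. That is the property that makes consistent hashing attractive for incremental scaling.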
Cassandra EU 2012 - Putting the X Factor into Cassandra (Acunu)
Malcolm Box discusses Tellybug's experience using Cassandra to power voting applications for reality TV shows like Britain's Got Talent and The X Factor. They started with Cassandra to handle high write loads from millions of votes but found counting to be more challenging than expected. They implemented sharded counters in Memcached with Cassandra as the source of truth. While Cassandra scaled well for writes, reads had performance issues. Backup and data integrity also presented operational challenges as their usage of Cassandra evolved.
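The sharded-counter idea can be sketched as follows. This is a hypothetical illustration, not Tellybug's code. A plain dict stands in for Memcached, and the `ShardedCounter` name and key format are assumptions. A real cache client would use the cache's atomic increment (Memcached has an `incr` command) rather than the read-modify-write shown here. The sketch also leaves out the Cassandra source-of-truth side described in the talk.

```python
from __future__ import annotations

import random


class ShardedCounter:
    """Split one logical counter across several shard keys so no single key becomes a hot spot.

    `store` is any dict-like key/value cache; a plain dict stands in for Memcached here.
    """

    def __init__(self, name: str, shards: int = 16, store: dict | None = None) -> None:
        self.name = name
        self.shards = shards
        self.store = store if store is not None else {}

    def _key(self, shard: int) -> str:
        return f"{self.name}:shard:{shard}"

    def incr(self, amount: int = 1) -> None:
        # A random shard per increment spreads concurrent writers across keys.
        key = self._key(random.randrange(self.shards))
        self.store[key] = self.store.get(key, 0) + amount

    def total(self) -> int:
        # The price of sharding: a read has to fetch and sum every shard.
        return sum(self.store.get(self._key(i), 0) for i in range(self.shards))
```

The trade-off matches the talk's observation: writes scale out easily, while reads become more expensive.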
This document provides an overview of NoSQL databases and Cassandra in particular. It explains that NoSQL databases were developed because relational databases could not scale horizontally to large datasets and distributed architectures. Cassandra is an open source, column-oriented NoSQL database that provides high availability and eventual consistency without ACID transactions, drawing on the CAP theorem and concepts from the Dynamo paper. The document outlines Cassandra's data model, APIs, and the Hector client library, and provides code examples for common operations.
Cassandra is a distributed database designed to handle large amounts of data across commodity servers. It aims for high availability with no single point of failure. Data is distributed across nodes and replicated for redundancy. Cassandra uses a decentralized design with peer-to-peer communication and an eventually consistent model. It requires denormalized data models, and queries must be defined before the data structure is designed.
Cassandra is an open-source, distributed database management system designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure, linear scalability and performance as nodes are added, and transparent elasticity that allows nodes to be added or removed without downtime. Data is partitioned and replicated across nodes using consistent hashing to balance load and ensure availability in the event of failures. The write path appends each write sequentially to the commit log and records it in a memtable; memtables are periodically flushed to disk as SSTables. The read path retrieves data from memtables and SSTables, contacting replicas in parallel.
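The write path can be modeled in a few lines: append to the commit log, update the memtable, and flush to an immutable, sorted SSTable. The sketch below is a toy, single-node version under stated assumptions. `ToyNode` and `flush_threshold` are hypothetical names; the threshold stands in for Cassandra's real memtable sizing settings. A real flush also writes indexes and Bloom filters and recycles commit log segments rather than clearing a list.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy write path: commit log append, memtable update, flush to immutable SSTables."""

    flush_threshold: int = 3  # hypothetical stand-in for memtable size limits
    commit_log: list[tuple[str, str, int]] = field(default_factory=list)
    memtable: dict[str, tuple[str, int]] = field(default_factory=dict)
    sstables: list[dict[str, tuple[str, int]]] = field(default_factory=list)

    def write(self, key: str, value: str, ts: int) -> None:
        self.commit_log.append((key, value, ts))  # 1. sequential append for durability
        current = self.memtable.get(key)
        if current is None or ts >= current[1]:   # 2. in-memory update; newest timestamp wins
            self.memtable[key] = (value, ts)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # 3. Persist the memtable as a sorted, immutable SSTable and start a fresh one.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs commit log replay
```

Because writes only ever append, no read-before-write or in-place update is needed. This is the main reason the write path is fast.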
The document provides an introduction to Cassandra presented by Nick Bailey. It discusses key Cassandra concepts like cluster architecture, data modeling using CQL, and best practices. Examples are provided to illustrate how to model time-series data and denormalize schemas to support different queries. Tools for testing Cassandra implementations like CCM and client drivers are also mentioned.
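A common time-series pattern is to bucket rows by time, so that no single partition grows without bound. The sketch below assumes a per-sensor, per-UTC-day bucket. In CQL this would correspond to a composite partition key such as `PRIMARY KEY ((sensor_id, day), ts)`. The table layout and function names are illustrative assumptions, not the schema from the talk. Timestamps are expected to be timezone-aware.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def day_bucket(ts: datetime) -> str:
    """UTC calendar day used as the time bucket (ts must be timezone-aware)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(sensor_id: str, ts: datetime) -> tuple[str, str]:
    """Partition a reading belongs to: one partition per sensor per day."""
    return (sensor_id, day_bucket(ts))


def partitions_for_range(sensor_id: str, start: datetime, end: datetime) -> list[tuple[str, str]]:
    """Every (sensor_id, day) partition that a query over [start, end] has to read."""
    keys = []
    day = start.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    last = end.astimezone(timezone.utc)
    while day <= last:
        keys.append((sensor_id, day.strftime("%Y-%m-%d")))
        day += timedelta(days=1)
    return keys
```

The query is designed first ("readings for one sensor over a time range"), and the partition key is then chosen so that query touches a small, predictable set of partitions.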
This document provides an overview and introduction to Apache Cassandra, including:
- Cassandra is a distributed database designed to handle large amounts of structured data across commodity servers. It provides high availability with no single point of failure and linear scalability.
- Cassandra uses a ring topology and consistent hashing to distribute data evenly across nodes. Data is stored in tables with rows mapped to partitions that are replicated across the ring for fault tolerance.
- The write path involves appending to the commit log for durability, writing to memtables, flushing memtables to SSTables, and periodic compaction. The read path merges data from memtables, SSTables, and caches for fast retrieval.
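On the read side, every version of a cell may live in a different place, so a read must reconcile them. This minimal sketch assumes last-write-wins by timestamp, with `None` modeling a tombstone (deletion marker). The function name and data shapes are hypothetical. Bloom filters, caches, and partition indexes, which Cassandra uses to skip SSTables, are omitted. Timestamp ties resolve in favor of the memtable here; Cassandra has its own tie-break rules.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[Optional[str], int]  # (value, write timestamp); value None = tombstone


def read(key: str, memtable: Dict[str, Cell], sstables: List[Dict[str, Cell]]) -> Optional[str]:
    """Collect every version of `key` and return the value with the highest timestamp."""
    versions = [src[key] for src in [memtable, *sstables] if key in src]
    if not versions:
        return None
    value, _ts = max(versions, key=lambda cell: cell[1])
    return value  # None if the newest version is a tombstone
```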
Cassandra is a distributed database management system designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure and linear scalability as nodes are added. Cassandra uses a peer-to-peer distributed architecture and tunable consistency levels to achieve high performance and availability without requiring strong consistency. It is based on Amazon's Dynamo and Google's Bigtable papers and combines features of both.
Apache Cassandra is a free, distributed, open source, and highly scalable NoSQL database that is designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure, linear scalability, and tunable consistency. Cassandra's architecture allows it to spread data across a cluster of servers and replicate across multiple data centers for fault tolerance. It is used by many large companies for applications that require high performance, scalability, and availability.
This document provides an overview of Cassandra, a decentralized, distributed database management system. It discusses why the author's company chose Cassandra over other options like HBase and MySQL for their real-time data needs. The document then covers Cassandra's data model, architecture, data partitioning, replication, and other key aspects like writes, reads, deletes, and compaction. It also notes some limitations of Cassandra and provides additional resource links.
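Deletes and compaction interact through tombstones. The following is a minimal sketch of a full compaction, with several assumptions. All SSTables are merged at once, the newest cell per key is kept, and tombstones older than a grace period are dropped. The grace period plays the role of the real `gc_grace_seconds` table option. Everything else, including the function name and data shapes, is hypothetical. Real compaction strategies merge subsets of SSTables. They can only purge a tombstone when no older data for that key could survive elsewhere.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[Optional[str], int]  # (value, timestamp); value None = tombstone


def compact(sstables: List[Dict[str, Cell]], now: int, gc_grace: int) -> Dict[str, Cell]:
    """Merge SSTables into one, keeping the newest cell per key and purging expired tombstones."""
    merged: Dict[str, Cell] = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[1] > merged[key][1]:
                merged[key] = cell
    # Drop tombstones whose grace period has passed; live cells and young tombstones stay.
    return {
        k: c for k, c in sorted(merged.items())
        if not (c[0] is None and now - c[1] > gc_grace)
    }
```

The grace period exists so that a tombstone survives long enough to reach replicas that missed the delete. If it were purged earlier, an old value could be resurrected.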
C* Summit 2013: (Re)-Building the Social Grid for Global Telcos @ 1/10th the ... (DataStax Academy)
Darshan Rawal is leading the development of hybrid-cloud messaging products for global Tier 1 telcos. Darshan has been working in Silicon Valley since 2000, building nimble, cost-effective products and services that handle millions of users and billions of transactions per day. Before Openwave Messaging, Darshan held engineering positions at SS8 Networks, Yahoo, DE Shaw, and yp.com. He holds an M.S. in Software Engineering from Carnegie Mellon University.
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr... (DataStax Academy)
Netflix has updated and added new tools and benchmarks for Cassandra in the last year. In this talk we will cover the latest additions and recipes for the Astyanax Java client, updates to Priam to support Cassandra 1.2 vnodes, plus newly released and upcoming tools that are all part of the NetflixOSS platform. Following on from the Cassandra on SSD on AWS benchmark that was run live during the 2012 Summit, we've been benchmarking a large, write-intensive, multi-region cluster to see how far we can push it. Cassandra is the data storage and global replication foundation for the Cloud Native architecture that runs Netflix streaming for 36 million users. Netflix is also offering a Cloud Prize for open source contributions to NetflixOSS. There are ten categories, including Best Datastore Integration and Best Contribution to Performance Improvements, with $10K cash and $5K of AWS credits for each winner. We'd like to pay you to use our free software!
This talk was given at Cassandra London meetup: https://www.meetup.com/Cassandra-London/events/267271963/ . The talk is about orchestration of Cassandra with our Kubernetes Operator and Yelp PaaSTA. We also outline some of the opportunities and challenges associated with this architecture.
Youtube link: https://www.youtube.com/watch?v=JqAILFkkibA
A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.
Apache Cassandra Lunch #64: Cassandra for .NET Developers (Anant Corporation)
In Cassandra Lunch #64: Cassandra for .NET Developers, Co-founder, Customer Experience Architect, and Sitecore MVP of Anant, Eric Ramseur will be presenting on Cassandra for .NET developers.
Accompanying Blog: Coming Soon!
Accompanying YouTube: https://youtu.be/9DwnDGak6Yo
Sign Up For Our Newsletter: http://eepurl.com/grdMkn
Join Cassandra Lunch Weekly at 12 PM EST Every Wednesday: https://www.meetup.com/Cassandra-DataStax-DC/events/
Cassandra.Link:
https://cassandra.link/
Follow Us and Reach Us At:
Anant:
https://www.anant.us/
Awesome Cassandra:
https://github.com/Anant/awesome-cassandra
Cassandra.Lunch:
https://github.com/Anant/Cassandra.Lunch
Email:
solutions@anant.us
LinkedIn:
https://www.linkedin.com/company/anant/
Twitter:
https://twitter.com/anantcorp
Eventbrite:
https://www.eventbrite.com/o/anant-1072927283
Facebook:
https://www.facebook.com/AnantCorp/
Join The Anant Team:
https://www.careers.anant.us
Drivers connect applications to Cassandra clusters and maintain connections to nodes. They probe clusters to discover nodes, token ranges, and latency. Drivers are data-aware and can route queries to the appropriate replicas or fail over when needed. Cassandra clusters can span multiple data centers for redundancy, workload separation, and geographic distribution of data and queries. Configuration files such as cassandra.yaml and cassandra-env.sh control memory, data storage, caching, and other settings. Clusters run on commodity servers, and tools such as cassandra-stress should be used to test workloads and estimate the number of nodes needed.
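The idea of token-aware, data-center-aware routing can be sketched in a few lines. It is loosely modeled on the combination the DataStax drivers expose as `TokenAwarePolicy` wrapping `DCAwareRoundRobinPolicy`. However, `Host`, `query_plan`, and the ordering rules are hypothetical simplifications, not any driver's actual implementation. Replicas are computed with a SimpleStrategy-style walk around the ring. The plan lists live local-DC replicas first, then other live replicas, then the remaining live hosts as fallbacks.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    dc: str
    up: bool = True


def query_plan(key_token: int, ring: list[tuple[int, Host]], rf: int, local_dc: str) -> list[Host]:
    """Order hosts for one request: live local replicas, other live replicas, then other live hosts.

    `ring` is (token, host) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    distinct = list(dict.fromkeys(h for _, h in ring))  # hosts in ring order, deduplicated
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    replicas: list[Host] = []
    i = start
    while len(replicas) < min(rf, len(distinct)):
        host = ring[i % len(ring)][1]
        if host not in replicas:
            replicas.append(host)
        i += 1
    live = [h for h in replicas if h.up]
    local = [h for h in live if h.dc == local_dc]
    remote = [h for h in live if h.dc != local_dc]
    fallback = [h for h in distinct if h.up and h not in replicas]  # fail-over candidates
    return local + remote + fallback
```

Sending a request straight to a replica saves the extra network hop a non-replica coordinator would need. The fallback entries let the driver fail over when replicas are down.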
Saratov Open IT tech talk.
Damir Yaraev:
Introduction to Apache Cassandra (In this presentation, Damir explains when and why it makes sense to move from time-tested relational databases to NoSQL solutions, which have recently become fashionable. As an example, he looks at the column-oriented NoSQL database Apache Cassandra.)
Cassandra concepts, patterns and anti-patterns (Dave Gardner)
The document discusses Cassandra concepts, patterns, and anti-patterns. It begins with an agenda that covers choosing NoSQL, Cassandra concepts based on Dynamo and Bigtable, and patterns and anti-patterns of use. It then delves into Cassandra concepts such as consistent hashing, vector clocks, gossip protocol, hinted handoff, read repair, and consistency levels. It also discusses Bigtable concepts like sparse column-based data model, SSTables, commit log, and memtables. Finally, it outlines several patterns and anti-patterns of Cassandra use.
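Hinted handoff, one of the Dynamo concepts listed above, can be illustrated with a toy coordinator. A write to a replica that is down is stored as a hint and replayed once the replica is back. The class names are hypothetical. Real Cassandra also limits how long hints are collected for a dead node, via a configurable hint window. In real Cassandra, hints do not count toward the consistency level, except for `ANY`; to mirror that, only live replicas are counted as acknowledgements.

```python
from collections import defaultdict


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    """Toy hinted handoff: writes for down replicas are queued as hints and replayed later."""

    def __init__(self, replicas) -> None:
        self.replicas = replicas
        self.hints = defaultdict(list)  # replica name -> pending (key, value) writes

    def write(self, key, value) -> int:
        acks = 0
        for r in self.replicas:
            if r.up:
                r.data[key] = value
                acks += 1  # only live replicas acknowledge
            else:
                self.hints[r.name].append((key, value))
        return acks

    def replay_hints(self) -> None:
        # Called when gossip reports a replica as alive again.
        for r in self.replicas:
            if r.up and self.hints.get(r.name):
                for key, value in self.hints.pop(r.name):
                    r.data[key] = value
```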
This talk is about orchestration of Cassandra on Kubernetes with Cassandra Operator and Yelp's Platform-as-a-Service, PaaSTA. The talk focuses specifically on the internals of Cassandra Operator and its core reconcile loop, which reconciles cluster state and on-disk configuration.
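To illustrate the general shape of a reconcile loop, here is a hypothetical sketch. It compares desired state with observed state and returns the next action needed to converge. This is not the Cassandra Operator's code. `ClusterSpec`, `ObservedState`, and the action strings are assumptions for illustration. The sketch returns one action per call, so that changes to a Cassandra cluster stay incremental; a controller would call it repeatedly until it returns `None`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterSpec:
    replicas: int
    config: dict[str, str]


@dataclass
class ObservedState:
    pods: list[str]
    config: dict[str, str]


def reconcile(spec: ClusterSpec, observed: ObservedState) -> str | None:
    """Return the next action that moves observed state toward the spec, or None if converged."""
    if len(observed.pods) < spec.replicas:
        return f"scale-up: add pod {len(observed.pods)}"
    if len(observed.pods) > spec.replicas:
        return f"scale-down: decommission {observed.pods[-1]}"
    if observed.config != spec.config:
        keys = spec.config.keys() | observed.config.keys()
        changed = sorted(k for k in keys if spec.config.get(k) != observed.config.get(k))
        return f"update-config: {', '.join(changed)}"
    return None
```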
Cassandra is an open-source, distributed, highly scalable, and fault-tolerant database. It is a strong choice for managing large amounts of structured, semi-structured, or unstructured data.
We provide an overview of the expressive object model, secondary indexes, high availability, write scalability, query language support, performance benchmarks - database model, performance benchmarks - load characteristics, performance benchmarks - consistency requirements, ease of use, and navigation aggregation.
This document discusses efficient data mining solutions using Hadoop, Cassandra, and Spark. It describes Cassandra as a fast, robust, and efficient key-value database but notes it has limitations for certain queries. Spark is presented as an alternative to Hadoop MapReduce that can be 100 times faster for interactive algorithms and data mining. The document demonstrates how Spark can integrate with Cassandra to allow distributed data processing over Cassandra data without needing to clone the data or use other databases. Future extensions are proposed to directly access Cassandra's SSTable files from Spark and extend CQL3 to leverage Spark.
1) Apache Cassandra in terms of the CAP theorem
2) What makes Apache Cassandra "Available"?
3) How does Apache Cassandra ensure data consistency?
4) Cassandra advantages and disadvantages
5) Frameworks/libraries to access Apache Cassandra + performance comparison
The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, which live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. In this talk I will introduce you to a powerhouse combination of Cassandra and Spark, which provides a high-speed platform for both real-time and batch analysis.
This document provides an overview of Apache Cassandra and how to deploy it using Docker. It begins with an introduction to Docker fundamentals and the CAP theorem. It then covers Cassandra fundamentals, including its history, users, benefits like scalability and consistency options, and challenges. The document demonstrates writes and reads in Cassandra with different consistency levels. It also discusses data replication strategies and calculating consistency. Finally, it previews a demo of deploying Cassandra with Docker and lists additional resources.
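The consistency calculation mentioned above usually comes down to replica arithmetic. For replication factor RF, a single-datacenter QUORUM needs floor(RF / 2) + 1 acknowledgements. Reads are guaranteed to overlap the latest successful write when R + W > RF. The sketch below covers only a few single-DC levels, and the function names are hypothetical. Multi-DC levels such as `LOCAL_QUORUM` and `EACH_QUORUM` apply the same formula per data center.

```python
def replicas_required(level: str, rf: int) -> int:
    """Acknowledgements a single-datacenter consistency level needs for replication factor rf."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = rf // 2 + 1
    elif level == "ALL":
        needed = rf
    else:
        raise ValueError(f"unsupported level in this sketch: {level}")
    if needed > rf:
        raise ValueError(f"{level} needs {needed} replicas but rf={rf}")
    return needed


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must intersect every write replica set (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas that can be down while requests at this level still succeed."""
    return rf - replicas_required(level, rf)
```

With RF = 3, QUORUM reads and writes each need 2 replicas. Since 2 + 2 > 3, the read and write sets overlap, and each side tolerates one replica being down. ONE/ONE (1 + 1 = 2) gives no such guarantee.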
How Optimizely (Safely) Maximizes Database Concurrency (ScyllaDB)
Having a database that’s capable of high concurrency is one thing, but actually tapping all that potential concurrency is another. Fortunately, Optimizely Engineering has developed practical strategies that can help other teams.
Learn how Optimizely Engineering takes full advantage of the high concurrency that’s possible with their NoSQL database, ScyllaDB – while also guaranteeing correctness and protecting the quality of service. Brian Taylor, Principal Software Engineer, will offer a technical deep dive on:
- Understanding concurrency and its impact on throughput and latency
- Closed-loop load testing, open-loop load testing & the Universal Scaling Law
- The type of load testing you should be performing for capacity planning
- How to identify the region where your database can make the best use of concurrency
- Strategies for optimizing sound concurrency based on your data dependencies
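Two standard formulas are helpful for the topics listed above: Little's law and Neil Gunther's Universal Scaling Law (USL). Little's law says that concurrency (requests in flight) equals throughput times latency. The USL models throughput at concurrency N as X(N) = λN / (1 + σ(N − 1) + κN(N − 1)). Here λ is single-client throughput, σ is contention, and κ is coherency cost. Setting the derivative to zero gives the peak at N* = sqrt((1 − σ) / κ). The sketch below simply evaluates these formulas. The parameter values in the usage check are illustrative assumptions, not Optimizely's measurements.

```python
import math


def littles_law_concurrency(throughput_per_s: float, latency_s: float) -> float:
    """Little's law: mean requests in flight = throughput x mean latency."""
    return throughput_per_s * latency_s


def usl_throughput(n: float, lam: float, sigma: float, kappa: float) -> float:
    """Universal Scaling Law throughput at concurrency n."""
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma: float, kappa: float) -> float:
    """Concurrency that maximizes USL throughput (requires kappa > 0)."""
    return math.sqrt((1 - sigma) / kappa)
```

For example, 20,000 requests/s at 2 ms mean latency implies about 40 requests in flight. With σ = 0.05 and κ = 0.0005, throughput peaks near 44 concurrent requests and then declines. This is why pushing concurrency past the peak can hurt both throughput and latency.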
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022) (Cédrick Lunven)
Are you new to Apache Cassandra® and wondering what all the excitement is about? Or a veteran Cassandra user interested in understanding what’s new in the project?
Attend our live webinar on October 18 to learn about the latest Cassandra release, why it represents a big step forward, and the initiatives and new projects emerging in the ecosystem. DataStax Director of Developer Relations Cedrick Lunven will walk you through the new features in version 4.1.
Get the inside scoop on how version 4.1 adds exciting new features for operators and improves the security posture, without compromising the stability achieved in Cassandra 4.0. Get insights into projects currently in progress that make Cassandra easier to use (Stargate) and to deploy (K8ssandra).
You will learn:
- System-wide Guardrails
- Denylisting Partition Keys
- Diagnostic events via CQL, not just JMX
- CQLSH Auth support for LDAP, Kerberos and more
- Lots of new, pluggable extension points
Also, celebrate our open source community with highlights from the 2022 Apache Cassandra World Party and a look ahead to Cassandra 5.0!
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage (Sage Weil)
Ceph is a highly scalable open source distributed storage system that provides object, block, and file interfaces on a single platform. Although Ceph RBD block storage has dominated OpenStack deployments for several years, maturing object (S3, Swift, and librados) interfaces and stable CephFS (file) interfaces now make Ceph the only fully open source unified storage platform.
This talk will cover Ceph's architectural vision and project mission and how our approach differs from alternative approaches to storage in the OpenStack ecosystem. In particular, we will look at how our open development model dovetails well with OpenStack, how major contributors are advancing Ceph capabilities and performance at a rapid pace to adapt to new hardware types and deployment models, and what major features we are prioritizing for the next few years to meet the needs of expanding cloud workloads.
Acunu and Hailo: a realtime analytics case study on Cassandra (Acunu)
Hailo is a taxi app that receives a hail every 4 seconds across 15 cities. It launched on AWS using MySQL but adopted Cassandra and Acunu for greater resilience during international expansion. Cassandra provided high availability and global replication. Acunu provided analytics capabilities on Cassandra data. Hailo uses Cassandra for entity storage and Acunu for analytics, seeing benefits like simplified data modeling, rich queries, and infrastructure monitoring. Choosing these platforms allowed for high availability, multi-data center operation, and scaling to support growth.
- Cassandra nodes are clustered in a ring, with each node assigned a random token range to own.
- Adding or removing nodes traditionally required manually rebalancing the token ranges, which was complex, impacted many nodes, and took the cluster offline.
- Virtual nodes assign each physical node multiple random token ranges of varying sizes, allowing incremental changes where new nodes "steal" ranges from others, distributing the load evenly without manual work or downtime.
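The vnode behavior described in the list above can be checked with a small simulation. Each node draws many random tokens; 256 per node was the long-standing `num_tokens` default in older releases. Ownership is computed from the gaps between sorted tokens. The simulation then adds a fifth node and checks that every existing node gives up some of its share. The function names and the 2**64 token space are assumptions of this sketch.

```python
from __future__ import annotations

import random

RING = 2**64  # size of the illustrative token space


def assign_tokens(nodes: list[str], tokens_per_node: int, rng: random.Random) -> dict[str, list[int]]:
    """Give every node several random tokens, as vnodes do."""
    return {n: [rng.randrange(RING) for _ in range(tokens_per_node)] for n in nodes}


def ownership(assignments: dict[str, list[int]]) -> dict[str, float]:
    """Fraction of the ring each node owns; a token owns the range back to the previous token."""
    ring = sorted((t, n) for n, ts in assignments.items() for t in ts)
    share = {n: 0 for n in assignments}
    for i, (t, n) in enumerate(ring):
        prev = ring[i - 1][0]  # i == 0 wraps around to the last token
        share[n] += (t - prev) % RING
    return {n: s / RING for n, s in share.items()}
```

Because the new node's tokens land all over the ring, it takes a small slice from every existing node. The streaming work is therefore spread across the cluster instead of falling on one neighbor.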
Similar to Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam Overton
This document provides an overview of Cassandra, a decentralized, distributed database management system. It discusses why the author's company chose Cassandra over other options like HBase and MySQL for their real-time data needs. The document then covers Cassandra's data model, architecture, data partitioning, replication, and other key aspects like writes, reads, deletes, and compaction. It also notes some limitations of Cassandra and provides additional resource links.
C* Summit 2013: (Re)-Building the Social Grid for Global Telcos @ 1/10th the ...DataStax Academy
Darshan Rawal is leading the development of hybrid cloud based messaging products for global Tier 1 Telcos. Darshan has been working in Silicon valley since 2000, building nimble, cost effective products/services, handling millions of users and billions of transactions per day. Previous to Openwave Messaging, Darshan held engineering positions @ SS8 networks, Yahoo, DE Shaw, yp.com and has a M.S in Software Engineering from Carnegie Mellon University.
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...DataStax Academy
Netflix has updated and added new tools and benchmarks for Cassandra in the last year. In this talk we will cover the latest additions and recipes for the Astyanax Java client, updates to Priam to support Cassandra 1.2 Vnodes, plus newly released and upcoming tools that are all part of the NetflixOSS platform. Following on from the Cassandra on SSD on AWS benchmark that was run live during the 2012 Summit, we've been benchmarking a large write intensive multi-region cluster to see how far we can push it. Cassandra is the data storage and global replication foundation for the Cloud Native architecture that runs Netflix streaming for 36 Million users. Netflix is also offering a Cloud Prize for open source contributions to NetflixOSS, and there are ten categories including Best Datastore Integration and Best Contribution to Performance Improvements, with $10K cash and $5K of AWS credits for each winner. We'd like to pay you to use our free software!
This talk was given at Cassandra London meetup: https://www.meetup.com/Cassandra-London/events/267271963/ . The talk is about orchestration of Cassandra with our Kubernetes Operator and Yelp PaaSTA. We also outline some of the opportunities and challenges associated with this architecture.
Youtube link: https://www.youtube.com/watch?v=JqAILFkkibA
A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.
Apache Cassandra Lunch #64: Cassandra for .NET DevelopersAnant Corporation
In Cassandra Lunch #64: Cassandra for .NET Developers, Co-founder, Customer Experience Architect, and Sitecore MVP of Anant, Eric Ramseur will be presenting on Cassandra for .NET developers.
Accompanying Blog: Coming Soon!
Accompanying YouTube: https://youtu.be/9DwnDGak6Yo
Sign Up For Our Newsletter: http://eepurl.com/grdMkn
Join Cassandra Lunch Weekly at 12 PM EST Every Wednesday: https://www.meetup.com/Cassandra-DataStax-DC/events/
Cassandra.Link:
https://cassandra.link/
Follow Us and Reach Us At:
Anant:
https://www.anant.us/
Awesome Cassandra:
https://github.com/Anant/awesome-cassandra
Cassandra.Lunch:
https://github.com/Anant/Cassandra.Lunch
Email:
solutions@anant.us
LinkedIn:
https://www.linkedin.com/company/anant/
Twitter:
https://twitter.com/anantcorp
Eventbrite:
https://www.eventbrite.com/o/anant-1072927283
Facebook:
https://www.facebook.com/AnantCorp/
Join The Anant Team:
https://www.careers.anant.us
Drivers connect applications to Cassandra clusters and maintain connections to nodes. They probe clusters to discover nodes, token ranges, and latency. Drivers are data-aware and can route queries to appropriate replicas or fail over if needed. Cassandra clusters can span multiple data centers for redundancy, workload separation, and geographic distribution of data and queries. Configuration files like cassandra.yaml and cassandra-env.sh are used to configure memory, data storage, caching, and other settings. Cassandra clusters should be provisioned on commodity servers using tools like cassandra-stress to test workloads and estimate needed nodes.
Saratov open it teach talk.
Дамир Яраев:
Введение в Apache Cassandra (В ходе презентации Дамир расскажет, когда и почему стоит переходить с проверенных временем реляционных баз данных на ставшие модными в последнее время решения на базе NoSQL. В качестве примера рассмотрит колоночную NoSQL базу данных Apache Cassandra)
Cassandra concepts, patterns and anti-patternsDave Gardner
The document discusses Cassandra concepts, patterns, and anti-patterns. It begins with an agenda that covers choosing NoSQL, Cassandra concepts based on Dynamo and Bigtable, and patterns and anti-patterns of use. It then delves into Cassandra concepts such as consistent hashing, vector clocks, gossip protocol, hinted handoff, read repair, and consistency levels. It also discusses Bigtable concepts like sparse column-based data model, SSTables, commit log, and memtables. Finally, it outlines several patterns and anti-patterns of Cassandra use.
This talk is about orchestration of Cassandra on Kubernetes with Cassandra Operator and Yelp's Platform-as-a-Service: PaaSTA. The talk focusses specifically on the internals of cassandra operator and its core reconcile loop for reconciliation of cluster state and on-disk configuration.
Casandra is a open-source, distributed, highly scalable and fault-tolerant database. It is a best choice for managing structured, semi-structured or unstructured data at a large amount.
We provide an overview of the expressive object model, secondary indexes, high availability, write scalability, query language support, performance benchmarks - database model, performance benchmarks - load characteristics, performance benchmarks - consistency requirements, ease of use, and navigation aggregation.
This document discusses efficient data mining solutions using Hadoop, Cassandra, and Spark. It describes Cassandra as a fast, robust, and efficient key-value database but notes it has limitations for certain queries. Spark is presented as an alternative to Hadoop MapReduce that can be 100 times faster for interactive algorithms and data mining. The document demonstrates how Spark can integrate with Cassandra to allow distributed data processing over Cassandra data without needing to clone the data or use other databases. Future extensions are proposed to directly access Cassandra's SSTable files from Spark and extend CQL3 to leverage Spark.
1) Apache Cassandra in term of CAP Theorem
2) What makes Apache Cassandra "Available"?
3) How Apache Cassandra ensures data consistency?
4) Cassandra advantages and disadvantages
5) Frameworks/libraries to access Apache Cassandra + performance comparison
The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. In this talk I will introduce you to a powerhouse combination of Cassandra and Spark, which provides a high-speed platform for both real-time and batch analysis.
Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam Overton
2. Highly Available: The Cassandra Distribution Model
Cassandra is:
● built for scalability
● built to tolerate failure
In this talk:
● Cassandra distribution overview
● Partitioning and placement
● Replication
● Consistency
Cassandra Europe 2012
4. Overview
● High availability
● Partition tolerant
● Tunable consistency
● Scalable
● Replication
● No single point of failure
6. Partitioning and placement
A partitioning and placement scheme should:
● Assign data to hosts
● Have no single point of failure (SPOF) for routing clients to data
● Balance load
● Allow scaling without moving too much data
7. Consistent Hashing
[Figure: the token ring, with (k1, v1), (k2, v2) and (k3, v3) placed around it]
9. Consistent Hashing
● partitioner maps key to ring token
● hosts' tokens determine placement of keys
● and proportion of data assigned to each host
● each row is stored on one host
● wide rows can cause hot-spotting!
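The placement rules above can be sketched as a tiny ring lookup. This is a minimal illustration, not Cassandra's code: the `Ring` class and `token_for` helper are hypothetical, each host has exactly one token, replication is ignored, and a host with token T is assumed to own the key tokens in (previous token, T].

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    # Illustrative token: the MD5 digest of the key read as an unsigned integer.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical consistent-hashing ring: one token per host, no replication.

    A host with token T owns key tokens in (previous host's token, T]; the
    host with the lowest token also owns everything above the highest token.
    """

    def __init__(self, host_tokens: dict[str, int]) -> None:
        pairs = sorted((tok, host) for host, tok in host_tokens.items())
        self.tokens = [tok for tok, _ in pairs]
        self.hosts = [host for _, host in pairs]

    def owner(self, token: int) -> str:
        # First host whose token is >= the key's token, wrapping past the end.
        i = bisect.bisect_left(self.tokens, token)
        return self.hosts[i % len(self.hosts)]


ring = Ring({"A": 0, "B": 100, "C": 200})
print(ring.owner(150))  # → C, since 150 falls in C's range (100, 200]
```

With real tokens, `ring.owner(token_for("k1"))` would route key k1, provided the hosts' tokens are spread over the same 128-bit space. Every node holding the same sorted token list is what lets any node route a request in one hop.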
So how does it scale?
11. Consistent Hashing
Bootstrapping a new node: the range it takes over is transferred from the old host to the new host.
16. Consistent Hashing
Decommission is the reverse process.
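Bootstrap and decommission on a single-token ring can be sketched as follows. The helpers `bootstrap` and `decommission` are hypothetical; they assume parallel lists where `tokens` is sorted and distinct and `hosts[i]` owns `tokens[i]`, and they ignore replication and the actual data streaming.

```python
import bisect


def bootstrap(tokens: list[int], hosts: list[str], new_host: str, new_token: int):
    """Add new_host at new_token (tokens sorted, distinct; hosts[i] owns tokens[i]).

    Returns (donor, (start, end)): the old owner of range (start, end], which is
    now transferred to new_host.
    """
    i = bisect.bisect_left(tokens, new_token)
    donor = hosts[i % len(hosts)]  # host that owned new_token until now
    start = tokens[i - 1]          # predecessor's token (index -1 wraps around)
    tokens.insert(i, new_token)
    hosts.insert(i, new_host)
    return donor, (start, new_token)


def decommission(tokens: list[int], hosts: list[str], host: str):
    """Reverse of bootstrap: returns (receiver, (start, end)) for the departing range."""
    i = hosts.index(host)
    start = tokens[i - 1]
    end = tokens.pop(i)
    hosts.pop(i)
    receiver = hosts[i % len(hosts)]  # next host clockwise takes the range back
    return receiver, (start, end)


tokens, hosts = [0, 100, 200], ["A", "B", "C"]
print(bootstrap(tokens, hosts, "D", 150))   # → ('C', (100, 150))
print(decommission(tokens, hosts, "D"))     # → ('C', (100, 150))
```

In both directions only one neighbouring host is involved, which is how consistent hashing allows scaling without moving too much data.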
18. Consistent Hashing
● Tokens can be assigned manually, automatically or randomly
● Every node has full knowledge of placement
● Client connects to any node, max 1 hop to data
● Node status is gossiped
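For manual assignment with RandomPartitioner, a common approach is to space tokens evenly over its token range, assumed here to be 0 to 2**127. A minimal sketch with one token per node; the function name is hypothetical, and each value would then be set as that node's `initial_token` in cassandra.yaml.

```python
def balanced_tokens(node_count: int, ring_size: int = 2**127) -> list[int]:
    """Evenly spaced initial tokens (assumed RandomPartitioner range 0..2**127)."""
    return [i * ring_size // node_count for i in range(node_count)]


for node, token in enumerate(balanced_tokens(4)):
    print(f"node {node}: initial_token = {token}")
```

Evenly spaced tokens give each node an equal share of the ring, so with a uniform hash each node receives roughly the same proportion of the data.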
19. Partitioners
● A partitioner converts a row key (from client data) into a token on the ring
● RandomPartitioner
● Order Preserving Partitioner
20. Partitioners
Random Partitioner
● token = hash(key)
● good load balancing
● no range queries across row keys
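The hash behind RandomPartitioner is MD5. The sketch below is believed to mirror how the partitioner derives a token (the 16-byte MD5 digest read as a signed big-endian integer, then made non-negative), which puts tokens in the range 0 to 2**127; the function name is hypothetical.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    # MD5 digest interpreted as a signed big-endian 128-bit integer, then abs(),
    # giving a token in [0, 2**127].
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


for key in (b"alice", b"bob", b"carol"):
    print(key, random_partitioner_token(key))
```

Lexically adjacent keys such as alice, bob and carol get unrelated tokens, which is what balances load and also why a range of row keys cannot be read from a contiguous stretch of the ring.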
21. Partitioners
Order Preserving Partitioner
● token = key
● requires manual load balancing
● careful selection of tokens around the ring
● allows range queries across row keys
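With token = key, each host owns a contiguous, lexically ordered range of row keys. The hypothetical `make_opp_owner` below sketches this with string tokens (one per host); it shows both the benefit (a key range lives on few hosts) and the cost (skewed keys pile onto one host unless tokens are chosen carefully).

```python
import bisect


def make_opp_owner(host_tokens: dict[str, str]):
    """Hypothetical order-preserving placement: the token is the key itself,
    so each host owns a contiguous, lexically ordered range of keys."""
    pairs = sorted((tok, host) for host, tok in host_tokens.items())
    tokens = [tok for tok, _ in pairs]
    hosts = [host for _, host in pairs]

    def owner(key: str) -> str:
        i = bisect.bisect_left(tokens, key)  # first host token >= key
        return hosts[i % len(hosts)]         # wrap around past the last token

    return owner


owner = make_opp_owner({"A": "g", "B": "n", "C": "t"})
print([owner(k) for k in ("hat", "kiwi", "mango")])  # → ['B', 'B', 'B']
print({owner(f"user:{n}") for n in range(1000)})     # → {'A'}
```

The first query touches a single host, which is what makes range scans efficient; the second shows every `user:` key landing on A, the hot-spotting that manual token selection has to prevent.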
22. Partitioners
● Get it right first time!
● Design your data model for RandomPartitioner
● Custom partitioners are possible if necessary
24. Replication
● For availability
● For redundancy
● Can increase read bandwidth
25. Replication
● Replication Factor (RF) is the number of copies of the data
● Defined per keyspace
● Can be changed (e.g. if data becomes more or less valuable)
● Determines how many failures can be tolerated
26. Replication Strategy
● Determines which hosts hold the replicas of each token range
● Defined per keyspace (like RF)
● SimpleStrategy
● NetworkTopologyStrategy
● Custom strategies can be written
27. Replication Strategy: SimpleStrategy
● e.g. RF=3: each row, such as (k1, v1) or (k2, v2), is stored on the host that owns its token and on the next two hosts clockwise around the ring
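SimpleStrategy placement can be sketched as "owner plus the next RF - 1 hosts clockwise". The function below is hypothetical and assumes one token per host, sorted tokens, and `hosts[i]` owning `tokens[i]`.

```python
import bisect


def simple_strategy_replicas(tokens: list[int], hosts: list[str], key_token: int, rf: int) -> list[str]:
    """Owner of key_token plus the next rf - 1 hosts clockwise (tokens sorted,
    one token per host, hosts[i] owns tokens[i])."""
    i = bisect.bisect_left(tokens, key_token)
    n = len(hosts)
    # Never return more replicas than there are hosts.
    return [hosts[(i + k) % n] for k in range(min(rf, n))]


print(simple_strategy_replicas([0, 100, 200, 300], ["A", "B", "C", "D"], 150, rf=3))
# → ['C', 'D', 'A']
```

The strategy ignores racks and data centres entirely, which is why it suits single-DC clusters.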
29. Replication Strategy: NetworkTopologyStrategy
● Multi-datacentre support: replicas can be placed in several data centres (e.g. DC1 and DC2)
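A simplified sketch of NetworkTopologyStrategy-style placement follows. It assumes one token per host and a per-DC replica count such as `{"DC1": 2, "DC2": 1}`; the `Host` type and `nts_replicas` function are hypothetical, and the real strategy handles more cases than this.

```python
import bisect
from typing import NamedTuple


class Host(NamedTuple):
    name: str
    dc: str
    rack: str


def nts_replicas(tokens: list[int], hosts: list[Host], key_token: int,
                 rf_per_dc: dict[str, int]) -> list[Host]:
    """For each DC, walk the ring clockwise from the key's token and pick hosts
    in that DC, preferring racks not yet used; fall back to already-used racks
    only if there are not enough distinct racks."""
    start = bisect.bisect_left(tokens, key_token)
    n = len(hosts)
    ring_order = [hosts[(start + k) % n] for k in range(n)]
    replicas: list[Host] = []
    for dc, rf in rf_per_dc.items():
        chosen: list[Host] = []
        seen_racks: set[str] = set()
        deferred: list[Host] = []
        for host in (h for h in ring_order if h.dc == dc):
            if len(chosen) == rf:
                break
            if host.rack in seen_racks:
                deferred.append(host)  # same rack as an existing replica
            else:
                chosen.append(host)
                seen_racks.add(host.rack)
        chosen.extend(deferred[: rf - len(chosen)])
        replicas.extend(chosen)
    return replicas


ring = [Host("A", "DC1", "r1"), Host("B", "DC1", "r1"), Host("C", "DC2", "r1"),
        Host("D", "DC1", "r2"), Host("E", "DC2", "r2"), Host("F", "DC1", "r1")]
tokens = [0, 100, 200, 300, 400, 500]
print([h.name for h in nts_replicas(tokens, ring, 50, {"DC1": 2, "DC2": 1})])  # → ['B', 'D', 'C']
```

Spreading replicas over racks and DCs means a whole rack or data centre can fail while copies survive elsewhere.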
31. Snitches
● Enables routing of requests according to node proximity
● Used by the replication strategy to determine rack and DC membership
● Custom snitches can be written
32. Simple Snitch
● Every host is in the same rack & DC, with equal proximity
RackInferringSnitch
● Infers the rack & DC from the IP address of the host, e.g. for 123.8.2.100 the second octet (8) is the DC, the third octet (2) is the rack and the last octet (100) identifies the host
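The octet rule can be sketched in a few lines. The function name is hypothetical; it simply returns the second and third octets as the DC and rack names.

```python
import ipaddress


def rack_inferring_snitch(ip: str) -> tuple[str, str]:
    """Return (dc, rack) from an IPv4 address: 2nd octet = DC, 3rd octet = rack."""
    octets = ipaddress.IPv4Address(ip).packed  # 4 bytes, one per octet
    return str(octets[1]), str(octets[2])


print(rack_inferring_snitch("123.8.2.100"))  # → ('8', '2')
```

This only works if the network is actually addressed this way; otherwise a snitch that reads membership from configuration is the safer choice.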
33. EC2Snitch
● DC = EC2 region
● Rack = EC2 availability zone
Property file snitch
● Rack and DC membership read from a configuration file
34. DynamicSnitch
● Wraps each of the other snitches
● Records latency stats from read operations
● Avoids routing to slow hosts
● Configurable update intervals
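The idea of "record latency, avoid slow hosts" can be sketched with a per-host exponentially weighted moving average (EWMA). This is a hypothetical illustration in the spirit of DynamicSnitch, not its actual scoring algorithm; `LatencyScores` and `alpha` are assumptions.

```python
class LatencyScores:
    """Hypothetical latency-aware ordering in the spirit of DynamicSnitch:
    keeps an exponentially weighted moving average (EWMA) of read latency."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha                   # weight of the newest sample
        self.scores: dict[str, float] = {}

    def record(self, host: str, latency_ms: float) -> None:
        prev = self.scores.get(host)
        if prev is None:
            self.scores[host] = latency_ms
        else:
            self.scores[host] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def sort_by_proximity(self, hosts: list[str]) -> list[str]:
        # Fastest first; hosts with no samples yet score 0.0 so they get tried.
        return sorted(hosts, key=lambda h: self.scores.get(h, 0.0))


scores = LatencyScores()
scores.record("A", 10)
scores.record("B", 50)
scores.record("A", 30)  # A's average becomes 20.0
print(scores.sort_by_proximity(["B", "A", "C"]))  # → ['C', 'A', 'B']
```

A moving average reacts to a host that becomes slow without being thrown off by a single outlier, which is the behaviour the slide describes.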
36. Consistency
● Replication and failures/partitions cause inconsistency
Timestamps:
● Chosen by the client
● Can be used to avoid read-modify-write
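Client-chosen timestamps let replicas reconcile conflicting versions without coordination: the version with the highest timestamp wins. A minimal sketch; the `Cell` type and `reconcile` function are hypothetical, and breaking timestamp ties by value is this sketch's own choice.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    timestamp: int  # chosen by the client, e.g. microseconds since the epoch
    value: str


def reconcile(*versions: Cell) -> Cell:
    # Highest timestamp wins; comparing the value on ties is this sketch's choice,
    # so the result does not depend on the order replicas answered in.
    return max(versions, key=lambda c: (c.timestamp, c.value))


print(reconcile(Cell(2, "v2"), Cell(1, "v1")))
```

Because the winner is decided by timestamp alone, a client can overwrite a value blindly with a new timestamp instead of reading it first, which is how timestamps avoid read-modify-write.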
37. Consistency
● Cassandra allows a trade-off between partition tolerance and consistency
● For strong consistency: R + W > N (R = replicas read, W = replicas written, N = replication factor)
● E.g. with 5 replicas (RF = N = 5): write to 3, read from 3
All five replicas start at version 1. After a write of version 2 reaches 3 replicas, two replicas still hold version 1, but any read from 3 replicas must include at least one replica holding version 2.
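The R + W > N rule is a pigeonhole argument: a read set and a write set that together exceed N replicas must share at least one replica. A brute-force check (hypothetical helper `always_overlaps`) confirms this for N = 5 and replays the version-2 example above.

```python
from itertools import combinations


def always_overlaps(n: int, r: int, w: int) -> bool:
    """Brute force: does every read set of size r share a replica with every
    write set of size w, among n replicas?"""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


# The overlap guarantee holds exactly when R + W > N.
for r in range(1, 6):
    for w in range(1, 6):
        assert always_overlaps(5, r, w) == (r + w > 5)

# The example: version 2 was written to 3 of 5 replicas.
versions = [2, 1, 2, 2, 1]
assert all(max(sample) == 2 for sample in combinations(versions, 3))
```

The overlapping replica returns the newest version, and timestamp reconciliation makes it the one the client sees.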
40. Consistency Level
● ANY (only for writes)
● ONE, TWO, THREE
● QUORUM (N/2 + 1, with N/2 rounded down)
● LOCAL_QUORUM
● ALL
● Relax strong consistency for partition tolerance
● To tolerate 1 node failure with strong consistency, use RF=3 with CL=QUORUM
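The quorum arithmetic behind the last bullet can be tabulated directly. A small sketch with a hypothetical `quorum` helper: QUORUM reads plus QUORUM writes always satisfy R + W > N, and each operation can still succeed with RF - QUORUM replicas down.

```python
def quorum(rf: int) -> int:
    """QUORUM size: N/2 + 1 with integer (rounded-down) division."""
    return rf // 2 + 1


for rf in (1, 2, 3, 4, 5):
    q = quorum(rf)
    # Writes and reads at QUORUM overlap because q + q > rf,
    # and each can still succeed with rf - q replicas down.
    print(f"RF={rf}: QUORUM={q}, strong={q + q > rf}, tolerates {rf - q} down")
```

RF=3 is the smallest replication factor at which QUORUM tolerates a failure, and RF=4 tolerates no more failures than RF=3.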
41. Increasing Consistency
● Read repair
● Hinted hand-off
● Anti-entropy repair
42. Read Repair
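Read repair works roughly like this: when a read contacts several replicas, the coordinator returns the newest version and writes it back to any replica that returned an older one. The toy sketch below uses a hypothetical `read_with_repair` over (timestamp, value) tuples; when and how often Cassandra performs read repair is controlled by settings not modelled here.

```python
def read_with_repair(responses: dict[str, tuple[int, str]]) -> tuple[str, list[str]]:
    """responses maps replica -> (timestamp, value) it returned for one row.

    Returns the newest value and the replicas that were repaired; the repair
    is modelled as overwriting the stale entries in `responses` in place.
    """
    newest = max(responses.values())  # highest timestamp wins
    stale = [replica for replica, cell in responses.items() if cell != newest]
    for replica in stale:
        responses[replica] = newest   # write the newest version back
    return newest[1], stale


replicas = {"A": (2, "v2"), "B": (1, "v1"), "C": (2, "v2")}
print(read_with_repair(replicas))  # → ('v2', ['B'])
```

Repair only touches rows that are actually read, so rarely-read data still needs hinted hand-off or anti-entropy repair to converge.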
46. Hinted Hand-off
e.g. RF=2, with both replicas holding (k1, v1):
● A write (k1, v2) arrives while one replica is down
● The live replica stores (k1, v2), and a hint (k1, v2) is stored for the down replica
● When the down replica recovers, the hint is replayed and both replicas hold (k1, v2)
54. Hinted Hand-off
● Hinted writes do not count towards the chosen consistency level
● … except with CL=ANY, which succeeds even if all replicas are down
● Don't rely on hints: hints cannot be read!
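The hand-off sequence and the two rules above can be modelled with a toy coordinator. The `Coordinator` class is hypothetical: it keeps hints in a list, counts only live replicas towards the required acknowledgements, and treats a stored hint as enough for CL=ANY.

```python
class Coordinator:
    """Toy model of hinted hand-off for one set of replicas (hypothetical class)."""

    def __init__(self, replicas: list[str]) -> None:
        self.data: dict[str, dict[str, str]] = {r: {} for r in replicas}
        self.up: set[str] = set(replicas)
        self.hints: list[tuple[str, str, str]] = []  # (target replica, key, value)

    def write(self, key: str, value: str, required_acks: int, cl_any: bool = False) -> bool:
        acks = hinted = 0
        for replica in self.data:
            if replica in self.up:
                self.data[replica][key] = value
                acks += 1
            else:
                # Replica is down: keep a hint to replay later.
                self.hints.append((replica, key, value))
                hinted += 1
        if cl_any:
            # CL=ANY: a stored hint is enough, even if every replica is down.
            return acks + hinted > 0
        # Hinted writes do not count towards the consistency level.
        return acks >= required_acks

    def recover(self, replica: str) -> None:
        self.up.add(replica)
        pending = []
        for target, key, value in self.hints:
            if target == replica:
                self.data[target][key] = value  # replay the hint
            else:
                pending.append((target, key, value))
        self.hints = pending


c = Coordinator(["A", "B"])          # RF=2
c.write("k1", "v1", required_acks=2)
c.up.discard("B")                    # B goes down
print(c.write("k1", "v2", required_acks=1), c.data["B"])  # → True {'k1': 'v1'}
c.recover("B")
print(c.data["B"])                   # → {'k1': 'v2'}
```

While B is down, a read from B alone would still return v1: the hint sits with the coordinator and cannot be read, which is why hints are no substitute for a suitable consistency level.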
55. Anti-entropy repair
● Manual maintenance process (run with nodetool repair)
● Compares all data stored on a host with the replicas
● Differences are streamed to restore consistency
● Must be run at least every 10 days (the default gc_grace_seconds) to ensure tombstones are replicated
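Comparing "all data" efficiently means comparing hashes of token ranges and streaming only the ranges that differ; Cassandra uses Merkle trees for this. The sketch below is a simplified, hypothetical version with a flat list of range hashes instead of a tree, integer tokens below `ring_size`, (timestamp, value) cells, and no tombstones.

```python
import hashlib

Cell = tuple[int, str]  # (timestamp, value)


def range_hashes(data: dict[int, Cell], buckets: int, ring_size: int) -> list[str]:
    # One digest per equal-sized token range, built from that range's rows.
    digests = [hashlib.sha256() for _ in range(buckets)]
    for token in sorted(data):
        digests[token * buckets // ring_size].update(f"{token}:{data[token]};".encode())
    return [d.hexdigest() for d in digests]


def repair(a: dict[int, Cell], b: dict[int, Cell], buckets: int = 4, ring_size: int = 100) -> list[int]:
    """Compare range hashes of two replicas and reconcile only differing ranges."""
    ha = range_hashes(a, buckets, ring_size)
    hb = range_hashes(b, buckets, ring_size)
    differing = [i for i in range(buckets) if ha[i] != hb[i]]
    for token in set(a) | set(b):
        if token * buckets // ring_size in differing:
            # "Stream" the newest version of each row in a differing range to both sides.
            newest = max(c for c in (a.get(token), b.get(token)) if c is not None)
            a[token] = b[token] = newest
    return differing


a = {10: (1, "x"), 60: (2, "y")}
b = {10: (1, "x"), 60: (1, "old")}
print(repair(a, b), b[60])  # → [2] (2, 'y')
```

Only the range containing token 60 is exchanged; the matching range around token 10 costs one hash comparison, which is what makes a full comparison affordable.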
fin.