This presentation explains how to get started with Apache Cassandra to provide a scale out, fault tolerant backend for inventory storage on OpenSimulator.
2. Five years of Cassandra
[Timeline graphic: releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, and 2.0, plus DSE, along an axis from Jul-08 to Mar-14.]
I’ve been working on Cassandra for five years now. Facebook open sourced it in July of 2008, and I started working on it at Rackspace in December. A year and a half later, I started DataStax to commercialize it.
3. Core values
•Massive scalability
•High performance
•Reliability/Availability
For the first four years we focused on these three core values.
[Graphic: Cassandra alongside MySQL, HBase, and Redis.]
4. New core value
•Massive scalability
•High performance
•Reliability/Availability
•Ease of use
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);
CREATE INDEX ON
users(state);
SELECT * FROM users
WHERE state = 'Texas'
AND birth_date > 1950;
2013 saw us focus on a fourth value, ease of use, starting with the introduction of CQL3 in January with Cassandra 1.2.
CQL (Cassandra Query Language) is a dialect of SQL optimized for Cassandra. All the statements on the right of this slide are valid in both CQL and SQL.
5. Native Drivers
•CQL native protocol: efficient, lightweight, asynchronous
•Java (GA): https://github.com/datastax/java-driver
•.NET (GA): https://github.com/datastax/csharp-driver
•Python (Beta): https://github.com/datastax/python-driver
•C++ (Beta): https://github.com/datastax/cpp-driver
•Coming soon: PHP, Ruby
We also introduced a native CQL protocol, cutting out the overhead and complexity of Thrift. DataStax has open sourced half a dozen native CQL drivers and is working on
more.
6. DataStax DevCenter
We’ve also released DevCenter, an interactive tool for exploring and querying your Cassandra
databases. DevCenter is the first tool of its kind for a NoSQL database.
7. Tracing
cqlsh:foo> INSERT INTO bar (i, j) VALUES (6, 2);
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9
 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
   Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 |            540
       Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 |            779
    Message received from /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |             63
                   Applying mutation | 00:02:37,016 | 127.0.0.2 |            220
                Acquiring switchLock | 00:02:37,016 | 127.0.0.2 |            250
              Appending to commitlog | 00:02:37,016 | 127.0.0.2 |            277
                  Adding to memtable | 00:02:37,016 | 127.0.0.2 |            378
    Enqueuing response to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |            710
       Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |            888
    Message received from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2334
 Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2550
Perhaps the biggest problem people have after deploying Cassandra is understanding what goes on under the hood. We introduced query tracing to shed some light on this.
One of the challenges is gathering information from all the nodes that participate in processing a query; here, the coordinator (in blue) receives the query from the client and
forwards it to a replica (in green) which then responds back to the coordinator.
8. Authentication
[cassandra.yaml]
authenticator: PasswordAuthenticator
# DSE offers KerberosAuthenticator
We added authentication and authorization, following familiar patterns. Note that the default user and password is cassandra/cassandra, so good practice is to create a new
superuser and drop or change the password on the old one.
Apache Cassandra ships with password authentication built in; DSE (DataStax Enterprise) adds Kerberos single-sign-on integration.
9. Authentication
[cassandra.yaml]
authenticator: PasswordAuthenticator
# DSE offers KerberosAuthenticator
CREATE USER robin
WITH PASSWORD 'manager' SUPERUSER;
ALTER USER cassandra
WITH PASSWORD 'newpassword';
LIST USERS;
DROP USER cassandra;
11. Cassandra 2.0
Everything I’ve talked about so far is “ancient history” from Cassandra 1.2, but I wanted to cover it again as a refresher. Now let’s talk about what we added for Cassandra 2.0,
released in September.
12. Race condition
SELECT name
FROM users
WHERE username = 'pmcfadin';
The first such feature is Lightweight Transactions. This is motivated by the fact that while Cassandra’s eventually consistent model can provide “strong consistency,” where
readers always see the most recent writes, it cannot provide “linearizable consistency,” where some writes are guaranteed to happen sequentially with respect to others.
Consider the case of user account creation. If two users attempt to create the same name simultaneously, they will both see that it does not yet exist and proceed to attempt
to create the account, resulting in corruption.
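The read-then-write race above can be sketched outside Cassandra. The toy store below (plain Python; all names are hypothetical, not driver code) mimics last-write-wins resolution: both clients see no row, both insert, and the later timestamp silently overwrites the earlier one.

```python
# Toy last-write-wins store: each cell keeps (timestamp, value), and a write
# only sticks if its timestamp is newer -- roughly how Cassandra resolves
# concurrent writes to the same row without lightweight transactions.
class LwwStore:
    def __init__(self):
        self.rows = {}

    def select(self, key):
        entry = self.rows.get(key)
        return entry[1] if entry else None

    def insert(self, key, value, timestamp):
        current = self.rows.get(key)
        if current is None or timestamp > current[0]:
            self.rows[key] = (timestamp, value)

store = LwwStore()

# Both clients check for the account and see nothing...
assert store.select('pmcfadin') is None  # client A's SELECT: (0 rows)
assert store.select('pmcfadin') is None  # client B's SELECT: (0 rows)

# ...so both proceed to insert. The later write wins silently.
store.insert('pmcfadin', {'password': 'ba27e03fd9...'}, timestamp=1)
store.insert('pmcfadin', {'password': 'ea24e13ad9...'}, timestamp=2)
print(store.select('pmcfadin')['password'])  # ea24e13ad9...
```

Neither client gets an error, which is exactly why this counts as corruption: the first account creation is lost without anyone noticing.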
13. Race condition
SELECT name
FROM users
WHERE username = 'pmcfadin';
(0 rows)
SELECT name
FROM users
WHERE username = 'pmcfadin';
14. Race condition
SELECT name
FROM users
WHERE username = 'pmcfadin';
(0 rows)
SELECT name
FROM users
WHERE username = 'pmcfadin';
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ba27e03fd9...',
'2011-06-20 13:50:00');
(0 rows)
15. Race condition
SELECT name
FROM users
WHERE username = 'pmcfadin';
(0 rows)
SELECT name
FROM users
WHERE username = 'pmcfadin';
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ba27e03fd9...',
'2011-06-20 13:50:00');
(0 rows)
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ea24e13ad9...',
'2011-06-20 13:50:01');
16. Race condition
SELECT name
FROM users
WHERE username = 'pmcfadin';
(0 rows)
SELECT name
FROM users
WHERE username = 'pmcfadin';
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ba27e03fd9...',
'2011-06-20 13:50:00');
(0 rows)
This one wins
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ea24e13ad9...',
'2011-06-20 13:50:01');
17. Lightweight transactions
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ba27e03fd9...',
'2011-06-20 13:50:00')
IF NOT EXISTS;
Lightweight transactions roll the “check” and “modify” stages into a single atomic operation, so we can guarantee that only one user will create a given account. The other will
get back the row that was created concurrently as an explanation.
UPDATE can similarly take an IF ... clause checking that no modifications have been made to a set of columns since they were read.
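The semantics of IF NOT EXISTS can be sketched as a single atomic compare-and-set step (a toy in-process model, not the Paxos-backed implementation Cassandra actually uses):

```python
class CasStore:
    """Toy model of INSERT ... IF NOT EXISTS: the existence check and the
    write happen as one indivisible step, so only one writer can win."""
    def __init__(self):
        self.rows = {}

    def insert_if_not_exists(self, key, value):
        # Returns (applied, existing_row), mirroring the [applied] result set.
        if key in self.rows:
            return False, self.rows[key]
        self.rows[key] = value
        return True, None

store = CasStore()
applied, _ = store.insert_if_not_exists('pmcfadin', {'name': 'Patrick McFadin'})
print(applied)        # True: the first writer wins
applied, row = store.insert_if_not_exists('pmcfadin', {'name': 'Impostor'})
print(applied, row)   # False, plus the row that got there first
```

The losing writer receives the winning row back, just as the [applied] = False result set on slide 19 includes the concurrently created columns.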
18. Lightweight transactions
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ba27e03fd9...',
'2011-06-20 13:50:00')
IF NOT EXISTS;
 [applied]
-----------
      True
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ea24e13ad9...',
'2011-06-20 13:50:01')
IF NOT EXISTS;
19. Lightweight transactions
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ba27e03fd9...',
'2011-06-20 13:50:00')
IF NOT EXISTS;
 [applied]
-----------
      True
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ea24e13ad9...',
'2011-06-20 13:50:01')
IF NOT EXISTS;
 [applied] | username | created_date   | name
-----------+----------+----------------+-----------------
     False | pmcfadin | 2011-06-20 ... | Patrick McFadin
20. Paxos
•All operations are quorum-based
•Each replica sends information about unfinished
operations to the leader during prepare
•Paxos made Simple
Under the hood, lightweight transactions are implemented with the Paxos consensus protocol.
21. Details
•Paxos state is durable
•Immediate consistency with no leader election or failover
•ConsistencyLevel.SERIAL
•http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
•4 round trips vs 1 for normal updates
Paxos has these implications for our implementation.
22. Use with caution
•Great for 1% of your application
•Eventual consistency is your friend
• http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
“4 round trips” is the big downside for Paxos. This makes lightweight transactions a significant performance hit in single-datacenter deployments and wildly impractical for multi-datacenter clusters. They should only be used for targeted pieces of an application where the alternative is corruption, like our account creation example.
23. Cursors (before)
CREATE TABLE timeline (
user_id uuid,
tweet_id timeuuid,
tweet_author uuid,
tweet_body text,
PRIMARY KEY (user_id, tweet_id)
);
SELECT *
FROM timeline
WHERE (user_id = :last_key
AND tweet_id > :last_tweet)
OR token(user_id) > token(:last_key)
LIMIT 100
Cassandra 2.0 introduced cursors to the native protocol. This makes paging through large resultsets much simpler. Note how we need one clause per component of the
primary key to fetch the next 100 rows here.
24. Cursors (after)
SELECT *
FROM timeline
Now Cassandra handles the details of getting extra results as you iterate through a resultset. In fact, our cursors are a little bit smarter than in your favorite RDBMS (relational
database management system) since they are failover-aware: if the coordinator in use fails, the cursor will pick up where it left off against a different node in the cluster.
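The before/after difference can be sketched in plain Python (hypothetical helpers, not driver code): the “before” style forces the caller to thread the last-seen key through every query, while a cursor-style generator hides that bookkeeping behind plain iteration.

```python
def fetch_page(rows, last_key=None, limit=100):
    """'Before' style: the caller passes the last key seen to get the next
    page. rows must be sorted by key, mimicking clustering order."""
    start = 0
    if last_key is not None:
        # Skip everything up to and including last_key.
        start = next((i for i, (k, _) in enumerate(rows) if k > last_key),
                     len(rows))
    return rows[start:start + limit]

def cursor(rows, page_size=100):
    """'After' style: iteration fetches pages behind the scenes."""
    last_key = None
    while True:
        page = fetch_page(rows, last_key, page_size)
        if not page:
            return
        yield from page
        last_key = page[-1][0]

timeline = [(i, f"tweet-{i}") for i in range(250)]
seen = list(cursor(timeline, page_size=100))  # three fetches under the hood
print(len(seen))  # 250
```

In the real native protocol the driver does this page fetching for you, and, as noted above, can additionally resume against a different coordinator on failure.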
27. Other CQL improvements
•SELECT DISTINCT pk
•CREATE TABLE IF NOT EXISTS table
•SELECT ... AS
• SELECT event_id, dateOf(created_at) AS creation_date
•ALTER TABLE DROP column
We made some other miscellaneous improvements in CQL for 2.0 as well.
30. On-Heap/Off-Heap
[Diagram: a Java process with an on-heap region, managed by GC, and an off-heap region, not managed by GC.]
We’ve put a lot of effort into improving how Cassandra manages its memory. You’re looking at a limit of about 8GB for a JVM heap, even though modern servers have much more RAM available. So we’re optimizing heap use, pushing internal structures into off-heap memory where possible.
31-36. Read path (per sstable)
[Diagram, built up across these slides: in memory, the Bloom filter, partition key cache, and partition summary; on disk, the partition index, compression offsets, and the data file.]
To understand what we've done, I need to explain how a read works in Cassandra.
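The lookup order just described can be sketched as a function over one simplified sstable. The dict-and-list structures here are an assumption for illustration, and the compression step is omitted:

```python
import bisect

# Per-sstable read path sketch: Bloom filter first (no false negatives),
# then the partition key cache, then partition summary -> partition
# index -> data file.

def read_partition(sstable, key):
    if key not in sstable["bloom"]:          # Bloom filter says "definitely absent"
        return None
    offset = sstable["key_cache"].get(key)   # partition key cache hit?
    if offset is None:
        # Partition summary: binary-search for the index block that
        # could contain the key.
        keys = sstable["summary_keys"]
        block = sstable["index_blocks"][max(bisect.bisect_right(keys, key) - 1, 0)]
        offset = block.get(key)              # partition index lookup
        if offset is None:
            return None                      # Bloom filter false positive
        sstable["key_cache"][key] = offset   # warm the cache for next time
    return sstable["data"][offset]           # finally, read the row data
```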
37. Off heap in 2.0
Partition key bloom filter: 1-2GB per billion partitions.
[Diagram: the read path with the Bloom filter shown off-heap.]
These are the components that are allocated off-heap now. We use reference counting to deallocate them when the sstable (data file) they are associated with is obsoleted by compaction.
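That 1-2GB figure lines up with standard Bloom filter sizing, m = -n·ln(p)/(ln 2)² bits for n keys at false-positive rate p. A quick check, assuming a 1% false-positive rate (Cassandra's actual `bloom_filter_fp_chance` default varies by compaction strategy):

```python
import math

def bloom_filter_gb(n_keys, fp_rate):
    """Optimal Bloom filter size in GB: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -n_keys * math.log(fp_rate) / (math.log(2) ** 2)
    return bits / 8 / 1e9

# One billion partition keys at an assumed 1% false-positive rate comes
# out to roughly 1.2GB, inside the 1-2GB range quoted above.
size = bloom_filter_gb(1_000_000_000, 0.01)
```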
38. Off heap in 2.0
Compression metadata: ~1-3GB per TB compressed.
[Diagram: the read path with the compression offsets shown off-heap.]
39. Off heap in 2.0
Partition index summary: size depends on rows per partition.
[Diagram: the read path with the partition summary shown off-heap.]
41. Healthy leveled compaction
[Diagram: sstables arranged in levels L0 through L5.]
The goal of leveled compaction is to provide a read performance guarantee. We divide the sstables up into levels, where each level has 10x as much data as the previous (so the diagram here is not to scale!), and guarantee that any given row is present in at most one sstable per level.
Newly flushed sstables start in level zero, which is not yet processed into the tiered levels, and the one-sstable-per-level rule does not apply there. So we potentially need to check each sstable in L0.
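The arithmetic behind the guarantee can be sketched as follows; the 160MB per-sstable target is an assumed configuration value (`sstable_size_in_mb`), so check it against your version's defaults:

```python
def level_capacity_mb(level, sstable_mb=160, fanout=10):
    """Approximate data capacity of leveled level N: ten sstables at L1,
    growing by the fanout (10x) at each subsequent level.
    The 160MB sstable target is an assumed configuration value."""
    return 10 * sstable_mb * fanout ** (level - 1)

def max_sstables_checked(l0_count, leveled_levels):
    """Worst-case sstables consulted for one row: every sstable in L0,
    plus at most one per leveled level (the LCS guarantee)."""
    return l0_count + leveled_levels
```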
42. Sad leveled compaction
[Diagram: levels L0 through L5, with a backlog of unleveled sstables piling up in L0.]
The problem is that we can fairly easily flush new sstables to L0 faster than compaction can level them. That results in poor read performance since we need to check so many
sstables for each row. This in turn results in even less i/o available for compaction and L0 will fall even further behind.
43. STCS in L0
[Diagram: levels L0 through L5, with size-tiered compaction applied to the L0 backlog.]
So what we do in 2.0 is perform size-tiered compaction when L0 falls behind. This doesn’t magically make LCS faster, since we still need to process these sstables into the
levels, but it does mean that we prevent read performance from going through the floor in the meantime.
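Size-tiered grouping itself can be sketched like this; the ratio and threshold are illustrative values, not Cassandra's exact bucketing parameters:

```python
def size_tiered_buckets(sizes, bucket_ratio=1.5, min_threshold=4):
    """Group sstables of similar size into buckets; only buckets with at
    least min_threshold members are worth compacting together."""
    buckets = []
    for size in sorted(sizes):
        for b in buckets:
            avg = sum(b) / len(b)
            if size <= avg * bucket_ratio:   # close enough in size
                b.append(size)
                break
        else:
            buckets.append([size])           # start a new size tier
    return [b for b in buckets if len(b) >= min_threshold]
```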
44-48. A closer look at reads
[Diagram, built up across these slides: a client, a coordinator, and three replicas that are 90%, 30%, and 40% busy.]
Now let's look at reads from the perspective of the whole cluster. A client sends a query to a coordinator, which forwards it to the least-busy replica, and returns the answer to the client.
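The routing decision can be sketched in one line; "busyness" here stands in for whatever load metric the coordinator's snitch tracks, a deliberate simplification:

```python
def pick_replica(replica_load):
    """Route the read to the replica reporting the lowest load.
    replica_load maps replica address -> observed busyness (0.0-1.0)."""
    return min(replica_load, key=replica_load.get)
```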
61. Rapid Read Protection
[Graph: read throughput over time; one series is labeled NONE (no rapid read protection).]
Here we have a graph of read performance over time in a small four-node cluster. One of the nodes is killed halfway through. You can see how rapid read protection results in a much lower impact on throughput. (There is still some drop, since we need to repeat 25% of the queries against other replicas all at once.)
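The effect on a single read can be sketched with deterministic latencies, an obvious simplification of real queueing behavior:

```python
def effective_latency(primary_ms, backup_ms, threshold_ms):
    """Client-visible latency under speculative retry: if the first
    replica is slower than the threshold (e.g. the p99 latency), a
    second request goes out at the threshold and the faster answer wins."""
    if primary_ms <= threshold_ms:
        return primary_ms                    # normal case: no retry sent
    return min(primary_ms, threshold_ms + backup_ms)
```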
62. Latency (mid-compaction)
Rapid Read Protection can also reduce latency variance. Look at the 99.9th percentile numbers here. With no rapid read protection, the slowest 0.1% of reads took almost
50ms. Retrying the slowest 10% of queries brings that down to 14.5ms. If we only retry the slowest 1%, that’s 19.6ms. But note that issuing extra reads for all requests
actually results in a higher 99th percentile! Looking at the throughput number shows us why -- we’re running out of capacity in our cluster to absorb the extra requests.
64. User defined types
CREATE TYPE address (
  street text,
  city text,
  zip_code int,
  phones set<text>
);
CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  addresses map<text, address>
);
SELECT id, name, addresses.city, addresses.phones FROM users;

 id       | name    | addresses.city | addresses.phones
----------+---------+----------------+--------------------------
 63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}
We introduced collections in Cassandra 1.2, but they had a number of limitations. One is that collections could not contain other collections. User defined types in 2.1 allow
that. Here we have an address type, that holds a set of phone numbers. We can then use that address type in a map in the users table.
65. Collection indexing
CREATE TABLE songs (
id uuid PRIMARY KEY,
artist text,
album text,
title text,
data blob,
tags set<text>
);
CREATE INDEX song_tags_idx ON songs(tags);
SELECT * FROM songs WHERE tags CONTAINS 'blues';

 id       | album         | artist            | tags                  | title
----------+---------------+-------------------+-----------------------+------------------
 5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
2.1 also brings index support to collections.
74. More-efficient repair
We’re making some big improvements to repair for 2.1. Repair is very network-efficient because we build a hash tree of the data to compare across different replicas. Then we
only have to send actual rows across the network where the tree indicates an inconsistency.
77. More-efficient repair
The problem is that this tree is constructed at repair time, so when we add some new sstables and repair again, merkle tree (hash tree) construction has to start over. So repair
ends up taking time proportional to the amount of data in the cluster, not because of network transfers but because of tree construction time.
80. More-efficient repair
So what we’re doing in 2.1 is allowing Cassandra to mark sstables as repaired and only build merkle trees from sstables that are new since the last repair. This means that as
long as you run repair regularly, it will stay lightweight and performant even as your dataset grows.
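The hash comparison can be sketched with flat leaf ranges; real repair builds a full Merkle tree over the leaves so replicas can skip entire matching subtrees, a detail omitted here:

```python
import hashlib

def range_hashes(rows_by_range):
    """Hash the row contents of each token range (a Merkle-tree leaf)."""
    return {rng: hashlib.sha1(repr(sorted(rows)).encode()).hexdigest()
            for rng, rows in rows_by_range.items()}

def inconsistent_ranges(local, remote):
    """Ranges whose hashes disagree are the only ones that need
    streaming between replicas."""
    lh, rh = range_hashes(local), range_hashes(remote)
    return sorted(rng for rng in lh if lh[rng] != rh.get(rng))
```

In these terms, incremental repair amounts to computing the hashes only over sstables that are not yet marked repaired.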
83. Performance
•Memtable memory use cut by 85%
  • larger sstables, less compaction
  • ~50% better write performance
•Full results after beta1