The document discusses Cassandra Storage Engine (Cassandra SE) in MariaDB, which allows MariaDB to access Cassandra data. Cassandra SE provides a SQL view of Cassandra data by mapping Cassandra concepts like column families to MariaDB tables. It supports basic operations like SELECT, INSERT, and UPDATE between the two systems. The document outlines use cases, benchmarks Cassandra SE performance, and discusses future directions like supporting additional Cassandra features.
Introduction into MySQL Query Tuning for Dev[Op]sSveta Smirnova
Percona Live Online 2021 talk: https://www.percona.com/resources/videos/introduction-mysql-query-tuning-for-devops
In this talk I will show how to get started with MySQL Query Tuning. I will make a short introduction into physical table structure and demonstrate how it may influence query execution time.
Then we will discuss basic query tuning instruments and techniques, mainly EXPLAIN command with its latest variations. You will learn how to understand its output and how to rewrite queries or change table structure to achieve better performance.
Optimizer Trace Walkthrough talk from MariaDB Fest 2020.
Covers trace contents, reasons why this feature was implemented this way, comparison with Optimizer Trace in MySQL 8
Introduction into MySQL Query Tuning for Dev[Op]sSveta Smirnova
Percona Live Online 2021 talk: https://www.percona.com/resources/videos/introduction-mysql-query-tuning-for-devops
In this talk I will show how to get started with MySQL Query Tuning. I will make a short introduction into physical table structure and demonstrate how it may influence query execution time.
Then we will discuss basic query tuning instruments and techniques, mainly EXPLAIN command with its latest variations. You will learn how to understand its output and how to rewrite queries or change table structure to achieve better performance.
Optimizer Trace Walkthrough talk from MariaDB Fest 2020.
Covers trace contents, reasons why this feature was implemented this way, comparison with Optimizer Trace in MySQL 8
EXPLAIN ANALYZE is a new query profiling tool first released in MySQL 8.0.18. This presentation covers how this new feature works, both on the surface and on the inside, and how you can use it to better understand your queries, to improve them and make them go faster.
This presentation is for everyone who has ever had to understand why a query is executed slower than anticipated, and for everyone who wants to learn more about query plans and query execution in MySQL.
Introduction to MySQL Query Tuning for Dev[Op]sSveta Smirnova
To get data, we query the database. MySQL does its best to return requested bytes as fast as possible. However, it needs human help to identify what is important and should be accessed in the first place.
Queries, written smartly, can significantly outperform automatically generated ones. Indexes and Optimizer statistics, not limited to the Histograms only, help to increase the speed of the query a lot.
In this session, I will demonstrate by examples of how MySQL query performance can be improved. I will focus on techniques, accessible by Developers and DevOps rather on those which are usually used by Database Administrators. In the end, I will present troubleshooting tools which will help you to identify why your queries do not perform. Then you could use the knowledge from the beginning of the session to improve them.
MySQL 8.0 has come up with an exciting addition of window function which can make the developer life more ease with complex SQL's. This presentation will be a walk through over window functions in MySQL 8.0.
http://www.mydbops.com
Modern query optimisation features in MySQL 8.Mydbops
MySQL 8 (a huge leap forward), indexing capabilities, execution plan enhancements, optimizer improvements, and many other current query tweak features are covered in the slides.
Character Encoding - MySQL DevRoom - FOSDEM 2015mushupl
Character encoding configuration in MySQL has always been a bit confusing. With too many options to set, unclear relationships between them, and the default settings that make MySQL incompatible with most languages, it is a headache to many users, many of whom end up with broken data. This lecture will provide an overview of the character set support in MySQL, guidelines on how to use it correctly, and will demonstrate several methods of detecting and repairing mangled data.
MySQL performance can be improved by tuning queries, server options, and hardware. Traditionally it was an area of responsibility of three different roles: Development, DBA and System Administrators. Now DevOps handle these all. But there is a gap. Knowledge, gained by MySQL DBAs after years or focus on the single product is hard to gain when you focus on more than one. This is why I am doing this session. I will show minimal, but the most effective, set of options which will improve MySQL performance. For illustrations, I will use real user stories, gained by my Support experience, and Kubernetes operators, now available from all main MySQL eco-system vendors: Oracle, MariaDB, and Percona.
Presented at Open Source Summit Europe 2020: https://sched.co/eCGf
EXPLAIN ANALYZE is a new query profiling tool first released in MySQL 8.0.18. This presentation covers how this new feature works, both on the surface and on the inside, and how you can use it to better understand your queries, to improve them and make them go faster.
This presentation is for everyone who has ever had to understand why a query is executed slower than anticipated, and for everyone who wants to learn more about query plans and query execution in MySQL.
Introduction to MySQL Query Tuning for Dev[Op]sSveta Smirnova
To get data, we query the database. MySQL does its best to return requested bytes as fast as possible. However, it needs human help to identify what is important and should be accessed in the first place.
Queries, written smartly, can significantly outperform automatically generated ones. Indexes and Optimizer statistics, not limited to the Histograms only, help to increase the speed of the query a lot.
In this session, I will demonstrate by examples of how MySQL query performance can be improved. I will focus on techniques, accessible by Developers and DevOps rather on those which are usually used by Database Administrators. In the end, I will present troubleshooting tools which will help you to identify why your queries do not perform. Then you could use the knowledge from the beginning of the session to improve them.
MySQL 8.0 has come up with an exciting addition of window function which can make the developer life more ease with complex SQL's. This presentation will be a walk through over window functions in MySQL 8.0.
http://www.mydbops.com
Modern query optimisation features in MySQL 8.Mydbops
MySQL 8 (a huge leap forward), indexing capabilities, execution plan enhancements, optimizer improvements, and many other current query tweak features are covered in the slides.
Character Encoding - MySQL DevRoom - FOSDEM 2015mushupl
Character encoding configuration in MySQL has always been a bit confusing. With too many options to set, unclear relationships between them, and the default settings that make MySQL incompatible with most languages, it is a headache to many users, many of whom end up with broken data. This lecture will provide an overview of the character set support in MySQL, guidelines on how to use it correctly, and will demonstrate several methods of detecting and repairing mangled data.
MySQL performance can be improved by tuning queries, server options, and hardware. Traditionally it was an area of responsibility of three different roles: Development, DBA and System Administrators. Now DevOps handle these all. But there is a gap. Knowledge, gained by MySQL DBAs after years or focus on the single product is hard to gain when you focus on more than one. This is why I am doing this session. I will show minimal, but the most effective, set of options which will improve MySQL performance. For illustrations, I will use real user stories, gained by my Support experience, and Kubernetes operators, now available from all main MySQL eco-system vendors: Oracle, MariaDB, and Percona.
Presented at Open Source Summit Europe 2020: https://sched.co/eCGf
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...DataStax Academy
The Cassandra Storage Engine allows access to data in a Cassandra cluster from MariaDB. Learn what the Cassandra Storage Engine is and how to make use of it, how we implemented it using dynamic columns in MariaDB. Also, we'll look at CQL, data and command mapping, use cases and benchmarks.
Learn about the MariaDB 10 features that exist for developers: microseconds, virtual columns, PCRE regular expressions, DELETE ... RETURNING, geospatial extensions (GIS), dynamic columns. Use cases, and a hint of storage engines
This presentation introduces people to Cassandra and Column Family Datastores in general. I will discuss what Cassandra is, how and when it is useful, and how it integrates with Rails. I will also go in to lessons learned during our 3-month project, and the useful patterns that emerged. The discussion will be very technical, but targeted at developers who are not familiar with, or have not done a project with Cassandra.
MariaDB for Developers and Operators (DevOps)Colin Charles
MariaDB features that both developers and operators will find benefit from, especially when we focus on the 10.0 release. This is specific to a Red Hat Summit/DevNation crowd.
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016DataStax
Most web applications start out with a Postgres database and it serves the application very well for an extended period of time. Based on type of application, the data model of the app will have a table that tracks some kind of state for either objects in the system or the users of the application. Names for this table include logs, messages or events. The growth in the number of rows in this table is not linear as the traffic to the app increases, it's typically exponential.
Over time, the state table will increasingly become the bulk of the data volume in Postgres, think terabytes, and become increasingly hard to query. This use case can be characterized as the one-big-table problem. In this situation, it makes sense to move that table out of Postgres and into Cassandra. This talk will walk through the conceptual differences between the two systems, a bit of data modeling, as well as advice on making the conversion.
About the Speaker
Rimas Silkaitis Product Manager, Heroku
Rimas currently runs Product for Heroku Postgres and Heroku Redis but the common thread throughout his career is data. From data analysis, building data warehouses and ultimately building data products, he's held various positions that have allowed him to see the challenges of working with data at all levels of an organization. This experience spans the smallest of startups to the biggest enterprises.
Instaclustr webinar 50,000 transactions per second with Apache Spark on Apach...Instaclustr
See how our engineering team have implemented an open source Apache Spark on Apache Cassandra solution to capture metrics of the various nodes that we monitor for our customers.
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis
First Steps of an Oracle-expert in the Big Data World. Everyone speaks about Big Data. But what does it mean? This speech focuses on one animal of the Big Data Zoo - Cassandra and answers the following questions:
- Why another database?
- There is Impala and Spark. Why would I need Cassandra?
- New database - do I need to learn a new language?
- How do I get the data in?
- Can I use SQL?
- Is it part of a distribution, for example Cloudera?
Demos will explain the theory.
DataSource V2 and Cassandra – A Whole New WorldDatabricks
Data Source V2 has arrived for the Spark Cassandra Connector, but what does this mean for you? Speed, Flexibility and Usability improvements abound and we’ll walk you through some of the biggest highlights and how you can take advantage of them today.
MyRocks: табличный движок для MySQL на основе RocksDBSergey Petrunya
MyRocks: табличный движок для MySQL на основе RocksDB.
Презентация с HighLoad++ 2015.
Рассказывается о принципах работы LSM-Trees, их реализации в RocksDB, зачем и как был сделан MyRocks, с какими проблемами столкнулись и как их решили.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
1. Cassandra Storage Engine in MariaDB
MariaDB Cassandra interoperability
Sergei Petrunia
Colin Charles
2. Who are we
● Sergei Petrunia
– Principal developer of CassandraSE, optimizer
developer, formerly from MySQL
– psergey@mariadb.org
● Colin Charles
– Chief Evangelist, MariaDB, formerly from MySQL
– colin@mariadb.org
3. Agenda
● An introduction to Cassandra
● The Cassandra Storage Engine
(Cassandra SE)
● Data mapping
● Use cases
● Benchmarks
● Conclusion
4. Background: what is Cassandra
• A distributed NoSQL database
– Key-Value store
● Limited range scan suppor
– Optionally flexible schema
● Pre-defined “static” columns
● Ad-hoc “dynamic” columns
– Automatic sharding / replication
– Eventual consistency
4
5. Background: Cassandra's data model
• “Column families” like tables
• Row key → columns
• Somewhat similar to SQL but
some important differences.
• Supercolumns are not
supported
5
6. CQL – Cassandra Query Language
Looks like SQL at first glance
6
bash$ cqlsh -3
cqlsh> CREATE KEYSPACE mariadbtest
... WITH REPLICATION ={'class':'SimpleStrategy','replication_factor':1};
cqlsh> use mariadbtest;
cqlsh:mariadbtest> create columnfamily cf1 ( pk varchar primary key,
... data1 varchar, data2 bigint
... ) with compact storage;
cqlsh:mariadbtest> insert into cf1 (pk, data1,data2)
... values ('row1', 'data-in-cassandra', 1234);
cqlsh:mariadbtest> select * from cf1;
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
7. CQL is not SQL
Similarity with SQL is superficial
7
cqlsh:mariadbtest> select * from cf1 where pk='row1';
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
cqlsh:mariadbtest> select * from cf1 where data2=1234;
Bad Request: No indexed columns present in by-columns clause with Equal
operator
cqlsh:mariadbtest> select * from cf1 where pk='row1' or pk='row2';
Bad Request: line 1:34 missing EOF at 'or'
• No joins or subqueries
• No GROUP BY, ORDER BY must be able to use available
indexes
• WHERE clause must represent an index lookup.
9. 1. Load the Cassandra SE plugin
• Get MariaDB 10.0.1+
• Load the Cassandra plugin
– From SQL:
9
MariaDB [(none)]> install plugin cassandra soname 'ha_cassandra.so';
[mysqld]
...
plugin-load=ha_cassandra.so
– Or, add a line to my.cnf:
MariaDB [(none)]> show plugins;
+--------------------+--------+-----------------+-----------------+---------+
| Name | Status | Type | Library | License |
+--------------------+--------+-----------------+-----------------+---------+
...
| CASSANDRA | ACTIVE | STORAGE ENGINE | ha_cassandra.so | GPL |
+--------------------+--------+-----------------+-----------------+---------+
• Check it is loaded
10. 2. Connect to Cassandra
• Create an SQL table which is a view of a column family
10
MariaDB [test]> set global cassandra_default_thrift_host='10.196.2.113';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> create table t2 (pk varchar(36) primary key,
-> data1 varchar(60),
-> data2 bigint
-> ) engine=cassandra
-> keyspace='mariadbtest'
-> thrift_host='10.196.2.113'
-> column_family='cf1';
Query OK, 0 rows affected (0.01 sec)
– thrift_host can be set per-table
– @@cassandra_default_thrift_host allows to
● Re-point the table to different node dynamically
● Not change table DDL when Cassandra IP changes.
11. Possible gotchas
11
• SELinux blocks the connection
MariaDB [test]> create table t1 ( ... ) engine=cassandra ... ;
ERROR 1429 (HY000): Unable to connect to foreign data source: connect()
failed: Permission denied [1]
MariaDB [test]> create table t1 ( ... ) engine=cassandra ... ;
ERROR 1429 (HY000): Unable to connect to foreign data source: Column family
cf1 not found in keyspace mariadbtest
• Cassandra 1.2 and CFs without “COMPACT STORAGE”
– Packaging bug
– To get running quickly: echo 0 >/selinux/enforce
– Caused by a change in Cassandra 1.2
– They broke Pig also
– We intend to update CassandraSE for 1.2
12. Accessing Cassandra data
●
Can insert data
12
MariaDB [test]> insert into t2 values ('row2','data-from-mariadb', 123);
Query OK, 1 row affected (0.00 sec)
cqlsh:mariadbtest> select * from cf1;
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
row2 | data-from-mariadb | 123
• Cassandra sees inserted data
MariaDB [test]> select * from t2;
+------+-------------------+-------+
| pk | data1 | data2 |
+------+-------------------+-------+
| row1 | data-in-cassandra | 1234 |
+------+-------------------+-------+
• Can get Cassandra's data
14. Data mapping between Cassandra and SQL
14
create table tbl (
pk varchar(36) primary key,
data1 varchar(60),
data2 bigint
) engine=cassandra keyspace='ks1' column_family='cf1'
• MariaDB table represents Cassandra's Column Family
– Can use any table name, column_family=... specifies CF.
15. Data mapping between Cassandra and SQL
15
create table tbl (
pk varchar(36) primary key,
data1 varchar(60),
data2 bigint
) engine=cassandra keyspace='ks1' column_family='cf1'
• MariaDB table represents Cassandra's Column Family
– Can use any table name, column_family=... specifies CF.
• Table must have a primary key
– Name/type must match Cassandra's rowkey
16. Data mapping between Cassandra and SQL
16
create table tbl (
pk varchar(36) primary key,
data1 varchar(60),
data2 bigint
) engine=cassandra keyspace='ks1' column_family='cf1'
• MariaDB table represents Cassandra's Column Family
– Can use any table name, column_family=... specifies CF.
• Table must have a primary key
– Name/type must match Cassandra's rowkey
• Columns map to Cassandra's static columns
– Name must be the same as in Cassandra
– Datatypes must match
– Can any subset of CF's columns
17. Datatype mapping
Cassandra MariaDB
blob BLOB, VARBINARY(n)
ascii BLOB, VARCHAR(n), use charset=latin1
text BLOB, VARCHAR(n), use charset=utf8
varint VARBINARY(n)
int INT
bigint BIGINT, TINY, SHORT
uuid CHAR(36) (text in MariaDB)
timestamp TIMESTAMP (second precision), TIMESTAMP(6) (microsecond precision),
BIGINT
boolean BOOL
float FLOAT
double DOUBLE
decimal VARBINARY(n)
counter BIGINT
• CF column datatype determines MariaDB datatype
18. Dynamic columns
• Cassandra supports “dynamic column families”
• Can access ad-hoc columns with MariaDB's
dynamic columns feature
18
create table tbl
(
rowkey type PRIMARY KEY
column1 type,
...
dynamic_cols blob DYNAMIC_COLUMN_STORAGE=yes
) engine=cassandra keyspace=... column_family=...;
insert into tbl values
(1, column_create('col1', 1, 'col2', 'value-2'));
select rowkey,
column_get(dynamic_cols, 'uuidcol' as char)
from tbl;
19. Data mapping is safe
create table t3 (pk varchar(60) primary key, no_such_field int)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Field `no_such_field` could not
be mapped to any field in Cassandra'
create table t3 (pk varchar(60) primary key, data1 double)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Failed to map column data1
to datatype org.apache.cassandra.db.marshal.UTF8Type'
• Cassandra SE will refuse incorrect mappings
22. SELECT command mapping
● MariaDB has an SQL interpreter
● Cassandra SE supports lookups and scans
● Can now do
– Arbitrary WHERE clauses
– JOINs between Cassandra tables and
MariaDB tables
● Batched Key Access is supported
23. DML command mapping
● No SQL semantics
– INSERT overwrites rows
– UPDATE reads, then writes
● Have you updated what you read
– DELETE reads, then deletes
● Can't be sure if/what you have deleted
● Not as bad as it sounds, it's Cassandra
– Cassandra SE doesn't make it SQL.
25. Cassandra use cases
● Collect massive amounts
of data
– Web page hits
– Sensor updates
● Updates are naturally non-conflicting
– Keyed by UUIDs, timestamps
● Reads are served with one lookup
● Good for certain kinds of data
– Moving from SQL entirely may be difficult
26. Cassandra SE use cases (1)
● Send an update to Cassandra
– Be a sensor
● Grab a piece of data from Cassandra
– “This web page was last viewed by …”
– “Last known position of this user was ...”.
Access Cassandra
data from SQL
27. Cassandra SE use cases (2)
● Want a special table that is
– auto-replicated
– fault-tolerant
– Very fast?
● Get Cassandra, and create a
Cassandra SE table.
Coming from MySQL/MariaDB side:
28. Cassandra Storage Engine non-use cases
• Huge, sift-through-all-data joins
– Use Pig
• Bulk data transfer to/from Cassandra
cluster
– Use Sqoop
• A replacement for InnoDB
– No full SQL semantics
28
29. A “benchmark”
• One table
• EC2 environment
– m1.large nodes
– Ephemeral disks
• Stream of single-line
INSERTs
• Tried Innodb and
Cassandra
• Hardly any tuning
30. Conclusions
• Cassandra SE can be used to peek at
data in Cassandra from MariaDB.
• It is not a replacement for Pig/Hive
• It is really easy to setup and use
30
31. Going Forward
• Looking for input
• Do you want support for
– Fast counter columns updates?
– Awareness of Cassandra cluster
topology?
– Secondary indexes?
– …?
31
34. Extra: Cassandra SE internals
• Developed against Cassandra 1.1
• Uses Thrift API
– cannot stream CQL resultset in 1.1
– Cant use secondary indexes
• Only supports AllowAllAuthenticator
• In Cassandra 1.2
– “CQL Binary Protocol” with streaming
– CASSANDRA-5234: Thrift can only read CFs
“WITH COMPACT STORAGE”
34