Cassandra Storage Engine in MariaDB
MariaDB Cassandra interoperability
Sergei Petrunia
Colin Charles
Who are we
● Sergei Petrunia
– Principal developer of CassandraSE, optimizer
developer, formerly from MySQL
– psergey@mariadb.org
● Colin Charles
– Chief Evangelist, MariaDB, formerly from MySQL
– colin@mariadb.org
Agenda
● An introduction to Cassandra
● The Cassandra Storage Engine
(Cassandra SE)
● Data mapping
● Use cases
● Benchmarks
● Conclusion
Background: what is Cassandra
• A distributed NoSQL database
– Key-Value store
● Limited range scan support
– Optionally flexible schema
● Pre-defined “static” columns
● Ad-hoc “dynamic” columns
– Automatic sharding / replication
– Eventual consistency
Background: Cassandra's data model
• “Column families” like tables
• Row key → columns
• Somewhat similar to SQL, but with
some important differences.
• Supercolumns are not
supported
CQL – Cassandra Query Language
Looks like SQL at first glance
bash$ cqlsh -3
cqlsh> CREATE KEYSPACE mariadbtest
... WITH REPLICATION ={'class':'SimpleStrategy','replication_factor':1};
cqlsh> use mariadbtest;
cqlsh:mariadbtest> create columnfamily cf1 ( pk varchar primary key,
... data1 varchar, data2 bigint
... ) with compact storage;
cqlsh:mariadbtest> insert into cf1 (pk, data1,data2)
... values ('row1', 'data-in-cassandra', 1234);
cqlsh:mariadbtest> select * from cf1;
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
CQL is not SQL
Similarity with SQL is superficial
cqlsh:mariadbtest> select * from cf1 where pk='row1';
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
cqlsh:mariadbtest> select * from cf1 where data2=1234;
Bad Request: No indexed columns present in by-columns clause with Equal
operator
cqlsh:mariadbtest> select * from cf1 where pk='row1' or pk='row2';
Bad Request: line 1:34 missing EOF at 'or'
• No joins or subqueries
• No GROUP BY; ORDER BY must be able to use an available
index
• WHERE clause must represent an index lookup.
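The restrictions above can be pictured with a toy model. This is a minimal Python sketch, not Cassandra's implementation: `ToyColumnFamily`, `put`, and `select_where` are hypothetical names invented for illustration. It only shows the idea that a lookup is accepted when it names the row key or an indexed column, and rejected otherwise.

```python
class ToyColumnFamily:
    """Hypothetical in-memory stand-in for a column family (not Cassandra code)."""

    def __init__(self, indexed_columns=()):
        self.rows = {}                       # row key -> {column name: value}
        self.indexed = set(indexed_columns)  # columns that have a secondary index

    def put(self, key, columns):
        # Upsert: merge the new columns into whatever is stored under the key.
        self.rows.setdefault(key, {}).update(columns)

    def select_where(self, column, value):
        # Only the row key or an indexed column may appear in the WHERE clause.
        if column == "pk":
            row = self.rows.get(value)
            return [dict(row, pk=value)] if row is not None else []
        if column not in self.indexed:
            raise ValueError("no index on column %r" % column)
        return [dict(r, pk=k) for k, r in self.rows.items() if r.get(column) == value]


cf1 = ToyColumnFamily()
cf1.put("row1", {"data1": "data-in-cassandra", "data2": 1234})
by_key = cf1.select_where("pk", "row1")   # allowed: row key lookup
try:
    cf1.select_where("data2", 1234)        # rejected: data2 is not indexed
    rejected = False
except ValueError:
    rejected = True
```

Creating the toy column family with `indexed_columns=("data2",)` would make the second lookup succeed, which mirrors what a secondary index does for the `data2=1234` query.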
Cassandra Storage Engine
Provides a “view” of Cassandra's data
from MariaDB.
Starts a NoCQL movement
1. Load the Cassandra SE plugin
• Get MariaDB 10.0.1+
• Load the Cassandra plugin
– From SQL:
MariaDB [(none)]> install plugin cassandra soname 'ha_cassandra.so';
– Or, add a line to my.cnf:
[mysqld]
...
plugin-load=ha_cassandra.so
• Check it is loaded
MariaDB [(none)]> show plugins;
+--------------------+--------+-----------------+-----------------+---------+
| Name | Status | Type | Library | License |
+--------------------+--------+-----------------+-----------------+---------+
...
| CASSANDRA | ACTIVE | STORAGE ENGINE | ha_cassandra.so | GPL |
+--------------------+--------+-----------------+-----------------+---------+
2. Connect to Cassandra
• Create an SQL table which is a view of a column family
MariaDB [test]> set global cassandra_default_thrift_host='10.196.2.113';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> create table t2 (pk varchar(36) primary key,
-> data1 varchar(60),
-> data2 bigint
-> ) engine=cassandra
-> keyspace='mariadbtest'
-> thrift_host='10.196.2.113'
-> column_family='cf1';
Query OK, 0 rows affected (0.01 sec)
– thrift_host can be set per-table
– @@cassandra_default_thrift_host lets you
● Re-point the table to a different node dynamically
● Avoid changing the table DDL when the Cassandra IP changes.
Possible gotchas
• SELinux blocks the connection
MariaDB [test]> create table t1 ( ... ) engine=cassandra ... ;
ERROR 1429 (HY000): Unable to connect to foreign data source: connect()
failed: Permission denied [1]
– Packaging bug
– To get running quickly: echo 0 >/selinux/enforce
• Cassandra 1.2 and CFs without “COMPACT STORAGE”
MariaDB [test]> create table t1 ( ... ) engine=cassandra ... ;
ERROR 1429 (HY000): Unable to connect to foreign data source: Column family
cf1 not found in keyspace mariadbtest
– Caused by a change in Cassandra 1.2
– They broke Pig also
– We intend to update CassandraSE for 1.2
Accessing Cassandra data
• Can get Cassandra's data
MariaDB [test]> select * from t2;
+------+-------------------+-------+
| pk | data1 | data2 |
+------+-------------------+-------+
| row1 | data-in-cassandra | 1234 |
+------+-------------------+-------+
• Can insert data
MariaDB [test]> insert into t2 values ('row2','data-from-mariadb', 123);
Query OK, 1 row affected (0.00 sec)
• Cassandra sees inserted data
cqlsh:mariadbtest> select * from cf1;
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
row2 | data-from-mariadb | 123
Data mapping between
Cassandra and SQL
create table tbl (
pk varchar(36) primary key,
data1 varchar(60),
data2 bigint
) engine=cassandra keyspace='ks1' column_family='cf1'
• MariaDB table represents Cassandra's Column Family
– Can use any table name, column_family=... specifies CF.
• Table must have a primary key
– Name/type must match Cassandra's rowkey
• Columns map to Cassandra's static columns
– Name must be the same as in Cassandra
– Datatypes must match
– Can map any subset of the CF's columns
Datatype mapping
Cassandra    MariaDB
blob         BLOB, VARBINARY(n)
ascii        BLOB, VARCHAR(n), use charset=latin1
text         BLOB, VARCHAR(n), use charset=utf8
varint       VARBINARY(n)
int          INT
bigint       BIGINT, TINY, SHORT
uuid         CHAR(36) (text in MariaDB)
timestamp    TIMESTAMP (second precision), TIMESTAMP(6) (microsecond precision), BIGINT
boolean      BOOL
float        FLOAT
double       DOUBLE
decimal      VARBINARY(n)
counter      BIGINT
• CF column datatype determines MariaDB datatype
Dynamic columns
• Cassandra supports “dynamic column families”
• Can access ad-hoc columns with MariaDB's
dynamic columns feature
create table tbl
(
rowkey type PRIMARY KEY,
column1 type,
...
dynamic_cols blob DYNAMIC_COLUMN_STORAGE=yes
) engine=cassandra keyspace=... column_family=...;
insert into tbl values
(1, column_create('col1', 1, 'col2', 'value-2'));
select rowkey,
column_get(dynamic_cols, 'uuidcol' as char)
from tbl;
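One way to picture the split between static and dynamic columns is the toy Python sketch below. It is an assumption-laden illustration, not the engine's code: `split_row` is a hypothetical helper, and the rule it applies (columns declared in the table go to their own SQL columns, everything else is gathered into the dynamic-columns container) is a simplification of what the slide describes.

```python
def split_row(cassandra_row, static_columns):
    """Split one row into declared (static) columns and ad-hoc (dynamic) ones.

    cassandra_row:  {column name: value} as stored under one row key
    static_columns: names declared as regular columns in the SQL table
    """
    static = {name: cassandra_row.get(name) for name in static_columns}
    dynamic = {name: value for name, value in cassandra_row.items()
               if name not in static_columns}
    return static, dynamic


# A row with one declared column and two ad-hoc columns.
row = {"column1": 10, "col1": 1, "col2": "value-2"}
static, dynamic = split_row(row, ["column1"])
```

In this model `dynamic` plays the role of the `dynamic_cols` blob, and reading one entry from it corresponds to a `column_get(dynamic_cols, ...)` call.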
Data mapping is safe
create table t3 (pk varchar(60) primary key, no_such_field int)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Field `no_such_field` could not
be mapped to any field in Cassandra'
create table t3 (pk varchar(60) primary key, data1 double)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Failed to map column data1
to datatype org.apache.cassandra.db.marshal.UTF8Type'
• Cassandra SE will refuse incorrect mappings
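The two failures above (an unknown column name, and a datatype that does not match) can be sketched as a small validation step. The Python below is a hedged illustration only: `ALLOWED` is copied from the datatype mapping table in this deck with the `(n)` lengths dropped, while `check_mapping` and its message texts are hypothetical and do not reproduce the engine's real checks or error strings.

```python
# Allowed MariaDB types per Cassandra type, taken from the datatype mapping
# table in this deck (lengths such as (n) are dropped for simplicity).
ALLOWED = {
    "blob": {"BLOB", "VARBINARY"},
    "ascii": {"BLOB", "VARCHAR"},
    "text": {"BLOB", "VARCHAR"},
    "varint": {"VARBINARY"},
    "int": {"INT"},
    "bigint": {"BIGINT", "TINY", "SHORT"},
    "uuid": {"CHAR"},
    "timestamp": {"TIMESTAMP", "BIGINT"},
    "boolean": {"BOOL"},
    "float": {"FLOAT"},
    "double": {"DOUBLE"},
    "decimal": {"VARBINARY"},
    "counter": {"BIGINT"},
}


def check_mapping(cf_columns, table_columns):
    """Return a list of problems; an empty list means the mapping is accepted.

    cf_columns:    {column name: Cassandra type} for the column family
    table_columns: {column name: MariaDB type} from the CREATE TABLE
    """
    problems = []
    for name, sql_type in table_columns.items():
        if name not in cf_columns:
            problems.append("%s: no such column in Cassandra" % name)
        elif sql_type not in ALLOWED[cf_columns[name]]:
            problems.append("%s: %s cannot map to %s" % (name, sql_type, cf_columns[name]))
    return problems


# cf1 from the earlier slides: pk and data1 are text, data2 is bigint.
cf1 = {"pk": "text", "data1": "text", "data2": "bigint"}
ok = check_mapping(cf1, {"pk": "VARCHAR", "data1": "VARCHAR", "data2": "BIGINT"})
bad_name = check_mapping(cf1, {"pk": "VARCHAR", "no_such_field": "INT"})
bad_type = check_mapping(cf1, {"pk": "VARCHAR", "data1": "DOUBLE"})
```

Note that `bad_name` maps only a subset of the column family's columns; the subset itself is fine, and only the unknown name is reported.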
Command mapping
● Cassandra commands
– PUT (upsert)
– GET
● Scan
– DELETE (if exists)
● SQL commands
– SELECT → GET/Scan
– INSERT → PUT (upsert)
– UPDATE/DELETE → read+write.
SELECT command mapping
● MariaDB has an SQL interpreter
● Cassandra SE supports lookups and scans
● Can now do
– Arbitrary WHERE clauses
– JOINs between Cassandra tables and
MariaDB tables
● Batched Key Access is supported
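The idea behind Batched Key Access can be sketched as follows: rather than one lookup per joined row, keys from the MariaDB-side table are collected into batches and each batch is fetched in a single request. This is a toy Python model under stated assumptions, not the engine's code; `multiget`, `batched_key_join`, and `batch_size` are hypothetical names, and a plain dict stands in for the Cassandra side.

```python
def multiget(store, keys):
    """Hypothetical batched read: one round trip returning rows for many keys."""
    return {k: store[k] for k in keys if k in store}


def batched_key_join(outer_rows, key_of, store, batch_size=2):
    """Join outer_rows to store by key, issuing one multiget per batch of keys."""
    joined, round_trips = [], 0
    for start in range(0, len(outer_rows), batch_size):
        batch = outer_rows[start:start + batch_size]
        found = multiget(store, {key_of(r) for r in batch})
        round_trips += 1
        # Inner-join semantics: outer rows without a match are dropped.
        joined.extend((r, found[key_of(r)]) for r in batch if key_of(r) in found)
    return joined, round_trips


cassandra_rows = {"row1": {"data2": 1234}, "row2": {"data2": 123}}
local_rows = [("a", "row1"), ("b", "row2"), ("c", "row9")]
result, trips = batched_key_join(local_rows, lambda r: r[1], cassandra_rows)
```

With three outer rows and a batch size of two, the sketch makes two round trips instead of three; larger batches trade buffer memory for fewer requests.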
DML command mapping
● No SQL semantics
– INSERT overwrites rows
– UPDATE reads, then writes
● Have you updated what you read?
– DELETE reads, then deletes
● Can't be sure if/what you have deleted
● Not as bad as it sounds, it's Cassandra
– Cassandra SE doesn't make it SQL.
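A toy model makes these semantics concrete. The Python below is a minimal sketch with hypothetical names (`ToyTable` and its methods), not the storage engine's implementation: it shows INSERT behaving as an upsert, and UPDATE and DELETE as a read followed by a write, with no guarantee that the row is unchanged between the two steps.

```python
class ToyTable:
    """Hypothetical model of the DML mapping described above (not engine code)."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: value}

    def insert(self, key, columns):
        # INSERT -> PUT: no duplicate-key error, existing columns are overwritten.
        self.rows.setdefault(key, {}).update(columns)

    def update(self, key, changes):
        # UPDATE -> read, then write. Another writer may change the row between
        # the two steps, so the write is not guaranteed to hit what was read.
        current = self.rows.get(key)
        if current is None:
            return 0
        self.insert(key, changes)
        return 1

    def delete(self, key):
        # DELETE -> read, then delete; the count reflects only what was read.
        existed = key in self.rows
        self.rows.pop(key, None)
        return 1 if existed else 0


t2 = ToyTable()
t2.insert("row2", {"data1": "data-from-mariadb", "data2": 123})
t2.insert("row2", {"data1": "overwritten", "data2": 123})  # second INSERT, no error
updated = t2.update("row2", {"data2": 124})
deleted_missing = t2.delete("row9")
```

In an SQL engine such as InnoDB the second INSERT would fail with a duplicate-key error; here it silently replaces the stored values.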
Cassandra SE use cases
Cassandra use cases
● Collect massive amounts
of data
– Web page hits
– Sensor updates
● Updates are naturally non-conflicting
– Keyed by UUIDs, timestamps
● Reads are served with one lookup
● Good for certain kinds of data
– Moving from SQL entirely may be difficult
Cassandra SE use cases (1)
● Send an update to Cassandra
– Be a sensor
● Grab a piece of data from Cassandra
– “This web page was last viewed by …”
– “Last known position of this user was ...”.
Access Cassandra
data from SQL
Cassandra SE use cases (2)
Coming from MySQL/MariaDB side:
● Want a special table that is
– auto-replicated
– fault-tolerant
– Very fast?
● Get Cassandra, and create a
Cassandra SE table.
Cassandra Storage Engine non-use cases
• Huge, sift-through-all-data joins
– Use Pig
• Bulk data transfer to/from Cassandra
cluster
– Use Sqoop
• A replacement for InnoDB
– No full SQL semantics
A “benchmark”
• One table
• EC2 environment
– m1.large nodes
– Ephemeral disks
• Stream of single-line
INSERTs
• Tried InnoDB and
Cassandra
• Hardly any tuning
Conclusions
• Cassandra SE can be used to peek at
data in Cassandra from MariaDB.
• It is not a replacement for Pig/Hive
• It is really easy to set up and use
Going Forward
• Looking for input
• Do you want support for
– Fast counter column updates?
– Awareness of Cassandra cluster
topology?
– Secondary indexes?
– …?
Resources
• https://kb.askmonty.org/en/cassandrase/
• http://wiki.apache.org/cassandra/DataModel
• http://cassandra.apache.org/
• http://www.datastax.com/docs/1.1/ddl/column_family
Thanks!
Q & A
Extra: Cassandra SE internals
• Developed against Cassandra 1.1
• Uses Thrift API
– cannot stream CQL resultset in 1.1
– Can't use secondary indexes
• Only supports AllowAllAuthenticator
• In Cassandra 1.2
– “CQL Binary Protocol” with streaming
– CASSANDRA-5234: Thrift can only read CFs
“WITH COMPACT STORAGE”
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Recently uploaded (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Cassandra Storage Engine in MariaDB: An Introduction

  • 1. Cassandra Storage Engine in MariaDB
    MariaDB Cassandra interoperability
    Sergei Petrunia, Colin Charles
  • 2. Who are we
    ● Sergei Petrunia
      – Principal developer of CassandraSE, optimizer developer, formerly from MySQL
      – psergey@mariadb.org
    ● Colin Charles
      – Chief Evangelist, MariaDB, formerly from MySQL
      – colin@mariadb.org
  • 3. Agenda
    ● An introduction to Cassandra
    ● The Cassandra Storage Engine (Cassandra SE)
    ● Data mapping
    ● Use cases
    ● Benchmarks
    ● Conclusion
  • 4. Background: what is Cassandra
    • A distributed NoSQL database
      – Key-Value store
        ● Limited range scan support
      – Optionally flexible schema
        ● Pre-defined “static” columns
        ● Ad-hoc “dynamic” columns
      – Automatic sharding / replication
      – Eventual consistency
  • 5. Background: Cassandra's data model
    • “Column families” like tables
    • Row key → columns
    • Somewhat similar to SQL, but with some important differences
    • Supercolumns are not supported
  • 6. CQL – Cassandra Query Language
    Looks like SQL at first glance
      bash$ cqlsh -3
      cqlsh> CREATE KEYSPACE mariadbtest
         ... WITH REPLICATION ={'class':'SimpleStrategy','replication_factor':1};
      cqlsh> use mariadbtest;
      cqlsh:mariadbtest> create columnfamily cf1 ( pk varchar primary key,
         ... data1 varchar, data2 bigint
         ... ) with compact storage;
      cqlsh:mariadbtest> insert into cf1 (pk, data1,data2)
         ... values ('row1', 'data-in-cassandra', 1234);
      cqlsh:mariadbtest> select * from cf1;
       pk   | data1             | data2
      ------+-------------------+-------
       row1 | data-in-cassandra |  1234
  • 7. CQL is not SQL
    Similarity with SQL is superficial
      cqlsh:mariadbtest> select * from cf1 where pk='row1';
       pk   | data1             | data2
      ------+-------------------+-------
       row1 | data-in-cassandra |  1234
      cqlsh:mariadbtest> select * from cf1 where data2=1234;
      Bad Request: No indexed columns present in by-columns clause with Equal operator
      cqlsh:mariadbtest> select * from cf1 where pk='row1' or pk='row2';
      Bad Request: line 1:34 missing EOF at 'or'
    • No joins or subqueries
    • No GROUP BY, ORDER BY must be able to use available indexes
    • WHERE clause must represent an index lookup.
  • 8. Cassandra Storage Engine
    Provides a “view” of Cassandra's data from MariaDB.
    Starts a NoCQL movement
  • 9. 1. Load the Cassandra SE plugin
    • Get MariaDB 10.0.1+
    • Load the Cassandra plugin
      – From SQL:
          MariaDB [(none)]> install plugin cassandra soname 'ha_cassandra.so';
      – Or, add a line to my.cnf:
          [mysqld]
          ...
          plugin-load=ha_cassandra.so
    • Check it is loaded
          MariaDB [(none)]> show plugins;
          +--------------------+--------+-----------------+-----------------+---------+
          | Name               | Status | Type            | Library         | License |
          +--------------------+--------+-----------------+-----------------+---------+
          ...
          | CASSANDRA          | ACTIVE | STORAGE ENGINE  | ha_cassandra.so | GPL     |
          +--------------------+--------+-----------------+-----------------+---------+
  • 10. 2. Connect to Cassandra
    • Create an SQL table which is a view of a column family
        MariaDB [test]> set global cassandra_default_thrift_host='10.196.2.113';
        Query OK, 0 rows affected (0.00 sec)
        MariaDB [test]> create table t2 (pk varchar(36) primary key,
            ->                   data1 varchar(60),
            ->                   data2 bigint
            -> ) engine=cassandra
            ->   keyspace='mariadbtest'
            ->   thrift_host='10.196.2.113'
            ->   column_family='cf1';
        Query OK, 0 rows affected (0.01 sec)
      – thrift_host can be set per-table
      – @@cassandra_default_thrift_host allows you to
        ● Re-point the table to a different node dynamically
        ● Avoid changing table DDL when the Cassandra IP changes
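    The re-pointing idea on this slide could look roughly like the sketch below. It assumes that a table created without a per-table thrift_host falls back to @@cassandra_default_thrift_host; the table name t2_default and the second IP address are made up for illustration.

```sql
-- Sketch only: t2_default and the address 10.196.2.114 are hypothetical.
set global cassandra_default_thrift_host='10.196.2.113';

-- No thrift_host option here, so (by assumption) the table relies on
-- the global default set above.
create table t2_default (pk varchar(36) primary key,
                         data1 varchar(60),
                         data2 bigint
) engine=cassandra
  keyspace='mariadbtest'
  column_family='cf1';

-- If the Cassandra node moves later, only the variable changes;
-- the table DDL stays as it is.
set global cassandra_default_thrift_host='10.196.2.114';
```

    When the new address actually takes effect for already-open tables is not stated on the slide, so treat that as something to verify on a test server.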
  • 11. Possible gotchas
    • SELinux blocks the connection
        MariaDB [test]> create table t1 ( ... ) engine=cassandra ... ;
        ERROR 1429 (HY000): Unable to connect to foreign data source: connect() failed: Permission denied [1]
      – Packaging bug
      – To get running quickly: echo 0 >/selinux/enforce
    • Cassandra 1.2 and CFs without “COMPACT STORAGE”
        MariaDB [test]> create table t1 ( ... ) engine=cassandra ... ;
        ERROR 1429 (HY000): Unable to connect to foreign data source: Column family cf1 not found in keyspace mariadbtest
      – Caused by a change in Cassandra 1.2
      – They broke Pig also
      – We intend to update CassandraSE for 1.2
  • 12. Accessing Cassandra data
    • Can get Cassandra's data
        MariaDB [test]> select * from t2;
        +------+-------------------+-------+
        | pk   | data1             | data2 |
        +------+-------------------+-------+
        | row1 | data-in-cassandra |  1234 |
        +------+-------------------+-------+
    • Can insert data
        MariaDB [test]> insert into t2 values ('row2','data-from-mariadb', 123);
        Query OK, 1 row affected (0.00 sec)
    • Cassandra sees inserted data
        cqlsh:mariadbtest> select * from cf1;
         pk   | data1             | data2
        ------+-------------------+-------
         row1 | data-in-cassandra |  1234
         row2 | data-from-mariadb |   123
  • 16. Data mapping between Cassandra and SQL
        create table tbl (
          pk varchar(36) primary key,
          data1 varchar(60),
          data2 bigint
        ) engine=cassandra keyspace='ks1' column_family='cf1'
    • MariaDB table represents Cassandra's Column Family
      – Can use any table name, column_family=... specifies CF.
    • Table must have a primary key
      – Name/type must match Cassandra's rowkey
    • Columns map to Cassandra's static columns
      – Name must be the same as in Cassandra
      – Datatypes must match
      – Can map any subset of the CF's columns
  • 17. Datatype mapping
    • CF column datatype determines MariaDB datatype
         Cassandra | MariaDB
        -----------+--------------------------------------------------------------
         blob      | BLOB, VARBINARY(n)
         ascii     | BLOB, VARCHAR(n), use charset=latin1
         text      | BLOB, VARCHAR(n), use charset=utf8
         varint    | VARBINARY(n)
         int       | INT
         bigint    | BIGINT, TINY, SHORT
         uuid      | CHAR(36) (text in MariaDB)
         timestamp | TIMESTAMP (second precision), TIMESTAMP(6) (microsecond precision), BIGINT
         boolean   | BOOL
         float     | FLOAT
         double    | DOUBLE
         decimal   | VARBINARY(n)
         counter   | BIGINT
  • 18. Dynamic columns
    • Cassandra supports “dynamic column families”
    • Can access ad-hoc columns with MariaDB's dynamic columns feature
        create table tbl (
          rowkey type PRIMARY KEY,
          column1 type,
          ...
          dynamic_cols blob DYNAMIC_COLUMN_STORAGE=yes
        ) engine=cassandra keyspace=... column_family=...;
        insert into tbl values (1, column_create('col1', 1, 'col2', 'value-2'));
        select rowkey, column_get(dynamic_cols, 'uuidcol' as char) from tbl;
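    A more concrete version of the schematic statements above might look like this. It is a sketch under assumptions: the column family cf_dyn, the table t_dyn and the column name dyn are hypothetical, and the column family is assumed to exist in the mariadbtest keyspace with a varchar row key and no static columns.

```sql
-- Hypothetical names: cf_dyn, t_dyn, dyn. Only the syntax follows the slide.
create table t_dyn (
  rowkey varchar(36) primary key,
  dyn    blob DYNAMIC_COLUMN_STORAGE=yes
) engine=cassandra
  keyspace='mariadbtest'
  column_family='cf_dyn';

-- Pack two ad-hoc columns into the dynamic-columns blob.
insert into t_dyn values
  ('key1', column_create('col1', 1, 'col2', 'value-2'));

-- Read one ad-hoc column back as text.
select rowkey, column_get(dyn, 'col2' as char) from t_dyn;

-- List every ad-hoc column of each row in readable form.
select rowkey, column_json(dyn) from t_dyn;
```

    COLUMN_CREATE, COLUMN_GET and COLUMN_JSON are MariaDB's regular dynamic-column functions; how each Cassandra value type is converted on the way in and out is not covered by the slide.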
  • 19. Data mapping is safe
    • Cassandra SE will refuse incorrect mappings
        create table t3 (pk varchar(60) primary key, no_such_field int)
          engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
        ERROR 1928 (HY000): Internal error: 'Field `no_such_field` could not be mapped to any field in Cassandra'
        create table t3 (pk varchar(60) primary key, data1 double)
          engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
        ERROR 1928 (HY000): Internal error: 'Failed to map column data1 to datatype org.apache.cassandra.db.marshal.UTF8Type'
  • 21. Command Mapping
    ● Cassandra commands
      – PUT (upsert)
      – GET
        ● Scan
      – DELETE (if exists)
    ● SQL commands
      – SELECT → GET/Scan
      – INSERT → PUT (upsert)
      – UPDATE/DELETE → read+write
  • 22. SELECT command mapping
    ● MariaDB has an SQL interpreter
    ● Cassandra SE supports lookups and scans
    ● Can now do
      – Arbitrary WHERE clauses
      – JOINs between Cassandra tables and MariaDB tables
        ● Batched Key Access is supported
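    As an illustration of the two capabilities listed above, here is a sketch that joins the Cassandra SE table t2 from the earlier slides with a local table. The table local_users and its columns are hypothetical, and the two SET statements are MariaDB's general switches for Batched Key Access; whether these exact values suit Cassandra SE best is an assumption.

```sql
-- Hypothetical local table; only t2 (the Cassandra SE table) is from the slides.
create table local_users (
  user_id      int primary key,
  name         varchar(60),
  cassandra_pk varchar(36)
) engine=innodb;

-- General MariaDB settings that allow Batched Key Access joins.
set optimizer_switch='mrr=on';
set join_cache_level=6;

-- Join local rows with Cassandra rows looked up by row key.
select u.name, c.data1, c.data2
from local_users u
join t2 c on c.pk = u.cassandra_pk
where u.user_id < 100;

-- A WHERE clause of the kind cqlsh rejected on slide 7: MariaDB evaluates
-- the condition itself, most likely by scanning the column family.
select * from t2 where data2=1234 or pk='row2';
```

    Running EXPLAIN on the join shows whether the optimizer actually picked a BKA join for the Cassandra table.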
  • 23. DML command mapping
    ● No SQL semantics
      – INSERT overwrites rows
      – UPDATE reads, then writes
        ● Have you updated what you read?
      – DELETE reads, then deletes
        ● Can't be sure if/what you have deleted
    ● Not as bad as it sounds, it's Cassandra
      – Cassandra SE doesn't make it SQL.
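    The semantics above can be tried against the t2 table from the earlier slides. This is a sketch: the inserted values are made up, and no output is shown because it depends on what the column family holds at the time.

```sql
-- 'row1' already exists in cf1 (inserted from cqlsh on slide 6).
-- Per the slide, INSERT is an upsert: the existing row is overwritten
-- rather than rejected with a duplicate-key error.
insert into t2 values ('row1', 'overwritten-from-mariadb', 42);
select * from t2 where pk='row1';

-- UPDATE is a read followed by a write. Nothing prevents another client
-- from changing the row between the two steps.
update t2 set data2 = data2 + 1 where pk='row1';

-- DELETE is a read followed by a delete, with the same caveat.
delete from t2 where pk='row2';
```

    The read and the write are separate operations, so patterns such as incrementing a value can lose updates when several writers touch the same row.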
  • 25. Cassandra use cases
    ● Collect massive amounts of data
      – Web page hits
      – Sensor updates
    ● Updates are naturally non-conflicting
      – Keyed by UUIDs, timestamps
    ● Reads are served with one lookup
    ● Good for certain kinds of data
      – Moving from SQL entirely may be difficult
  • 26. Cassandra SE use cases (1)
    Access Cassandra data from SQL
    ● Send an update to Cassandra
      – Be a sensor
    ● Grab a piece of data from Cassandra
      – “This web page was last viewed by …”
      – “Last known position of this user was ...”.
  • 27. Cassandra SE use cases (2)
    Coming from MySQL/MariaDB side:
    ● Want a special table that is
      – auto-replicated
      – fault-tolerant
      – Very fast?
    ● Get Cassandra, and create a Cassandra SE table.
  • 28. Cassandra Storage Engine non-use cases
    • Huge, sift-through-all-data joins
      – Use Pig
    • Bulk data transfer to/from Cassandra cluster
      – Use Sqoop
    • A replacement for InnoDB
      – No full SQL semantics
  • 29. A “benchmark”
    • One table
    • EC2 environment
      – m1.large nodes
      – Ephemeral disks
    • Stream of single-line INSERTs
    • Tried InnoDB and Cassandra
    • Hardly any tuning
  • 30. Conclusions
    • Cassandra SE can be used to peek at data in Cassandra from MariaDB.
    • It is not a replacement for Pig/Hive
    • It is really easy to set up and use
  • 31. Going Forward
    • Looking for input
    • Do you want support for
      – Fast counter column updates?
      – Awareness of Cassandra cluster topology?
      – Secondary indexes?
      – …?
  • 32. Resources
    • https://kb.askmonty.org/en/cassandrase/
    • http://wiki.apache.org/cassandra/DataModel
    • http://cassandra.apache.org/
    • http://www.datastax.com/docs/1.1/ddl/column_family
  • 34. Extra: Cassandra SE internals
    • Developed against Cassandra 1.1
    • Uses Thrift API
      – Cannot stream CQL resultset in 1.1
      – Can't use secondary indexes
    • Only supports AllowAllAuthenticator
    • In Cassandra 1.2
      – “CQL Binary Protocol” with streaming
      – CASSANDRA-5234: Thrift can only read CFs “WITH COMPACT STORAGE”