2. @doanduyhai
Who Am I ?
Duy Hai DOAN
Apache Cassandra Evangelist
• talks, meetups, confs …
• open-source projects (Achilles, Apache Zeppelin ...)
• OSS Cassandra point of contact
•
☞ duy_hai.doan@datastax.com
☞ @doanduyhai
2
3. @doanduyhai
Datastax
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 400+ employees
• Headquarter in San Francisco Bay area
• EU headquarter in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features
3
6. @doanduyhai
Why Materialized Views ?
• Relieve the pain of manual denormalization
CREATE TABLE user(id int PRIMARY KEY, country text, …);
CREATE TABLE user_by_country( country text, id int, …,
PRIMARY KEY(country, id));
6
7. @doanduyhai
CREATE TABLE user_by_country (
country text, id int,
firstname text, lastname text,
PRIMARY KEY(country, id));
Materialzed View In Action
CREATE MATERIALIZED VIEW user_by_country
AS SELECT country, id, firstname, lastname
FROM user
WHERE country IS NOT NULL AND id IS NOT NULL
PRIMARY KEY(country, id)
7
9. @doanduyhai
Materialized View Performance
• Write performance
• slower than normal write
• local lock + read-before-write cost (but paid only once for all views)
• for each base table update, worst case: mv_count x 2 (DELETE + INSERT) extra
mutations for the views
9
10. @doanduyhai
Materialized View Performance
• Write performance vs manual denormalization
• MV better because no client-server network traffic for read-before-write
• MV better because less network traffic for multiple views (client-side BATCH)
• Makes developer life easier à priceless
10
11. @doanduyhai
Materialized View Performance
• Read performance vs secondary index
• MV better because single node read (secondary index can hit many nodes)
• MV better because single read path (secondary index = read index + read data)
11
12. @doanduyhai
Materialized Views Consistency
• Consistency level
• CL honoured for base table, ONE for MV + local batchlog
• Weaker consistency guarantees for MV than for base table.
12
15. @doanduyhai
Rationale
• Push computation server-side
• save network bandwidth (1000 nodes!)
• simplify client-side code
• provide standard & useful function (sum, avg …)
• accelerate analytics use-case (pre-aggregation for Spark)
15
16. @doanduyhai
How to create an UDF ?
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
[keyspace.]functionName (param1 type1, param2 type2, …)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS returnType
LANGUAGE language
AS $$
// source code here
$$;
16
17. @doanduyhai
How to create an UDF ?
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
[keyspace.]functionName (param1 type1, param2 type2, …)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS returnType
LANGUAGE language
AS $$
// source code here
$$;
Param name to refer to in the code
Type = Cassandra type
17
18. @doanduyhai
How to create an UDF ?
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
[keyspace.]functionName (param1 type1, param2 type2, …)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS returnType
LANGUAGE language // j
AS $$
// source code here
$$;
Always called
Null-check mandatory in code
18
19. @doanduyhai
How to create an UDF ?
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
[keyspace.]functionName (param1 type1, param2 type2, …)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS returnType
LANGUAGE language // jav
AS $$
// source code here
$$;
If any input is null, function execution is skipped and return null
19
20. @doanduyhai
How to create an UDF ?
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
[keyspace.]functionName (param1 type1, param2 type2, …)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS returnType
LANGUAGE language
AS $$
// source code here
$$;
Cassandra types
• primitives (boolean, int, …)
• collections (list, set, map)
• tuples
• UDT
20
21. @doanduyhai
How to create an UDF ?
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
[keyspace.]functionName (param1 type1, param2 type2, …)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS returnType
LANGUAGE language
AS $$
// source code here
$$;
JVM supported languages
• Java, Scala
• Javascript (slow)
• Groovy, Jython, JRuby
• Clojure ( JSR 223 impl issue)
21
23. @doanduyhai
User Defined Aggregates (UDA)
• Real use-case for UDF
• Aggregation server-side à huge network bandwidth saving
• Provide similar behavior for Group By, Sum, Avg etc …
23
24. @doanduyhai
How to create an UDA ?
CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
[keyspace.]aggregateName(type1, type2, …)
SFUNC accumulatorFunction
STYPE stateType
[FINALFUNC finalFunction]
INITCOND initCond;
Only type, no param name
State type
Initial state type
24
25. @doanduyhai
How to create an UDA ?
CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
[keyspace.]aggregateName(type1, type2, …)
SFUNC accumulatorFunction
STYPE stateType
[FINALFUNC finalFunction]
INITCOND initCond;
Accumulator function. Signature:
accumulatorFunction(stateType, type1, type2, …)
RETURNS stateType
25
26. @doanduyhai
How to create an UDA ?
CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
[keyspace.]aggregateName(type1, type2, …)
SFUNC accumulatorFunction
STYPE stateType
[FINALFUNC finalFunction]
INITCOND initCond;
Optional final function. Signature:
finalFunction(stateType)
26
28. @doanduyhai
Gotchas
28
• UDA in Cassandra is not distributed !
• Do not execute UDA on a large number of rows (106 for ex.)
• single fat partition
• multiple partitions
• full table scan
• à Increase client-side timeout
• default Java driver timeout = 12 secs
29. @doanduyhai
Cassandra UDA or Apache Spark ?
29
Consistency
Level
Single/Multiple
Partition(s)
Recommended
Approach
ONE Single partition UDA with token-aware driver because node local
ONE Multiple partitions Apache Spark because distributed reads
> ONE Single partition UDA because data-locality lost with Spark
> ONE Multiple partitions Apache Spark definitely
32. @doanduyhai
Why JSON ?
32
• JSON is a very good exchange format
• But a terrible schema …
• How to have best of both worlds ?
• use Cassandra schema
• convert rows to JSON format
33. @doanduyhai
JSON syntax for INSERT/UPDATE/DELETE
33
CREATE TABLE users (
id text PRIMARY KEY,
age int,
state text );
INSERT INTO users JSON '{"id": "user123", "age": 42, "state": "TX"}’;
INSERT INTO users(id, age, state) VALUES('me', fromJson('20'), 'CA');
UPDATE users SET age = fromJson('25’) WHERE id = fromJson('"me"');
DELETE FROM users WHERE id = fromJson('"me"');
34. @doanduyhai
JSON syntax for SELECT
34
> SELECT JSON * FROM users WHERE id = 'me';
[json]
----------------------------------------
{"id": "me", "age": 25, "state": "CA”}
> SELECT JSON age,state FROM users WHERE id = 'me';
[json]
----------------------------------------
{"age": 25, "state": "CA"}
> SELECT age, toJson(state) FROM users WHERE id = 'me';
age | system.tojson(state)
-----+----------------------
25 | "CA"
38. @doanduyhai
Why SASI ?
• Searching (and full text search) was always a pain point for Cassandra
• limited search predicates (=, <=, <, > and >= only)
• limited scope (only on primary key columns)
• Existing secondary index performance is poor
• reversed-index
• use Cassandra itself as index storage …
• limited predicate ( = ). Inequality predicate = full cluster scan 😱
38
39. @doanduyhai
How ?
• New index structure = suffix trees
• Extended predicates (=, inequalities, LIKE %)
• Full text search (tokenizers, stop-words, stemming …)
• Query Planner to optimize AND predicates
• NO, we don’t use Apache Lucene
39
42. @doanduyhai
When ?
• Cassandra 3.5
• Later
• support for OR clause : ( aaa OR bbb) AND (ccc OR ddd)
• index on collections (Set, List, Map)
42
43. @doanduyhai
Comparison
43
SASI vs Solr/ElasticSearch ?
• Cassandra is not a search engine !!! (database = durability)
• always slower because 2 passes (SASI index read + original Cassandra data)
• no scoring
• no ordering (ORDER BY)
• no grouping (GROUP BY) à Apache Spark for analytics
Still, SASI covers 80% of search use-cases and people are happy !