2. @doanduyhai
Who am I?!
Duy Hai DOAN
Cassandra technical advocate
• talks, meetups, confs
• open-source devs (Achilles, …)
• OSS Cassandra point of contact
☞ duy_hai.doan@datastax.com
☞ @doanduyhai
3. @doanduyhai
Datastax!
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 400+ employees
• Headquarters in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features
5. @doanduyhai
What is Apache Spark?!
Created at UC Berkeley (AMPLab)
Open-sourced in 2010, an Apache project since 2013
General data processing framework
Faster than Hadoop MapReduce thanks to in-memory processing
One-framework-many-components approach
15. @doanduyhai
What is Apache Cassandra?!
Created at Facebook
Apache project since 2009
Distributed NoSQL database
Eventual consistency (favors A & P of the CAP theorem)
Distributed table abstraction
16. @doanduyhai
Cassandra data distribution reminder!
Random partitioning: the partition key is hashed → token = hash(#partition)
Token range: ]-X, X]
X = a huge number (2^64 / 2 = 2^63)
[Diagram: ring of 8 nodes, n1 … n8, each owning a slice of the token range]
17. @doanduyhai
Cassandra token ranges!
A: ]0, X/8]
B: ]X/8, 2X/8]
C: ]2X/8, 3X/8]
D: ]3X/8, 4X/8]
E: ]4X/8, 5X/8]
F: ]5X/8, 6X/8]
G: ]6X/8, 7X/8]
H: ]7X/8, X]
Murmur3 hash function
[Diagram: ring of 8 nodes n1 … n8, each assigned one token range A … H]
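As a toy sketch of this split (an illustration only, not Cassandra code, assuming the simplified picture above: the positive part ]0, X] of the ring, eight equal ranges, no vnodes), finding the token range that owns a token is a simple interval lookup:

```python
# Illustrative sketch: Murmur3 maps each partition key to a 64-bit token;
# here the ]0, X] portion of the ring is split into the 8 ranges A..H
# from the table above.
X = 2 ** 63  # upper bound of the token space (2^64 / 2)

# range label -> half-open interval ]lo, hi]
RANGES = {chr(ord("A") + i): ((i * X) // 8, ((i + 1) * X) // 8) for i in range(8)}

def range_of(token):
    """Return the label of the token range ]lo, hi] containing `token`."""
    for label, (lo, hi) in RANGES.items():
        if lo < token <= hi:
            return label
    raise ValueError("token outside ]0, X]")
```

Note the intervals are open on the left and closed on the right, matching the `]lo, hi]` notation on the slide.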
20. @doanduyhai
Cassandra Query Language (CQL)!
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';
SELECT age FROM users WHERE login = 'jdoe';
23. @doanduyhai
Connector architecture – Core API!
Cassandra tables exposed as Spark RDDs
Read from and write to Cassandra
Mapping of C* tables and rows to Scala objects
• CassandraRDD and CassandraRow
• Scala case class (object mapper)
• Scala tuples
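The object-mapper idea can be sketched like this (illustrative only, not the connector's code; the `User` class and `map_row` helper are hypothetical): a row is a set of named column values, and the mapper binds them to a typed object's fields by name, the way the connector binds CassandraRow columns to Scala case class fields.

```python
from dataclasses import dataclass, fields

@dataclass
class User:  # hypothetical table: users(login, name, age)
    login: str
    name: str
    age: int

def map_row(row, cls):
    """Bind the row's named column values to `cls` fields by name."""
    return cls(**{f.name: row[f.name] for f in fields(cls)})

user = map_row({"login": "jdoe", "name": "John DOE", "age": 33}, User)
```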
25. @doanduyhai
Connector architecture – Spark SQL !
Mapping of a Cassandra table to a SchemaRDD
• CassandraSQLRow → SparkRow
• custom query plan
• push predicates to CQL for early filtering
SELECT * FROM user_emails WHERE login = 'jdoe';
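Predicate push-down can be sketched as follows (a toy illustration, not the connector's query planner; the `push_down` helper is hypothetical): supported predicates are folded into the CQL statement sent to Cassandra instead of being evaluated in Spark after a full table fetch.

```python
def push_down(table, predicates):
    """Build a CQL query embedding (column, op, value) predicates."""
    where = " AND ".join(f"{col} {op} '{val}'" for col, op, val in predicates)
    return f"SELECT * FROM {table} WHERE {where};"

# The slide's example query, rebuilt from a pushed predicate:
query = push_down("user_emails", [("login", "=", "jdoe")])
```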
27. @doanduyhai
Connector architecture – Spark Streaming !
Streaming data INTO a Cassandra table
• trivial setup
• be careful with your Cassandra data model when ingesting an infinite stream!
Streaming data OUT of Cassandra tables (CDC)?
• work in progress …
33. @doanduyhai
Cassandra – Spark job lifecycle!
[Diagram: the Driver Program with its Spark Context runs on the Spark Client, talks to the Spark Master, which dispatches work to 4 Spark Workers, each co-located with a Cassandra node. ① Define your business logic in the driver program!]
43. @doanduyhai
Write to Cassandra without data locality!
Async batches fan out writes to Cassandra
Because of the shuffle, the original data locality is lost
44. @doanduyhai
Write to Cassandra with data locality!
Write to Cassandra
rdd.repartitionByCassandraReplica("keyspace","table")
45. @doanduyhai
Write data locality!
• either repartition data in the Spark layer using repartitionByCassandraReplica()
• or flush data to Cassandra with async batches
• in any case, there will be data movement over the network (sorry, no magic)
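What repartitionByCassandraReplica achieves can be sketched like this (an illustration with a hypothetical `owner` lookup, not the connector's implementation): rows are grouped by the node owning their partition key, so each group can then be written locally.

```python
NODES = ["n1", "n2", "n3", "n4"]  # hypothetical 4-node ring

def owner(partition_key):
    """Toy stand-in for the token(partition_key) -> replica lookup."""
    return NODES[hash(partition_key) % len(NODES)]

def repartition_by_replica(rows, key_of):
    """Group rows by the node owning their partition key."""
    groups = {node: [] for node in NODES}
    for row in rows:
        groups[owner(key_of(row))].append(row)
    return groups

# Rows keyed by login; each group ends up on the replica that owns it.
groups = repartition_by_replica([("jdoe", 33), ("asmith", 25)], lambda r: r[0])
```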
46. @doanduyhai
Joins with data locality!
CREATE TABLE artists(name text, style text, … PRIMARY KEY(name));
CREATE TABLE albums(title text, artist text, year int,… PRIMARY KEY(title));
val join: CassandraJoinRDD[(String,Int), (String,String)] =
  sc.cassandraTable[(String,Int)](KEYSPACE, ALBUMS)
    // Select only the columns useful for the join and processing
    .select("artist", "year")
    .as((_: String, _: Int))
    // Repartition the RDD by the "artists" PK, which is "name"
    .repartitionByCassandraReplica(KEYSPACE, ARTISTS)
    // Join with the "artists" table, selecting only "name" and "country"
    .joinWithCassandraTable[(String,String)](KEYSPACE, ARTISTS, SomeColumns("name","country"))
    .on(SomeColumns("name"))
54. @doanduyhai
Perfect data locality scenario!
• read locally from Cassandra
• use operations that do not require a shuffle in Spark (map, filter, …)
• repartitionByCassandraReplica()
• → to a table having the same partition key as the original table
• save back into this Cassandra table
USE CASE: sanitize, validate, normalize, transform data
57. @doanduyhai
Failure handling!
What if 1 node is down?
What if 1 node is overloaded?
☞ the Spark master will re-assign the job to another worker
[Diagram: Spark master and 5 Spark workers, each co-located with a Cassandra node]
62. @doanduyhai
Failure handling!
If RF > 1, the Spark master chooses the
next preferred location, which
is a replica 😎
Tune parameters:
① spark.locality.wait
② spark.locality.wait.process
③ spark.locality.wait.node
[Diagram: Spark master and 5 Spark workers, each co-located with a Cassandra node]
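A hedged example of tuning these in spark-defaults.conf (the values shown are Spark's defaults, given purely for illustration, not as a recommendation):

```
# How long to wait for a data-local task slot before the scheduler falls
# back to a less-local level (process-local -> node-local -> rack -> any)
spark.locality.wait          3s
spark.locality.wait.process  3s
spark.locality.wait.node     3s
```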
63. @doanduyhai
Cross-DC operations!

val confDC1 = new SparkConf(true)
  .setAppName("data_migration")
  .setMaster("master_ip")
  .set("spark.cassandra.connection.host", "DC_1_hostnames")
  .set("spark.cassandra.connection.local_dc", "DC_1")

val confDC2 = new SparkConf(true)
  .setAppName("data_migration")
  .setMaster("master_ip")
  .set("spark.cassandra.connection.host", "DC_2_hostnames")
  .set("spark.cassandra.connection.local_dc", "DC_2")

val sc = new SparkContext(confDC1)

sc.cassandraTable[Performer](KEYSPACE, PERFORMERS)
  .map[Performer](???)
  .saveToCassandra(KEYSPACE, PERFORMERS)(
    CassandraConnector(confDC2), implicitly[RowWriterFactory[Performer]])
64. @doanduyhai
Cross-cluster operations!

val confCluster1 = new SparkConf(true)
  .setAppName("data_migration")
  .setMaster("master_ip")
  .set("spark.cassandra.connection.host", "cluster_1_hostnames")

val confCluster2 = new SparkConf(true)
  .setAppName("data_migration")
  .setMaster("master_ip")
  .set("spark.cassandra.connection.host", "cluster_2_hostnames")

val sc = new SparkContext(confCluster1)

sc.cassandraTable[Performer](KEYSPACE, PERFORMERS)
  .map[Performer](???)
  .saveToCassandra(KEYSPACE, PERFORMERS)(
    CassandraConnector(confCluster2), implicitly[RowWriterFactory[Performer]])
66. @doanduyhai
Use Cases!
Load data from various sources
Analytics (join, aggregate, transform, …)
Sanitize, validate, normalize, transform data
Schema migration, data conversion
67. @doanduyhai
Data cleaning use-cases!
Bug in your application?
Dirty input data?
☞ a Spark job to clean it up! (perfect data locality)
Sanitize, validate, normalize, transform data
71. @doanduyhai
Analytics use-cases!
Given existing tables of performers and albums, I want:
① the top 10 most common music styles (pop, rock, RnB, …)
② performer productivity (album count) by origin country and by decade
☞ a Spark job to compute the analytics!
Analytics (join, aggregate, transform, …)
72. @doanduyhai
Analytics pipeline!
① Read from production transactional tables
② Perform aggregation with Spark
③ Save back data into dedicated tables for fast visualization
④ Repeat step ①
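Step ② for the "top 10 music styles" question can be sketched outside Spark (illustrative Python with hypothetical data, not the actual job): count styles across performers, keep the most common, and that small result is what gets saved to a dedicated table for fast visualization.

```python
from collections import Counter

def top_styles(performers, n=10):
    """Count styles across (name, style) performers; return the n most common."""
    return Counter(style for _, style in performers).most_common(n)

top = top_styles([("a", "rock"), ("b", "pop"), ("c", "rock")], n=2)
```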
78. @doanduyhai
Spark memory config!
The Spark master is lightweight, with few resource requirements
Spark workers & executors require much more memory
Spark memory is computed from settings in dse.yaml:
initial_spark_worker_resources: 0.7
79. @doanduyhai
Spark memory config!
Can be manually overridden in spark-env.sh:
# The amount of memory used by the Spark Worker, cluster-wide
export SPARK_WORKER_MEMORY="4096M"
# The amount of memory used by one Spark node
export SPARK_MEM="1024M"
# The amount of memory used by the Spark Shell (client)
export SPARK_REPL_MEM="256M"
80. @doanduyhai
Spark CPU (cores) config!
Spark cores are computed from settings in dse.yaml:
initial_spark_worker_resources: 0.7
81. @doanduyhai
Spark CPU (cores) config!
Can be manually overridden in spark-env.sh:
# Set the number of cores used by the Spark Worker
export SPARK_WORKER_CORES="2"
# The default number of cores assigned to each application.
# It can be modified in a particular application.
# If left blank, each Spark application will consume all
# available cores in the cluster.
export DEFAULT_PER_APP_CORES="3"
82. @doanduyhai
Important architecture considerations!
In theory
• 1 Spark job is run inside 1 Spark executor
• 1 Spark worker can have 1..N executors (1..N jobs)
In stand-alone mode
• 1 Spark worker cannot have more than 1 executor (see SPARK-1607)
• workaround: increase SPARK_WORKER_INSTANCES in spark-env.sh
83. @doanduyhai
Important resources considerations!
Memory
• the more the better for Spark (beware of JVM GC cycles)
• leave enough memory for the OS page cache (used by Cassandra)
Cores
• beware of context switching under highly contended load
• 1 Spark DStream = 1 core used!
General resources
• if there are not enough resources for all jobs → schedule them!
84. The “Big Data Platform”!
Analytics & Search!
Integrated High Availability!
85. @doanduyhai
Our vision!
We had a dream …
to provide an integrated Big Data Platform …
built for the performance & high availability demands
of IoT, web & mobile applications
86. @doanduyhai
Datastax Enterprise!
Cassandra + Solr in same JVM
Unlock full text search power for Cassandra
CQL syntax extension
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
91. @doanduyhai
How to speed up Spark jobs ?!
Brute force solution for rich people:
• add more RAM (100 GB, 200 GB, …)
92. @doanduyhai
How to speed up Spark jobs ?!
Smart solution for broke people:
• filter the initial dataset as much as possible
93. @doanduyhai
The idea!
① filter as much as possible with DSE Search
② fetch only a small data set into memory
③ aggregate with Spark
☞ near-real-time interactive analytics queries become possible with restrictive criteria
98. @doanduyhai
Stand-alone search cluster caveat!
The ops team won't be your friend
• 3 clusters to manage: Spark, Cassandra & Solr/whatever
• lots of moving parts
Impacts on Spark jobs
• increased response time due to network latency
• the 99.9th percentile latency can be very high
100. @doanduyhai
Datastax Enterprise 4.7!
① compute Spark partitions using Cassandra token ranges
② on each partition, use Solr for local data filtering (no distributed query!)
③ fetch the data back into Spark for aggregations

val join: CassandraJoinRDD[(String,Int), (String,String)] =
  sc.cassandraTable[(String,Int)](KEYSPACE, ALBUMS)
    // Select only the columns useful for the join and processing
    .select("artist", "year").where("solr_query = 'style:*rock* AND ratings:[3 TO *]'")
    .as((_: String, _: Int))
    .repartitionByCassandraReplica(KEYSPACE, ARTISTS)
    .joinWithCassandraTable[(String,String)](KEYSPACE, ARTISTS, SomeColumns("name","country"))
    .on(SomeColumns("name")).where("solr_query = 'age:[20 TO 30]'")
101. @doanduyhai
Datastax Enterprise 4.7!
Token range: ]3X/8, 4X/8]

SELECT … FROM …
WHERE token(#partition) > 3X/8
AND token(#partition) <= 4X/8
AND solr_query = 'full text search expression';

Advantages of the same-JVM Cassandra + Solr integration:
① single-pass local full text search (no fan-out)
② local data retrieval
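Building the per-partition statement shown above can be sketched like this (illustrative only, not DSE internals; the `partition_query` helper and the `pk` column parameter are hypothetical): each Spark partition queries its local node, restricted to one token range, with the solr_query predicate for local filtering.

```python
def partition_query(table, pk, lo, hi, solr_query):
    """CQL for one Spark partition covering the token range ]lo, hi]."""
    return (f"SELECT * FROM {table} "
            f"WHERE token({pk}) > {lo} AND token({pk}) <= {hi} "
            f"AND solr_query = '{solr_query}';")

# Toy bounds stand in for the real 64-bit token range endpoints.
q = partition_query("albums", "title", 3, 4, "style:*rock*")
```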
105. @doanduyhai
Integrated High Availability!
DSE 4.7 will feature a new general leader election framework
• using Cassandra's robust Gossip protocol
• using Cassandra's battle-proven Phi Accrual failure detector
• using Cassandra's Lightweight Transaction (Paxos) primitive for leader election
106. @doanduyhai
Integrated High Availability!
Applied to Spark
• start the Spark master process inside DSE
• use Cassandra's Gossip & failure detector to track the Spark master's state
• use Cassandra's Lightweight Transactions for robust master election