SlideShare a Scribd company logo
Big Data Analytics with
Cassandra & Spark_
Matthias Niehoff
Cassandra
2
•Distributed database
•Highly Available
•Linear Scalable
•Multi Datacenter Support
•No Single Point Of Failure
•CQL Query Language
• Similiar to SQL
• No Joins and aggregates
•Eventual Consistency „Tunable Consistency“
Cassandra_
3
Distributed Data Storage_
4
Node 1
Node 2
Node 3
Node 4
1-25
26-5051-75
76-0
CQL - Querying Language With Limitations_
5
SELECT	*	FROM	performer	WHERE	name	=	'ACDC'	
—>	ok	
SELECT	*	FROM	performer	WHERE	name	=	'ACDC'	and	country	=	
'Australia'	
—>	not	ok	
SELECT	country,	COUNT(*)	as	quantity	FROM	artists	GROUP	BY	
country	ORDER	BY	quantity	DESC	
—>	not	supported	
performer
name (PK)
genre
country
Spark
6
•Open Source & Apache project since 2010
•Data processing Framework
• Batch processing
• Stream processing
What Is Apache Spark_
7
•Fast
• up to 100 times faster than Hadoop
• a lot of in-memory processing
• linear scalable using more nodes
•Easy
• Scala, Java and Python API
• Clean Code (e.g. with lambdas in Java 8)
• expanded API: map, reduce, filter, groupBy, sort, union, join,
reduceByKey, groupByKey, sample, take, first, count
•Fault-Tolerant
• easily reproducible
Why Use Spark_
8
•RDD‘s – Resilient Distributed Dataset
• Read–Only description of a collection of objects
• Distributed among the cluster (on memory or disk)
• Determined through transformations
• Allows automatically rebuild on failure
•Operations
• Transformations (map,filter,reduce...) —> new RDD
• Actions (count, collect, save)
•Only Actions start processing!
Easily Reproducable?_
9
RDD Example_
10
scala>	val	textFile	=	sc.textFile("README.md")	
textFile:	spark.RDD[String]	=	spark.MappedRDD@2ee9b6e3	
scala>	val	linesWithSpark	=	textFile.filter(line	=>	
line.contains("Spark"))	
linesWithSpark:	spark.RDD[String]	=	spark.FilteredRDD@7dd4af09	
scala>	linesWithSpark.count()		
res0:	Long	=	126
Reproduce RDD’s Using A DAG_
11
Datenquelle
rdd1
rdd3
val1 rdd5
rdd2
rdd4
val2
rdd6
val3
map(..)filter(..)
union(..)
count()
count() count()
sample(..)
cache()
Run Spark In A Cluster_
12
([atomic,collection,object]	,	[atomic,collection,object])	
val	fluege	=		
List(	("Thomas",	"Berlin"),("Mark",	"Paris"),("Thomas",	"Madrid"))	
val	pairRDD	=	sc.parallelize(fluege)	
pairRDD.filter(_._1	==	"Thomas")	
.collect	
.foreach(t	=>	println(t._1	+	"	flew	to	"	+	t._2))	
Pair RDDs_
13
key – not unique value
•Parallelization!
• keys are use for partitioning
• pairs with different keys are distributed across the cluster
•Efficient processing of
• aggregate by key
• group by key
• sort by key
• joins, union based on keys
Why Use Pair RDD’s_
14
Spark & Cassandra
15
Use Spark And Cassandra In A Cluster_
16
Spark	

Client
Spark
Driver
C*
C*
C*C*
Spark
WN
Spark
WN
Spark
WN
Spark
WN
Spark
Master
Two Datacenter - Two Purposes_
17
C*
C*
C*C*
C*
C*
C*C*
Spark
WN
Spark
WN
Spark
WN
Spark
WN
Spark
Master
DC1 - Online DC2 - Analytics
•Spark Cassandra Connector by Datastax
• https://github.com/datastax/spark-cassandra-connector
•Cassandra tables as Spark RDD (read & write)
•Mapping of C* tables and rows onto Java/Scala objects
•Server-Side filtering („where“)
•Compatible with
• Spark ≥ 0.9
• Cassandra ≥ 2.0
•Clone & Compile with SBT or download at Maven Central
Connecting Spark With Cassandra_
18
•Start the shell
bin/spark-shell	

--jars	~/path/to/jar/spark-cassandra-connector-
assembly-1.3.0.jar	

--conf	spark.cassandra.connection.host=localhost	
--driver-memory	2g	
--executor-memory	3g	
•Import	Cassandra	Classes	
scala>	import	com.datastax.spark.connector._	
Use The Connector In The Shell_
19
•Read complete table
val	movies	=	sc.cassandraTable("movie","movies")	
//	returns	CassandraRDD[CassandraRow]	
•Read selected columns
val	movies	=	sc.cassandraTable("movie","movies").select("title","year")	
•Filter rows
val	movies	=	sc.cassandraTable("movie","movies").where("title	=	'Die	
Hard'")	
•Access Columns in Result Set
movies.collect.foreach(r	=>	println(r.get[String]("title")))	
Read A Cassandra Table_
20
Read As Tuple
val	movies	=	

sc.cassandraTable[(String,Int)]("movie","movies")

.select("title","year")	
val	movies	=	

sc.cassandraTable("movie","movies")

.select("title","year")

.as((_:	String,	_:Int))	
//	both	result	in	a	CassandraRDD[(String,Int)]	
Read A Cassandra Table_
21
Read As Case Class
case	class	Movie(title:	String,	year:	Int)	
sc.cassandraTable[Movie]("movie","movies").select("title","year")	
sc.cassandraTable("movie","movies").select("title","year").as(Movie)	
Read A Cassandra Table_
22
•Every RDD can be saved
• Using Tuples
val	tuples	=	sc.parallelize(Seq(("Hobbit",2012),("96	Hours",2008)))		
tuples.saveToCassandra("movie","movies",	SomeColumns("title","year")	
• Using Case Classes
case	class	Movie	(title:String,	year:	int)

val	objects	=		
sc.parallelize(Seq(Movie("Hobbit",2012),Movie("96	Hours",2008)))		
objects.saveToCassandra("movie","movies")	
Write Table_
23
//	Load	and	format	as	Pair	RDD	
val	pairRDD	=	sc.cassandraTable("movie","director")	
.map(r	=>	(r.getString("country"),r))	
//	Directors	/	Country,	sorted	
pairRDD.mapValues(v	=>	1).reduceByKey(_+_)	
.sortBy(-_._2).collect.foreach(println)	
//	or,	unsorted	
pairRDD.countByKey().foreach(println)	
//	All	Countries	
pairRDD.keys()	
Pair RDDs With Cassandra_
24
director
name text K
country text
•Joins can be expensive as they may require shuffling
val	directors	=	sc.cassandraTable(..)

.map(r	=>	(r.getString("name"),r))	
val	movies	=	sc.cassandraTable()

.map(r	=>	(r.getString("director"),r))	
movies.join(directors)	
//	RDD[(String,	(CassandraRow,	CassandraRow))]	
Pair RDDs With Cassandra - Join
25
director
name text K
country text
movie
title text K
director text
•Automatically on read
•Not automatically on write
• No Shuffling Spark Operations -> Writes are local
• Shuffeling Spark Operartions
• Fan Out writes to Cassandra
• repartitionByCassandraReplica(“keyspace“, “table“) before write
•Joins with data locality
Using Data Locality With Cassandra_
26
sc.cassandraTable[CassandraRow](KEYSPACE,	A)	
.repartitionByCassandraReplica(KEYSPACE,	B)	
.joinWithCassandraTable[CassandraRow](KEYSPACE,	B)

.on(SomeColumns("id"))
•cassandraCount()
• Utilizes Cassandra query
• vs load the table into memory and do a count
•spanBy(), spanByKey()
• group data by Cassandra partition key
• does not need shuffling
• should be preferred over groupBy/groupByKey
CREATE TABLE events (year int, month int, ts timestamp, data
varchar, PRIMARY KEY (year,month,ts));
sc.cassandraTable("test",	"events")	
		.spanBy(row	=>	(row.getInt("year"),	row.getInt("month")))	
sc.cassandraTable("test",	"events")	
		.keyBy(row	=>	(row.getInt("year"),	row.getInt("month")))	
		.spanByKey	
Further Transformations & Actions_
27
Spark SQL
28
•SQL Queries with Spark (SQL & HiveQL)
• On structured data
• On DataFrame
• Every result of Spark SQL is a DataFrame
• All operations of the GenericRDD‘s available
•Supports (even on non primary key columns)
• Joins
• Union
• Group By
• Having
• Order By
Spark SQL_
29
val	sqlContext	=	new	SQLContext(sc)	
val	persons	=	sqlContext.jsonFile(path)	
//	Show	the	schema	
persons.printSchema()	
persons.registerTempTable("persons")	
val	adults	=		
sqlContext.sql("SELECT	name	FROM	persons	WHERE	age	>	18")	
adults.collect.foreach(println)	
Spark SQL - JSON Example_
30
{"name":"Michael"}	
{"name":"Jan",	"age":30}	
{"name":"Tim",	"age":17}
val	csc	=	new	CassandraSQLContext(sc)	
csc.setKeyspace("musicdb")	
val	result	=	csc.sql("SELECT	country,	COUNT(*)	as	anzahl"	+	
	 	 	 				"FROM	artists	GROUP	BY	country"	+	
	 	 	 				"ORDER	BY	anzahl	DESC");	
result.collect.foreach(println);	
Spark SQL - Cassandra Example_
31
Spark Streaming
32
•Real Time Processing using micro batches
•Supported sources: TCP, S3, Kafka, Twitter,..
•Data as Discretized Stream (DStream)
•Same programming model as for batches
•All Operations of the GenericRDD & SQL & MLLib
•Stateful Operations & Sliding Windows
Stream Processing With Spark Streaming_
33
import	org.apache.spark.streaming._	
val	ssc	=	new	StreamingContext(sc,Seconds(1))	
val	stream	=	ssc.socketTextStream("127.0.0.1",9999)	
stream.map(x	=>	1).reduce(_	+	_).print()	
ssc.start()	
//	await	manual	termination	or	error	
ssc.awaitTermination()	
//	manual	termination	
ssc.stop()	
Spark Streaming - Example_
34
Spark MLLib
35
•Fully integrated in Spark
• Scalable
• Scala, Java & Python APIs
• Use with Spark Streaming & Spark SQL
•Packages various algorithms for machine learning
•Includes
• Clustering
• Classification
• Prediction
• Collaborative Filtering
•Still under development
• performance, algorithms
Spark MLLib_
36
MLLib Example - Collaborative Filtering_
37
//	Load	and	parse	the	data	(userid,itemid,rating)	
val	data	=	sc.textFile("data/mllib/als/test.data")	
val	ratings	=	data.map(_.split(',')	match		
{		
case	Array(user,	item,	rate)	=>	Rating(user.toInt,	
item.toInt,	rate.toDouble)		
})	
//	Build	the	recommendation	model	using	ALS	
val	rank	=	10	
val	numIterations	=	20	
val	model	=	ALS.train(ratings,	rank,	numIterations,	0.01)
MLLib Example - Collaborative Filtering using ALS_
38
//	Evaluate	the	model	on	rating	data	
val	usersProducts	=	ratings.map	{		
case	Rating(user,	product,	rate)	=>	(user,	product)	}	
val	predictions	=	model.predict(usersProducts).map	{	
	case	Rating(user,	product,	rate)	=>	((user,	product),	rate)	
}	
val	ratesAndPredictions	=	ratings.map	{		
case	Rating(user,	product,	rate)	=>((user,	product),	rate)}	
.join(predictions)	
val	MSE	=	ratesAndPredictions.map	{		
case	((user,	product),	(r1,	r2))	=>	val	err	=	(r1	-	r2);	
err	*	err	}.mean()	
println("Mean	Squared	Error	=	"	+	MSE)
MLLib Example - Collaborative Filtering using ALS_
39
Use Cases
40
•In particular for huge amounts of external data
•Support for CSV, TSV, XML, JSON und other
Use Cases for Spark and Cassandra_
41
Data Loading
case	class	User	(id:	java.util.UUID,	name:	String)	
val	users	=	sc.textFile("users.csv")
.repartition(2*sc.defaultParallelism)
.map(line	=>	line.split(",")	match	{	case	Array(id,name)	=>	
User(java.util.UUID.fromString(id),	name)})	
users.saveToCassandra("keyspace","users")
Validate consistency in a Cassandra database
•syntactic
• Uniqueness (only relevant for columns not in the PK)
• Referential integrity
• Integrity of the duplicates
•semantic
• Business- or Application constraints
• e.g.: At least one genre per movies, a maximum of 10 tags per blog
post
Use Cases for Spark and Cassandra_
42
Validation & Normalization
•Modelling, Mining, Transforming, ....
•Use Cases
• Recommendation
• Fraud Detection
• Link Analysis (Social Networks, Web)
• Advertising
• Data Stream Analytics ( Spark Streaming)
• Machine Learning ( Spark ML)
Use Cases for Spark and Cassandra_
43
Analyses (Joins, Transformations,..)
•Changes on existing tables
• New table required when changing primary key
• Otherwise changes could be performed in-place
•Creating new tables
• data derived from existing tables
• Support new queries
•Use the CassandraConnectors in Spark
Use Cases for Spark and Cassandra_
44
Schema Migration
Questions?
Matthias Niehoff,
IT-Consultant
90
codecentric AG
Zeppelinstraße 2
76185 Karlsruhe, Germany
mobil: +49 (0) 172.1702676
matthias.niehoff@codecentric.de
www.codecentric.de
blog.codecentric.de
matthiasniehoff

More Related Content

What's hot

Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
Victor Coustenoble
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
Patrick McFadin
 
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & CassandraEscape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Piotr Kolaczkowski
 
Spark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and FutureSpark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and Future
Russell Spitzer
 
Real time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesosReal time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesos
Rahul Kumar
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Spark Summit
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
Patrick McFadin
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataPatrick McFadin
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and Spark
Evan Chan
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
nickmbailey
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Helena Edelson
 
Cassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data LocalityCassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data Locality
Russell Spitzer
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data
prajods
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
StampedeCon
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
Patrick McFadin
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
Jen Aman
 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisReal time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Duyhai Doan
 
Spark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
Spark ETL Techniques - Creating An Optimal Fantasy Baseball RosterSpark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
Spark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
Don Drake
 
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
DataStax
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fire
Patrick McFadin
 

What's hot (20)

Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & CassandraEscape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
 
Spark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and FutureSpark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and Future
 
Real time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesosReal time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesos
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and Spark
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
 
Cassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data LocalityCassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data Locality
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisReal time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
 
Spark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
Spark ETL Techniques - Creating An Optimal Fantasy Baseball RosterSpark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
Spark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
 
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fire
 

Similar to Analytics with Cassandra & Spark

Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
Matthias Niehoff
 
5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment
Jim Hatcher
 
Intro to Spark
Intro to SparkIntro to Spark
Intro to Spark
Kyle Burke
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
Spark Summit
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Helena Edelson
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers
Christopher Batey
 
실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재
hkyoon2
 
Spark Programming
Spark ProgrammingSpark Programming
Spark Programming
Taewook Eom
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Helena Edelson
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
Presentation
PresentationPresentation
Presentation
Dimitris Stripelis
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
nickmbailey
 
Cassandra and Spark SQL
Cassandra and Spark SQLCassandra and Spark SQL
Cassandra and Spark SQL
Russell Spitzer
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
Bryan Yang
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
Databricks
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Databricks
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Helena Edelson
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
Victor Coustenoble
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
Vienna Data Science Group
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
Jon Haddad
 

Similar to Analytics with Cassandra & Spark (20)

Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment
 
Intro to Spark
Intro to SparkIntro to Spark
Intro to Spark
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers
 
실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재실시간 Streaming using Spark and Kafka 강의교재
실시간 Streaming using Spark and Kafka 강의교재
 
Spark Programming
Spark ProgrammingSpark Programming
Spark Programming
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
Presentation
PresentationPresentation
Presentation
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
 
Cassandra and Spark SQL
Cassandra and Spark SQLCassandra and Spark SQL
Cassandra and Spark SQL
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 

Recently uploaded

BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
Srikant77
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 

Recently uploaded (20)

BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 

Analytics with Cassandra & Spark