Learning	Apache	Spark	–
Part	2	– Transformations	
and	Actions	on	RDDs
Presenter	Introduction
Tim	Spann,	Senior	Solutions	Architect,	airis.DATA
• ex-Pivotal	Senior	Field	Engineer
• DZONE	MVB	and	Zone	Leader
• ex-Startup	Senior	Engineer	/	Team	Lead
http://www.slideshare.net/bunkertor
http://sparkdeveloper.com/
http://www.twitter.com/PaasDev
airis.DATA
airis.DATA is	a	next	generation	system	integrator	that	specializes	in	rapidly	deployable	
machine	learning	and	graph	solutions.	
Our	core	competencies	involve	providing	modular,	scalable	Big	Data	products	that	can	be	
tailored	to	fit	use	cases	across	industry	verticals.	
We offer predictive modeling and machine learning solutions at petabyte scale, using best-in-class technologies and frameworks including Spark, H2O, and Flink.
Our	data	pipelining	solutions	can	be	deployed	in	batch,	real-time	or	near-real-time	
settings	to	fit	your	specific	business	use-case.
Agenda
• Hands-On:			Quick	Install	Zeppelin
• RDD	Transformations
• RDD	Actions
• Hands-On:
• RDD	transformations	and	actions	in	Scala	on	Spark	Standalone	local
Installing	Zeppelin	and	Spark	1.6
• Java JDK 8, Scala 2.10, SBT 0.13, Maven 3, Spark 1.6.0
• http://www.oracle.com/technetwork/java/javase/downloads/index.html
• http://www.scala-lang.org/download/2.10.6.html
• http://www.scala-lang.org/download/install.html
• http://www.scala-sbt.org/download.html
• http://apache.claz.org/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.zip
• http://spark.apache.org/downloads.html
• http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz
• http://www.apache.org/dyn/closer.cgi/incubator/zeppelin/0.5.6-incubating/zeppelin-0.5.6-incubating-bin-all.tgz
• For	Mac	(brew	install	sbt)
Installing	Zeppelin	and	Spark	1.6	no.2
export SPARK_MASTER_IP=127.0.0.1
export SPARK_LOCAL_IP=127.0.0.1
export SCALA_HOME={YOURDIR}/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin
On Windows, use SET instead of export, reference variables as %NAME% rather than $NAME, and separate PATH entries with ; rather than :.
Running	Zeppelin	and	Spark	1.6
https://zeppelin.incubator.apache.org/docs/0.5.6-incubating/install/install.html
https://github.com/hortonworks-gallery/zeppelin-notebooks
Download	the	Apache	Zeppelin	binary	(Mac	and	Linux)
zeppelin-0.5.6-incubating-bin-all
Unzip
Run
cd	zeppelin-0.5.6-incubating-bin-all	
./bin/zeppelin-daemon.sh	start
http://localhost:8080/
Resilient	Distributed	Datasets	(RDDs)
RDDs have ACTIONS, which return values (output) to the driver program:
val textFile = sc.textFile("mydata.txt")
textFile.count()
and TRANSFORMATIONS, which return pointers to new RDDs:
val lines = textFile.filter(line => line.contains("Spark"))
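Transformations are lazy: Spark only records them, and nothing runs until an action asks for a result. A minimal sketch of that split follows; the in-memory lines stand in for mydata.txt, and the app name and local[2] master are assumptions for running outside Zeppelin.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the notebook's SparkContext if one exists, otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("lazy-demo"))

// Stand-in for sc.textFile("mydata.txt"): a small in-memory RDD of lines.
val textFile = sc.parallelize(Seq("Apache Spark", "Apache Flink", "Spark SQL"))

// Transformation: returns a new RDD immediately; no data is processed yet.
val sparkLines = textFile.filter(line => line.contains("Spark"))

// Actions: these trigger the computation and return values to the driver.
val total = textFile.count()
val matching = sparkLines.count()
```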
Transformation: Meaning
• map(func): Return a new distributed dataset formed by passing each element of the source through a function func.
• filter(func): Return a new dataset formed by selecting those elements of the source on which func returns true.
• flatMap(func): Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
• mapPartitions(func): Similar to map, but runs separately on each partition (block) of the RDD, so func must be of type Iterator<T> => Iterator<U> when running on an RDD of type T.
• mapPartitionsWithIndex(func): Similar to mapPartitions, but also provides func with an integer value representing the index of the partition, so func must be of type (Int, Iterator<T>) => Iterator<U> when running on an RDD of type T.
• sample(withReplacement, fraction, seed): Sample a fraction fraction of the data, with or without replacement, using a given random number generator seed.
• union(otherDataset): Return a new dataset that contains the union of the elements in the source dataset and the argument.
• intersection(otherDataset): Return a new RDD that contains the intersection of elements in the source dataset and the argument.
• distinct([numTasks]): Return a new dataset that contains the distinct elements of the source dataset.
• groupByKey([numTasks]): When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs.
Note: If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using reduceByKey or aggregateByKey will yield much better performance.
Note: By default, the level of parallelism in the output depends on the number of partitions of the parent RDD. You can pass an optional numTasks argument to set a different number of tasks.
Transformations
Transformation: Meaning
• reduceByKey(func, [numTasks]): When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V, V) => V. As in groupByKey, the number of reduce tasks is configurable through an optional second argument.
• aggregateByKey(zeroValue)(seqOp, combOp, [numTasks]): When called on a dataset of (K, V) pairs, returns a dataset of (K, U) pairs where the values for each key are aggregated using the given combine functions and a neutral "zero" value. Allows an aggregated value type that is different from the input value type, while avoiding unnecessary allocations. As in groupByKey, the number of reduce tasks is configurable through an optional second argument.
• sortByKey([ascending], [numTasks]): When called on a dataset of (K, V) pairs where K implements Ordered, returns a dataset of (K, V) pairs sorted by keys in ascending or descending order, as specified in the boolean ascending argument.
• join(otherDataset, [numTasks]): When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.
• cogroup(otherDataset, [numTasks]): When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (Iterable<V>, Iterable<W>)) tuples. This operation is also called groupWith.
Transformation: Meaning
• cartesian(otherDataset): When called on datasets of types T and U, returns a dataset of (T, U) pairs (all pairs of elements).
• pipe(command, [envVars]): Pipe each partition of the RDD through a shell command, e.g. a Perl or bash script. RDD elements are written to the process's stdin and lines output to its stdout are returned as an RDD of strings.
• coalesce(numPartitions): Decrease the number of partitions in the RDD to numPartitions. Useful for running operations more efficiently after filtering down a large dataset.
• repartition(numPartitions): Reshuffle the data in the RDD randomly to create either more or fewer partitions and balance it across them. This always shuffles all data over the network.
• repartitionAndSortWithinPartitions(partitioner): Repartition the RDD according to the given partitioner and, within each resulting partition, sort records by their keys. This is more efficient than calling repartition and then sorting within each partition because it can push the sorting down into the shuffle machinery.
MAP
logFile.map(parseLogLine)
Here parseLogLine is a Scala function that takes one line of the Apache log as a String and parses it into a LogRecord case class. map applies it to every line of the logFile RDD, and the result is a new RDD of LogRecords.
FILTER
filter(!_.clientIp.equals("Empty"))
This removes the "Empty" records from the resulting RDD. The filter operates on an RDD of LogRecords.
MAP FILTER
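The slide does not show parseLogLine or LogRecord themselves, so here is a minimal sketch of the map-then-filter pipeline. The two-field LogRecord, the simplified "ip request" line format, and the sample lines are all hypothetical; only the names parseLogLine, LogRecord, clientIp, and the "Empty" marker come from the slide.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("map-filter-demo"))

// Hypothetical record type; a real Apache log record would carry more fields.
case class LogRecord(clientIp: String, request: String)

// Hypothetical parser for lines shaped like "<ip> <request>".
// Lines that do not match are marked with the clientIp "Empty".
def parseLogLine(line: String): LogRecord = {
  val parts = line.trim.split("\\s+", 2)
  if (parts.length == 2) LogRecord(parts(0), parts(1)) else LogRecord("Empty", "")
}

val logFile = sc.parallelize(Seq(
  "10.0.0.1 GET /index.html", "", "10.0.0.2 GET /about.html"))

// map: one LogRecord per input line; filter: drop the "Empty" records.
val records = logFile.map(parseLogLine).filter(!_.clientIp.equals("Empty"))
val ips = records.map(_.clientIp).collect()
```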
FLATMAP
val	flatRDD	=	originalRDD.flatMap(_.split("	"))
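Maps each input item to 0 or more output items: the function returns a sequence (here the Array produced by split), and the results are flattened into a single RDD.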
MAPPARTITIONSWITHINDEX
val mapped = originalRDD.mapPartitionsWithIndex {
  (index, iterator) => {
    println("Index -> " + index)
    val myList = iterator.toList
    myList.map(x => x + " -> " + index).iterator
  }
}
Runs a function over each partition and also passes in the partition index. Otherwise the same as mapPartitions.
FLATMAP MAPPARTITIONS+
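To make the difference between map and flatMap concrete, and to show which partition index each element sees, here is a small sketch. The sample data and the explicit partition count of 2 are assumptions chosen to keep the output predictable.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("flatmap-demo"))

val lines = sc.parallelize(Seq("hello spark", "hello zeppelin"))

// map keeps one output per input: two arrays of words.
val nested = lines.map(_.split(" ")).collect()

// flatMap flattens the per-line results into a single RDD of words.
val words = lines.flatMap(_.split(" ")).collect()

// Four elements in two partitions: the first two land in partition 0,
// the last two in partition 1.
val letters = sc.parallelize(Seq("a", "b", "c", "d"), 2)
val tagged = letters.mapPartitionsWithIndex { (index, iterator) =>
  // Mapping the iterator directly avoids materializing the partition as a List.
  iterator.map(x => x + " -> " + index)
}.collect()
```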
val	rddSpark =	sc.parallelize(List("SQL","Streaming","GraphX",	"MLLib",	"Bagel",	
"SparkR","Python","Scala","Java",	"Alluxio",	"Tungsten",	"Zeppelin"))
val	rddHadoop =	sc.parallelize(List("HDFS",	"YARN",	"TEZ",	"Hive",	"HBase",	"Pig",	"Atlas",	"Storm",	
"Accumulo",	"Ranger",	"Phoenix",	"MapReduce",	"Slider",	"Flume",	"Kafka",	"Oozie",	"Sqoop",	
"Falcon","Knox",	"Ambari",	"Zookeeper",	"Cloudbreak",	"SQL",	"Java",	"Scala",	"Python"))
UNION
rddHadoop.union(rddSpark).collect()
Combines the source dataset and the argument. Unlike a SQL UNION, duplicates are kept (as in UNION ALL); add distinct() for set semantics.
INTERSECTION
rddHadoop.intersection(rddSpark).collect()
Do	a	set	intersection	on	source	dataset	and	argument
UNION INTERSECTION
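A short sketch of how duplicates behave under union and intersection. The two small lists are made-up stand-ins for rddSpark and rddHadoop.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("union-demo"))

val a = sc.parallelize(Seq("SQL", "Scala", "GraphX"))
val b = sc.parallelize(Seq("SQL", "Scala", "Hive"))

// union concatenates the two RDDs and keeps duplicates.
val unionAll = a.union(b).collect()

// distinct() after union gives set-style union.
val setUnion = a.union(b).distinct().collect().sorted

// intersection returns each common element once.
val common = a.intersection(b).collect().sorted
```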
DISTINCT
bigDataRDD.distinct().collect()
Get	distinct	elements	from	the	source	dataset
SAMPLE
bigDataRDD.sample(true,0.25	).collect()
res89:	Array[String]	=	Array(HDFS,	TEZ,	Pig,	Knox,	Python,	Python)
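Samples a fraction (0.25) of the data with replacement (true). The fraction is an expected size, not a guarantee: sample() does not return exactly 25% of the elements.
The note that sampling without replacement requires one additional pass over the RDD to guarantee sample size, whereas sampling with replacement requires two additional passes, comes from the sampleByKeyExact API documentation and describes that method rather than sample(). In that exact-size setting, sampling with replacement is the slower option.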
DISTINCT SAMPLE
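The sketch below contrasts sample(), whose result size only approximates the fraction, with the takeSample() action, which returns exactly the requested number of elements. The data set of 1 to 1000 and the seed 42 are arbitrary choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("sample-demo"))

val data = sc.parallelize(1 to 1000)

// Same seed on the same RDD gives the same sample, which helps reproducibility.
val s1 = data.sample(withReplacement = false, fraction = 0.25, seed = 42L).collect()
val s2 = data.sample(withReplacement = false, fraction = 0.25, seed = 42L).collect()

// s1.length is close to 250 but is not guaranteed to equal it.
val approxSize = s1.length

// takeSample is an action that returns exactly `num` elements to the driver.
val exact = data.takeSample(withReplacement = false, num = 5, seed = 42L)
```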
GROUPBYKEY
val	groupByRDD	=	keyValueRDD.groupByKey()
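Works on datasets of (K, V) pairs. Rarely the best choice for aggregations: reduceByKey is preferred because it combines values within each partition before the shuffle.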
REDUCEBYKEY
val	kvRDD	=	sc.parallelize(Seq((1,"Bacon"),	(1,	"Hamburger"),	(1,"Cheeseburger")))
val	reducedByRDD	=	kvRDD.reduceByKey((a,	b)	=>	a.concat(b))
reducedByRDD: org.apache.spark.rdd.RDD[(Int, String)] = ShuffledRDD[66] at reduceByKey at <console>:31
res136: Array[(Int, String)] = Array((1,BaconHamburgerCheeseburger))
Reduces the values for each key with the given function (here, string concatenation).
GROUPBYKEY REDUCEBYKEY
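Both approaches can compute a per-key sum; this sketch shows that they agree on the result, while reduceByKey avoids shipping every value across the network. The (food, quantity) pairs are invented for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("reduce-by-key-demo"))

val orders = sc.parallelize(Seq(("Bacon", 3), ("Tofu", 1), ("Bacon", 4), ("Tofu", 2)))

// groupByKey moves every value to the reducer, then sums there.
val viaGroup = orders.groupByKey().mapValues(_.sum).collect().toMap

// reduceByKey pre-aggregates inside each partition before the shuffle.
val viaReduce = orders.reduceByKey(_ + _).collect().toMap
```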
AGGREGATEBYKEY
val namesRDD = sc.parallelize(List((1, 25), (1, 27), (3, 25), (3, 27)))
val groupByRDD = namesRDD.aggregateByKey(0)((acc, v) => acc + v, (a, b) => a + b).collect()
groupByRDD: Array[(Int, Int)] = Array((1,52), (3,52))
For datasets of (K, V) pairs, returns pairs where the values for each key are aggregated using the given functions and a neutral "zero" value. The first function folds a value into a partition-local accumulator; the second merges accumulators from different partitions.
SORTBYKEY
val	sortByRDD	=	namesRDD.sortByKey(true).collect()
sortByRDD:	Array[(Int,	Int)]	=	Array((1,25),	(1,27),	(3,25),	(3,27))
Returns	a	dataset	of	pairs	sorted	by	keys	in	ascending	or	descending	order
AGGREGATEBYKEY SORTBYKEY
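aggregateByKey becomes more useful when the result type differs from the value type. As a sketch, the same (key, age) pairs can be reduced to a per-key average by carrying a (sum, count) accumulator; the variable names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("aggregate-demo"))

val namesRDD = sc.parallelize(List((1, 25), (1, 27), (3, 25), (3, 27)))

// Zero value (0, 0) is an empty (sum, count) accumulator.
val sumAndCount = namesRDD.aggregateByKey((0, 0))(
  (acc, v) => (acc._1 + v, acc._2 + 1),   // fold one value into the accumulator
  (a, b) => (a._1 + b._1, a._2 + b._2))   // merge accumulators across partitions

val averages = sumAndCount
  .mapValues { case (sum, count) => sum.toDouble / count }
  .collect()
  .toMap
```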
JOIN
val	otherKeyValueRDD	=	sc.parallelize(Seq(("Bacon",	"Amazing"),	 ("Steak",	"Fine"),	("Lettuce",	"Sad")))
keyValueRDD.join(otherKeyValueRDD).collect()
res166:	Array[(String,	(String,	 String))]	=	Array((Bacon,(Awesome,Amazing)))
Inner join: returns a pair for every key that is present in both datasets.
LEFTOUTERJOIN
keyValueRDD.leftOuterJoin(otherKeyValueRDD).collect()
res170:	Array[(String,	(String,	Option[String]))]	=	Array((PorkRoll,(Great,None)),	
(Tofu,(Bogus,None)),	(Bacon,(Awesome,Some(Amazing))))
Returns	a	dataset	following	SQL	style	outer	joins.
JOIN LEFTOUTERJOIN RIGHTOUTERJOIN FULLOUTERJOIN
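The slides never show how keyValueRDD is built. The definition below is inferred from the outputs printed on the slides, so treat it as an assumption. With it, the four join flavors named in the slide title can be compared side by side.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("join-demo"))

// Inferred from the slide outputs, not taken from the original code.
val keyValueRDD = sc.parallelize(
  Seq(("Bacon", "Awesome"), ("PorkRoll", "Great"), ("Tofu", "Bogus")))
val otherKeyValueRDD = sc.parallelize(
  Seq(("Bacon", "Amazing"), ("Steak", "Fine"), ("Lettuce", "Sad")))

// Inner join: only keys present on both sides.
val inner = keyValueRDD.join(otherKeyValueRDD).collect().toMap

// Right outer join: every key of the argument; missing left values become None.
val right = keyValueRDD.rightOuterJoin(otherKeyValueRDD).collect().toMap

// Full outer join: every key from either side, both values wrapped in Option.
val full = keyValueRDD.fullOuterJoin(otherKeyValueRDD).collect().toMap
```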
COGROUP
keyValueRDD.cogroup(otherKeyValueRDD).collect()
res178:	Array[(String,	(Iterable[String],	Iterable[String]))]	=	
Array((PorkRoll,(CompactBuffer(Great),CompactBuffer())),	(Steak,(CompactBuffer(),CompactBuffer(Fine))),	
(Tofu,(CompactBuffer(Bogus),CompactBuffer())),	(Lettuce,(CompactBuffer(),CompactBuffer(Sad))),	
(Bacon,(CompactBuffer(Awesome),CompactBuffer(Amazing))))
Also known as "groupWith".
CARTESIAN
keyValueRDD.cartesian(otherKeyValueRDD).collect()
res182:	Array[((String,	String),	(String,	String))]	=	Array(((Bacon,Awesome),(Bacon,Amazing)),	
((Bacon,Awesome),(Steak,Fine)),	((Bacon,Awesome),(Lettuce,Sad)),	
((PorkRoll,Great),(Bacon,Amazing)),	((PorkRoll,Great),(Steak,Fine)),	((PorkRoll,Great),(Lettuce,Sad)),	
((Tofu,Bogus),(Bacon,Amazing)),	((Tofu,Bogus),(Steak,Fine)),	((Tofu,Bogus),(Lettuce,Sad)))
Returns	dataset	of	all	pairs	of	elements.		Cartesian	Product.
COGROUP CARTESIAN
PIPE
keyValueRDD.pipe("cut	-c2-4").collect()	
res213:	Array[String]	=	Array(Bac,	Por,	Tof)
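Pipes each partition through a shell command. Each element is written to the command as a line of text, so (Bacon,Awesome) goes to cut, which keeps characters 2 to 4.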
COALESCE
keyValueRDD.coalesce(1).collect()	
Decrease	the	number	of	partitions.
PIPE COALESCE
REPARTITION
keyValueRDD.repartition(2).collect()	
res241:	Array[(String,	String)]	=	Array((Bacon,Awesome),	 (PorkRoll,Great),	 (Tofu,Bogus))
Reshuffles the data in the RDD randomly to create either more or fewer partitions and balances it across them. Good after filtering down a large dataset.
REPARTITIONANDSORTWITHINPARTITIONS
keyValueRDD.repartitionAndSortWithinPartitions(YourPartitioner).collect()
Repartitions using a custom partitioner and sorts the records by key within each partition. This is the basis for secondary sorting.
See:		http://codingjunkie.net/spark-secondary-sort/
REPARTITION REPARTITIONANDSORTWITHINPARTITIONS
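A sketch of how the three partitioning operations change the layout. HashPartitioner stands in for the custom partitioner on the slide, and the partition counts and sample pairs are assumptions chosen so the result is easy to check.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("partition-demo"))

val rdd = sc.parallelize(1 to 100, 8)

// coalesce merges existing partitions and avoids a full shuffle by default.
val fewer = rdd.coalesce(2).partitions.length

// repartition always shuffles and can also increase the partition count.
val more = rdd.repartition(16).partitions.length

// HashPartitioner(2) sends even Int keys to partition 0 and odd keys to
// partition 1; records are then sorted by key inside each partition.
val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (4, "d"), (2, "b")))
val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(2))

// glom() exposes each partition as an array so the layout can be inspected.
val keysPerPartition = sorted.glom().collect().map(_.map(_._1).toList).toList
```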
Action: Meaning
• reduce(func): Aggregate the elements of the dataset using a function func (which takes two arguments and returns one). The function should be commutative and associative so that it can be computed correctly in parallel.
• collect(): Return all the elements of the dataset as an array at the driver program. This is usually useful after a filter or other operation that returns a sufficiently small subset of the data.
• count(): Return the number of elements in the dataset.
• first(): Return the first element of the dataset (similar to take(1)).
• take(n): Return an array with the first n elements of the dataset.
• takeSample(withReplacement, num, [seed]): Return an array with a random sample of num elements of the dataset, with or without replacement, optionally pre-specifying a random number generator seed.
• takeOrdered(n, [ordering]): Return the first n elements of the RDD using either their natural order or a custom comparator.
Actions
Action: Meaning
• saveAsTextFile(path): Write the elements of the dataset as a text file (or set of text files) in a given directory in the local filesystem, HDFS or any other Hadoop-supported file system. Spark will call toString on each element to convert it to a line of text in the file.
• saveAsSequenceFile(path) (Java and Scala): Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that implement Hadoop's Writable interface. In Scala, it is also available on types that are implicitly convertible to Writable (Spark includes conversions for basic types like Int, Double, String, etc).
• saveAsObjectFile(path) (Java and Scala): Write the elements of the dataset in a simple format using Java serialization, which can then be loaded using SparkContext.objectFile().
• countByKey(): Only available on RDDs of type (K, V). Returns a map of (K, Long) pairs with the count of each key.
• foreach(func): Run a function func on each element of the dataset. This is usually done for side effects such as updating an Accumulator or interacting with external storage systems.
Note: modifying variables other than Accumulators outside of the foreach() may result in undefined behavior. See "Understanding closures" in the Spark Programming Guide for more details.
ACTIONS
originalRDD.collect()
originalRDD.collect().foreach(println)
originalRDD.count()
originalRDD.first()
originalRDD.take(2)
originalRDD.takeSample(true,5,7634184)
originalRDD.takeOrdered(5)
takeSample takes three arguments: whether to sample with replacement, the number of samples to return, and a random number generator seed.
COLLECT COUNT FIRST TAKE TAKESAMPLE TAKEORDERED
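A compact sketch of these actions on a tiny RDD, including takeOrdered with a custom ordering. The numbers are arbitrary; the seed is the one used on the slide.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("actions-demo"))

val nums = sc.parallelize(Seq(5, 3, 9, 1, 7))

// take returns elements in RDD order; takeOrdered sorts first.
val firstTwo = nums.take(2)
val smallest = nums.takeOrdered(2)

// A custom Ordering flips the direction: the two largest elements.
val largest = nums.takeOrdered(2)(Ordering[Int].reverse)

// Exactly three distinct elements, reproducible because the seed is fixed.
val sampled = nums.takeSample(withReplacement = false, num = 3, seed = 7634184L)
```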
ACTIONS
keyValueRDD.countByKey().foreach(println)
(PorkRoll,1)	(Tofu,1)	(Bacon,1)
keyValueRDD.saveAsTextFile("here")
keyValueRDD.saveAsSequenceFile("here2")
keyValueRDD.saveAsObjectFile("here3")
In Zeppelin, %sh ls shows the local files, including the entries created for here, here2, and here3. Each save call creates a directory (for example "here") that holds one part file per partition, so you can cat a part file such as here/part-00000 to see its content.
COUNTBYKEY FOREACH SAVEAS…
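The slide title lists FOREACH but the slide shows no example of it. The sketch below pairs countByKey with a foreach that updates an accumulator, which is the safe way to collect a side-effect total. It uses sc.accumulator, the Spark 1.6 API this deck targets (later Spark versions replace it with sc.longAccumulator); the sample pairs are made up.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("foreach-demo"))

val pairs = sc.parallelize(Seq(("Bacon", 1), ("Bacon", 2), ("Tofu", 3)))

// countByKey is an action: it returns a local Map of key -> count (Long).
val counts = pairs.countByKey()

// An accumulator is the supported way to update shared state from foreach.
// A plain var captured by the closure would be copied to each executor instead.
val total = sc.accumulator(0)
sc.parallelize(1 to 10).foreach(x => total += x)
val sum = total.value
```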
ACTIONS
bigDataRDD.reduce((a,	b)	=>	a.concat(b))
res154:	String	=	
AmbariZookeeperCloudbreakSQLJavaScalaPythonFlumeKafkaOozieSqoopFalconKnoxAtlasStormAccumuloRanger
PhoenixMapReduceSliderHDFSYARNTEZHiveHBasePigSQLStreamingGraphXMLLibBagelSparkRPythonScalaJavaAll
uxioTungstenZeppelin
Aggregates	the	elements	of	the	dataset	using	a	function.			For	this	one,	we	concatenate	all	the	Big	
Data	Strings	into	one	long	String	appropriate	for	resumes.
REDUCE
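The requirement that the reduce function be commutative and associative is easy to see with subtraction, which is neither. In this sketch the same four numbers give different answers depending only on the number of partitions, while addition is stable. The explicit partition counts are assumptions made to force the two layouts.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("reduce-demo"))

// One partition: ((1 - 2) - 3) - 4
val minusOnePartition = sc.parallelize(1 to 4, 1).reduce(_ - _)

// Two partitions: (1 - 2) and (3 - 4) are computed separately, then combined.
val minusTwoPartitions = sc.parallelize(1 to 4, 2).reduce(_ - _)

// Addition is commutative and associative, so partitioning does not matter.
val plusOnePartition = sc.parallelize(1 to 4, 1).reduce(_ + _)
val plusTwoPartitions = sc.parallelize(1 to 4, 2).reduce(_ + _)
```

String concatenation, as used on the slide, is associative but not commutative, so the order of the pieces in the result can vary with partitioning and task completion order.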
Apache	Zeppelin
Apache	Zeppelin	Runs
Apache	Zeppelin
Running	a	Spark	Job
[Diagram: the driver program, which holds the SparkContext, and two worker nodes, each running an executor with two tasks.]
Running	a	Spark	Job
Spark	Resources
http://www.slideshare.net/airisdata/parquet-and-avro
http://airisdata.com/scala-spark-resources-setup-learning/
https://dzone.com/articles/anatomy-of-a-scala-spark-program
https://dzone.com/articles/proper-sbt-setup-for-scala-210-and-spark-streaming
https://github.com/airisdata/sparkworkshop
https://github.com/airisdata/SparkTransformations
https://github.com/airisdata/avroparquet
http://www.slideshare.net/airisdata/apache-spark-overview-59903397
https://plugins.jetbrains.com/plugin/?id=1347
http://mund-consulting.com/Products/Sparklet.aspx

More Related Content

What's hot

SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion DubaiSMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
Codemotion Dubai
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
Rahul Kumar
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
Holden Karau
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Robert "Chip" Senkbeil
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
prajods
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Helena Edelson
 
Rethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleRethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For Scale
Helena Edelson
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
Slim Baltagi
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
Paco Nathan
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
rhatr
 
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWSAWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
Amazon Web Services
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
Knoldus Inc.
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
airisData
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
DB Tsai
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
Juan Pedro Moreno
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Alex Zeltov
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stack
Anirvan Chakraborty
 
Apache Spark: The Next Gen toolset for Big Data Processing
Apache Spark: The Next Gen toolset for Big Data ProcessingApache Spark: The Next Gen toolset for Big Data Processing
Apache Spark: The Next Gen toolset for Big Data Processing
prajods
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Helena Edelson
 

What's hot (20)

SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion DubaiSMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Rethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleRethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For Scale
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
 
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWSAWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stack
 
Apache Spark: The Next Gen toolset for Big Data Processing
Apache Spark: The Next Gen toolset for Big Data ProcessingApache Spark: The Next Gen toolset for Big Data Processing
Apache Spark: The Next Gen toolset for Big Data Processing
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 

Viewers also liked

Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016
Timothy Spann
 
Pivotal CF and Continuous Delivery
Pivotal CF and Continuous DeliveryPivotal CF and Continuous Delivery
Pivotal CF and Continuous Delivery
Timothy Spann
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
Timothy Spann
 
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Timothy Spann
 
Drone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiDrone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFi
Timothy Spann
 
Redis for Security Data : SecurityScorecard JVM Redis Usage
Redis for Security Data : SecurityScorecard JVM Redis UsageRedis for Security Data : SecurityScorecard JVM Redis Usage
Redis for Security Data : SecurityScorecard JVM Redis Usage
Timothy Spann
 
Ingesting Drone Data into Big Data Platforms
Ingesting Drone Data into Big Data Platforms Ingesting Drone Data into Big Data Platforms
Ingesting Drone Data into Big Data Platforms
Timothy Spann
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
Joe Percivall
 
Postgres Open 2014 - A Performance Characterization of Postgres on Different ...
Postgres Open 2014 - A Performance Characterization of Postgres on Different ...Postgres Open 2014 - A Performance Characterization of Postgres on Different ...
Postgres Open 2014 - A Performance Characterization of Postgres on Different ...
Faisal Akber
 
PostgresOpen 2013 A Comparison of PostgreSQL Encryption Options
PostgresOpen 2013 A Comparison of PostgreSQL Encryption OptionsPostgresOpen 2013 A Comparison of PostgreSQL Encryption Options
PostgresOpen 2013 A Comparison of PostgreSQL Encryption Options
Faisal Akber
 
Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action
Not Less, Not More: Exactly Once, Large-Scale Stream Processing in ActionNot Less, Not More: Exactly Once, Large-Scale Stream Processing in Action
Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action
Paris Carbone
 
Building the Ideal Stack for Machine Learning
Building the Ideal Stack for Machine LearningBuilding the Ideal Stack for Machine Learning
Building the Ideal Stack for Machine Learning
SingleStore
 
Streaming with Oracle Data Integration
Streaming with Oracle Data IntegrationStreaming with Oracle Data Integration
Streaming with Oracle Data Integration
Michael Rainey
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
SingleStore
 
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Taejun Kim
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Spark Summit
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
Luis Gonzalez
 
Getting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics servicesGetting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics services
Vladimir Bychkov
 
Blr hadoop meetup
Blr hadoop meetupBlr hadoop meetup
Blr hadoop meetup
Suneet Grover
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
Gwen (Chen) Shapira
 

Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs