Spark Streaming
Much easier than Storm
Replaces Storm spouts/bolts with Akka Actors
Better API (time is part of the API) and better integration
Hadoop 2.3/Spark 0.9.1
Sbt setup

Create a separate sbt project; sbt run

Includes the jars and sets the class path
− Batch and Streaming:
http://spark.apache.org/docs/latest/quick-start.html
− Create a project directory
− Add dependencies; sbt is a Scala-ized Maven

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0"

scalaVersion := "2.10.3"
Manage the sbt/scala versions locally
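
A minimal build.sbt along these lines covers the demos below (the project name is made up; the Spark artifacts are the standard 0.9.1 ones matching the Scala 2.10.3 version above):

name := "spark-streaming-demo"

scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"              % "0.9.1",
  "org.apache.spark"  %% "spark-streaming"         % "0.9.1",
  "org.apache.spark"  %% "spark-streaming-kafka"   % "0.9.1",  // Kafka demo
  "org.apache.spark"  %% "spark-streaming-twitter" % "0.9.1",  // Twitter demo
  "org.apache.hadoop" %  "hadoop-client"           % "2.3.0"
)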
Maven setup

Run the demo using Maven/Eclipse

Easier: use Maven Central to find jars/artifacts

Add the external libs to the local repo with Maven, and run
mvn package in the Spark source distro

Eclipse: add the Scala nature to the Maven project
Demo

Connect to a Twitter stream and process it
− Test the Twitter4j connection with Java first; print out a
Twitter stream

Batch mode ends with sc.stop(); real-time streaming runs until
stream.awaitTermination()

DStream/Scala lazy evaluation
− Create a stream using #::, the recursive cons operator analogous to
List's ::, e.g. (#iphone,1) #:: (#android,3) #:: (#apple,10).
Unlike a List, head and tail behave differently: the head is a strict
val, the tail is evaluated lazily (see the sketch below).
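
A quick sketch of that lazy evaluation using plain Scala Streams (ordinary Scala collections here, not Spark DStreams; the hashtag counts are just example data):

// Build a lazy Stream of (hashtag, count) pairs with #::.
// Only the head is evaluated eagerly; the tail is computed on demand.
val tags: Stream[(String, Int)] =
  ("#iphone", 1) #:: ("#android", 3) #:: ("#apple", 10) #:: Stream.empty

println(tags.head)           // (#iphone,1) -- the head is a strict val
println(tags.take(2).toList) // forces evaluation of the first two elements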
Spark Streams

StreamingContext starts the scheduler
− JobScheduler.scala: starts the JobGenerator and runs jobs in a
thread pool
− JobGenerator.scala: starts the event actor and checkpoint writer
for each thread

Storage:
− DStream appends to the BlockGenerator
− BlockGenerator.scala: Spark BlockGenerator with 2 threads; on
termination it waits for the block-push thread to join.
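
A rough driver-side sketch of what kicks that machinery off (the socket source and names are placeholders, not part of the original demo); nothing runs until start(), which is when the DStream graph is handed to the scheduler:

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Define a trivial DStream pipeline; start() hands the graph to the
// JobScheduler, which drives the JobGenerator every batch interval.
val ssc = new StreamingContext("local[2]", "SchedulerDemo", Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999) // placeholder source
lines.count().print()
ssc.start()            // starts the scheduler and receivers
ssc.awaitTermination()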
Kafka Streaming Demo

KafkaUtils/Consumer connection

I0Itec zkclient connection lib

Needs more features/testing for fault handling

Read the source to see how to fill out the params

Start ZooKeeper, start a producer, define a
topic, etc.
Send data from the producer
Demo output: console producer to Spark consumer
Producer/Executor
Match the broker-id in the server config file with the groupId
passed in the consumer call
val kafkaInputs = (1 to 5).map { _ =>
  KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
}
Producer
Use awaitTermination() to get an infinite loop so you can see what
you type into the producer; start with 1 executor
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val stream = new StreamingContext("local[2]", "TestObject", Seconds(1))

val kafkaMessages =
  KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))

// create 5 Kafka input streams (one receiver each)
val kafkaInputs = (1 to 5).map { _ =>
  KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
}

kafkaMessages.print()
stream.start()
stream.awaitTermination()
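
One likely follow-on step, not shown in the original: union the five receiver streams into a single DStream before processing (StreamingContext.union is standard API; using it here is an assumption about intent):

// Combine the five Kafka receivers into one DStream and print the
// message values; each Kafka record arrives as a (key, message) pair.
val allMessages = stream.union(kafkaInputs)
allMessages.map { case (_, msg) => msg }.print()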
