Transcript of "Spark Streaming Info"

  1. Spark Streaming
     • Much easier than Storm
     • Replaces Storm spouts/bolts with Akka Actors
     • Better API (time is part of the API) and better integration
     • Hadoop 2.3 / Spark 0.9.1
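     To make the "time is part of the API" point concrete, here is a minimal
     sketch of a windowed count (the socket source, port, and window sizes are
     illustrative assumptions, not from the deck):

       import org.apache.spark.streaming.{Seconds, StreamingContext}
       import org.apache.spark.streaming.StreamingContext._ // pair-DStream ops in 0.9.x

       // Batch interval, window length, and slide interval are all explicit
       // time arguments in the API.
       val ssc = new StreamingContext("local[2]", "WindowDemo", Seconds(1))
       val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
       val counts = words.map(word => (word, 1))
         .reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10)) // 30s window, sliding every 10s
       counts.print()
       ssc.start()
       ssc.awaitTermination()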
  2. Sbt setup
     • Create a separate sbt project; sbt run
     • Includes the jars and sets the class path
       − Batch and Streaming: http://spark.apache.org/docs/latest/quick-start.html
       − Create a project directory
       − Add dependencies; Scala-ized Maven:
           libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0"
           scalaVersion := "2.10.3"
     • Manage the sbt/scala versions locally
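     Put together, a complete build.sbt along those lines might look like this
     sketch (the Spark streaming artifacts are assumptions based on the demos
     later in the deck; the versions are the ones the slides name):

       name := "spark-streaming-demo"

       scalaVersion := "2.10.3"

       libraryDependencies ++= Seq(
         "org.apache.hadoop" %  "hadoop-client"           % "2.3.0",
         "org.apache.spark"  %% "spark-streaming"         % "0.9.1",
         "org.apache.spark"  %% "spark-streaming-twitter" % "0.9.1", // twitter demo
         "org.apache.spark"  %% "spark-streaming-kafka"   % "0.9.1"  // kafka demo
       )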
  3. Maven setup
     • Run the demo using maven/eclipse
     • Easier: use Maven Central to find jars/artifacts
     • Add the external libs using Maven to the local repo, and mvn package in the Spark source distro
     • Eclipse: add Scala Nature, Maven project
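     The equivalent Maven coordinates would look roughly like this pom.xml
     fragment (artifact ids assumed from Maven Central; versions from the
     slides):

       <dependencies>
         <dependency>
           <groupId>org.apache.spark</groupId>
           <artifactId>spark-streaming_2.10</artifactId>
           <version>0.9.1</version>
         </dependency>
         <dependency>
           <groupId>org.apache.hadoop</groupId>
           <artifactId>hadoop-client</artifactId>
           <version>2.3.0</version>
         </dependency>
       </dependencies>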
  4. Demo
     • Connect to the twitter stream and process it
       − Test the Twitter4j connection w/Java first; print out a twitter stream
     • Batch mode: sc.stop(); real-time streaming: stream.awaitTermination()
     • DStream/Scala lazy evaluation (see the sketch below)
       − Create a stream using #::, like the recursive List cons operator:
         ("#iphone",1) #:: ("#android",3) #:: ("#apple",10). Unlike a list,
         head and tail behave differently: head is a val.
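     A minimal sketch of that lazy-evaluation point using plain Scala Streams
     (the hashtag counts are the deck's own example; the rest is illustrative):

       // Scala's Stream: head is evaluated eagerly (it is a val),
       // while the tail is deferred until it is forced.
       val hashtags: Stream[(String, Int)] =
         ("#iphone", 1) #:: ("#android", 3) #:: ("#apple", 10) #:: Stream.empty

       println(hashtags.head)            // ("#iphone",1) -- already computed
       hashtags.take(2).foreach(println) // forces only the first two elements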
  5. Spark Streams
     • StreamingContext.start() starts the scheduler
       − JobScheduler.scala: starts the JobGenerator and runs jobs in a thread pool
       − JobGenerator.scala: starts the event actor and the checkpoint writer for each thread
     • Storage:
       − A DStream appends to a BlockGenerator
       − BlockGenerator.scala: Spark BlockGenerator w/2 threads; on termination it waits for the block-push thread to join
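     A sketch of how that lifecycle looks from the public API (the socket
     source and port are hypothetical; the comments map each call to the
     internals named above):

       import org.apache.spark.streaming.{Seconds, StreamingContext}

       val ssc = new StreamingContext("local[2]", "LifecycleDemo", Seconds(1))
       val lines = ssc.socketTextStream("localhost", 9999)
       lines.print()          // registers an output op; received data is buffered via BlockGenerator
       ssc.start()            // starts the JobScheduler, which starts the JobGenerator
       ssc.awaitTermination() // jobs are generated once per batch interval and run in the thread pool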
  6. Kafka Streaming Demo
     • KafkaUtils/Consumer connection
     • I0Itec connection lib
     • Need to add more features/testing for faults
     • Read the source for how to fill out the params
     • Start zookeeper, start a producer, define a topic, etc... then send data from the producer (see the command sketch below)
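     The usual sequence of commands looks roughly like this (paths and ports
     are the Kafka distribution defaults; script names vary slightly by Kafka
     version, and "testtopic" matches the demo code):

       # from the Kafka install directory
       bin/zookeeper-server-start.sh config/zookeeper.properties
       bin/kafka-server-start.sh config/server.properties
       bin/kafka-topics.sh --create --zookeeper localhost:2181 \
           --replication-factor 1 --partitions 1 --topic testtopic
       bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testtopic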
  7. Demo output showing console producer to Spark consumer
  8. Producer/Executor
     Match the broker-id in the server conf file with the groupId in the consumer call:

       val kafkaInputs = (1 to 5).map { _ =>
         KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
       }
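     The five receivers created above would normally be combined into a single
     DStream before processing; a union step like this is assumed (it is not
     shown in the deck):

       // stream is the StreamingContext from the next slide;
       // union merges the 5 receiver DStreams into one
       val unified = stream.union(kafkaInputs)
       unified.print()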
  9. Producer
     Use awaitTermination() to get an infinite loop so you can see what you enter into the producer; start w/1 executor:

       import org.apache.spark.streaming.{Seconds, StreamingContext}
       import org.apache.spark.streaming.kafka.KafkaUtils

       val stream = new StreamingContext("local[2]", "TestObject", Seconds(1))
       val kafkaMessages = KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
       // create 5 executors
       val kafkaInputs = (1 to 5).map { _ =>
         KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
       }
       kafkaMessages.print()
       stream.start()
       stream.awaitTermination()