
Meet Up - Spark Stream Processing + Kafka

Stream processing is the real-time processing of data continuously, concurrently, and in a record-by-record fashion.

It treats data not as static tables or files, but as a continuous infinite stream of data integrated from both live and historical sources.
In these slides we'll be looking into Spark Stream Processing with Kafka.



  1. Stream Processing. Satendra Kumar, Sr. Software Consultant, Knoldus Software LLP
  2. Topics Covered ➢ What is a Stream ➢ What is Stream Processing ➢ The challenges of stream processing ➢ Overview of Spark Streaming ➢ Receivers ➢ Custom receivers ➢ Transformations on DStreams ➢ Failures ➢ Fault-tolerance Semantics ➢ Kafka Integration ➢ Performance Tuning
  3. What is a Stream A stream is a sequence of data elements made available over time, which can be accessed in sequential order, e.g. YouTube video buffering.
  4. What is Stream Processing Stream processing is the real-time processing of data continuously, concurrently, and in a record-by-record fashion. It treats data not as static tables or files, but as a continuous, infinite stream of data integrated from both live and historical sources.
  5. The challenges of stream processing ➢ Partitioning & scalability ➢ Semantics & fault tolerance ➢ Unifying the streams ➢ Time ➢ Re-processing
  6. Spark Streaming ➢ Provides a way to process live data streams. ➢ Scalable, high-throughput, fault-tolerant. ➢ Built on top of the core Spark API. ➢ The API is very similar to the Spark core API. ➢ Supports many sources such as Kafka, Flume, Kinesis, or TCP sockets. ➢ Currently based on RDDs.
  7-10. Spark Streaming (diagram slides)
  11. Discretized Streams ➢ Spark Streaming provides a high-level abstraction called a discretized stream, or DStream, which represents a continuous stream of data. ➢ DStreams can be created either from input data streams from sources such as Kafka, Flume, and Kinesis, or by applying high-level operations on other DStreams. ➢ A DStream is represented as a sequence of RDDs.
  12-19. High level overview (diagram slides)
  20. Driver Program

        object StreamingApp extends App {
          val sparkConf = new SparkConf().setMaster("local[*]").setAppName("StreamingApp")
          val streamingContext = new StreamingContext(sparkConf, Seconds(5))          // Streaming Context; Seconds(5) is the Batch Interval
          val lines: ReceiverInputDStream[String] =
            streamingContext.socketTextStream("localhost", 9000)                      // Receiver
          val words: DStream[String] = lines.flatMap(_.split(" "))                    // Transformations on DStreams
          val filteredWords: DStream[String] = words.filter(!_.trim.isEmpty)
          val pairs: DStream[(String, Int)] = filteredWords.map(word => (word, 1))
          val wordCounts: DStream[(String, Int)] = pairs.reduceByKey(_ + _)
          wordCounts.print()                                                          // Output Operations on DStreams
          streamingContext.start()                                                    // Start the Streaming
          streamingContext.awaitTermination()
        }

  21-26. Driver Program (the same program repeated, highlighting in turn the Streaming Context, the Batch Interval, the Receiver, the Transformations on DStreams, the Output Operations on DStreams, and starting the streaming)
  27. Important Points ➢ Once a context has been started, no new streaming computations can be set up or added to it. ➢ Once a context has been stopped, it cannot be restarted. ➢ Only one StreamingContext can be active in a JVM at the same time. ➢ stop() on StreamingContext also stops the SparkContext. To stop only the StreamingContext, set the optional parameter of stop() called stopSparkContext to false. ➢ A SparkContext can be re-used to create multiple StreamingContexts, as long as the previous StreamingContext is stopped (without stopping the SparkContext) before the next StreamingContext is created.
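     A minimal sketch (not from the slides) of the last two points: stop() with stopSparkContext = false keeps the SparkContext alive, so it can back a new StreamingContext once the previous one is stopped.

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.streaming.{Seconds, StreamingContext}

        object ContextLifecycle extends App {
          val sparkContext = new SparkContext(
            new SparkConf().setMaster("local[*]").setAppName("ContextLifecycle"))

          val ssc1 = new StreamingContext(sparkContext, Seconds(5))
          // ... define DStreams and output operations, then ssc1.start() ...
          ssc1.stop(stopSparkContext = false) // stops only the streaming side; SparkContext stays alive

          // The same SparkContext can be reused once the previous StreamingContext is stopped.
          val ssc2 = new StreamingContext(sparkContext, Seconds(10))
          ssc2.stop(stopSparkContext = true)  // this also stops the SparkContext
        }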
  28. Spark Streaming Concept ➢ Spark Streaming is based on a micro-batch architecture. ➢ Spark Streaming continuously receives live input data streams and divides the data into batches. ➢ New batches are created at regular time intervals called the batch interval. ➢ Each batch has N blocks, where N = batch interval / block interval. For example, if the batch interval is 1 second and the block interval is 200 ms (the default), each batch has 5 blocks.
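     A small sketch of the arithmetic above, assuming only the standard spark.streaming.blockInterval setting (200 ms by default); for a receiver-based stream, each block becomes one partition of that batch's RDD.

        import org.apache.spark.SparkConf
        import org.apache.spark.streaming.{Seconds, StreamingContext}

        object BlockIntervalDemo extends App {
          val sparkConf = new SparkConf()
            .setMaster("local[*]")
            .setAppName("BlockIntervalDemo")
            .set("spark.streaming.blockInterval", "200ms") // block interval = 200 ms
          val ssc = new StreamingContext(sparkConf, Seconds(1)) // batch interval = 1 second
          // blocks per batch = 1000 ms / 200 ms = 5 blocks, i.e. 5 partitions (tasks) per batch RDD
        }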
  29-32. Transforming DStream (diagram slides)
  33. Transforming DStream ➢ A DStream is represented by a continuous series of RDDs ➢ Each RDD in a DStream contains data from a certain interval ➢ Any operation applied on a DStream translates to operations on the underlying RDDs ➢ The processing time of a batch should be less than or equal to the batch interval.
  34. Transformations on DStreams

        def map[U: ClassTag](mapFunc: T => U): DStream[U]
        def flatMap[U: ClassTag](flatMapFunc: T => TraversableOnce[U]): DStream[U]
        def filter(filterFunc: T => Boolean): DStream[T]
        def reduce(reduceFunc: (T, T) => T): DStream[T]
        def count(): DStream[Long]
        def repartition(numPartitions: Int): DStream[T]
        def countByValue(numPartitions: Int = ssc.sc.defaultParallelism): DStream[(T, Long)]
        def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U]
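     A hedged usage sketch for two of these signatures, assuming `words` is the DStream[String] built in the earlier driver program; transform() applies an arbitrary RDD-to-RDD function to every batch.

        import org.apache.spark.rdd.RDD
        import org.apache.spark.streaming.dstream.DStream

        val upperWords: DStream[String] = words.transform { rdd: RDD[String] =>
          rdd.map(_.toUpperCase).distinct()
        }
        val longWordCount: DStream[Long] = words.filter(_.length > 5).count()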
  35. Transformations on PairDStream

        def groupByKey(): DStream[(K, Iterable[V])]
        def reduceByKey(reduceFunc: (V, V) => V, numPartitions: Int): DStream[(K, V)]
        def join[W: ClassTag](other: DStream[(K, W)]): DStream[(K, (V, W))]
        def updateStateByKey[S: ClassTag](updateFunc: (Seq[V], Option[S]) => Option[S], partitioner: Partitioner): DStream[(K, S)]
        def cogroup[W: ClassTag](other: DStream[(K, W)], numPartitions: Int): DStream[(K, (Iterable[V], Iterable[W]))]
        def mapValues[U: ClassTag](mapValuesFunc: V => U): DStream[(K, U)]
        def leftOuterJoin[W: ClassTag](other: DStream[(K, W)], numPartitions: Int): DStream[(K, (V, Option[W]))]
        def rightOuterJoin[W: ClassTag](other: DStream[(K, W)], numPartitions: Int): DStream[(K, (Option[V], W))]
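     A sketch (not from the slides) of join and mapValues on pair DStreams, assuming `pairs` and `filteredWords` from the earlier driver program; the join is applied batch-by-batch on the underlying RDDs.

        val wordLengths: DStream[(String, Int)] = filteredWords.map(word => (word, word.length))
        val joined: DStream[(String, (Int, Int))] = pairs.join(wordLengths)
        val lengthsOnly: DStream[(String, Int)] = joined.mapValues { case (count, length) => length }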
  36. updateStateByKey

        object StreamingApp extends App {
          val sparkConf = new SparkConf().setMaster("local[*]").setAppName("StreamingApp")
          val streamingContext = new StreamingContext(sparkConf, Seconds(5))
          streamingContext.checkpoint(".")
          val lines = streamingContext.socketTextStream("localhost", 9000)
          val words: DStream[String] = lines.flatMap(_.split(" "))
          val filteredWords: DStream[String] = words.filter(!_.trim.isEmpty)
          val pairs: DStream[(String, Int)] = filteredWords.map(word => (word, 1))
          val updatedState: DStream[(String, Int)] = pairs.updateStateByKey[Int] {
            (newValues: Seq[Int], state: Option[Int]) => Some(newValues.sum + state.getOrElse(0))
          }
          updatedState.print()
          streamingContext.start()
          streamingContext.awaitTermination()
        }
  37. Window Operations Spark Streaming also provides windowed computations, which allow you to apply transformations over a sliding window of data. A window operation needs two parameters: ● window length - the duration of the window ● sliding interval - the interval at which the window operation is performed
  38. Window Operations

        def window(windowDuration: Duration): DStream[T]
        def window(windowDuration: Duration, slideDuration: Duration): DStream[T]
        def reduceByWindow(reduceFunc: (T, T) => T, windowDuration: Duration, slideDuration: Duration): DStream[T]
        def countByWindow(windowDuration: Duration, slideDuration: Duration): DStream[Long]
        def countByValueAndWindow(windowDuration: Duration, slideDuration: Duration, numPartitions: Int): DStream[(T, Long)]

        // PairDStream operations
        def groupByKeyAndWindow(windowDuration: Duration): DStream[(K, Iterable[V])]
        def groupByKeyAndWindow(windowDuration: Duration, slideDuration: Duration): DStream[(K, Iterable[V])]
        def reduceByKeyAndWindow(reduceFunc: (V, V) => V, windowDuration: Duration): DStream[(K, V)]
        def reduceByKeyAndWindow(reduceFunc: (V, V) => V, windowDuration: Duration, slideDuration: Duration): DStream[(K, V)]
  39. Window Operations

        pairs.window(Seconds(15), Seconds(10))
        filteredWords.reduceByWindow((a, b) => a + ", " + b, Seconds(15), Seconds(10))
        pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(15), Seconds(10))
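     One more variant worth noting, sketched under the assumption that checkpointing is enabled and `pairs` is the (word, 1) DStream from the earlier driver program: reduceByKeyAndWindow with an inverse function only adds the data entering the window and subtracts the data leaving it, which is usually cheaper for long windows.

        streamingContext.checkpoint("checkpointDir") // required for the inverse-function variant
        val windowedCounts: DStream[(String, Int)] = pairs.reduceByKeyAndWindow(
          (a: Int, b: Int) => a + b, // values entering the window
          (a: Int, b: Int) => a - b, // values leaving the window
          Seconds(15),
          Seconds(10))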
  40. Output Operations on DStreams

        def print(num: Int): Unit
        def saveAsObjectFiles(prefix: String, suffix: String = ""): Unit
        def saveAsTextFiles(prefix: String, suffix: String = ""): Unit
        def foreachRDD(foreachFunc: RDD[T] => Unit): Unit
        def saveAsHadoopFiles[F <: OutputFormat[K, V]](prefix: String, suffix: String): Unit
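     A common foreachRDD sketch, with a hypothetical ConnectionPool standing in for any external sink: connections are created per partition on the executors, not on the driver, and each batch is pushed out record by record.

        wordCounts.foreachRDD { rdd =>
          rdd.foreachPartition { partition =>
            val connection = ConnectionPool.getConnection() // hypothetical pooled connection, created on the executor
            partition.foreach(record => connection.send(record.toString))
            ConnectionPool.returnConnection(connection)
          }
        }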
  41. Receivers Spark Streaming has two kinds of receivers: 1) Reliable Receiver - A reliable receiver correctly sends an acknowledgment to a reliable source when the data has been received and stored in Spark with replication. 2) Unreliable Receiver - An unreliable receiver does not send an acknowledgment to the source.
  42. Custom Receiver A custom receiver must extend the abstract Receiver class and implement two abstract methods:

        def onStart(): Unit // things to do to start receiving data
        def onStop(): Unit  // things to do to stop receiving data
  43. Custom Receiver

        class CustomReceiver(path: String) extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

          def onStart() {
            new Thread("File Reader") {
              override def run() { receive() }
            }.start()
          }

          def onStop() {}

          private def receive() = try {
            println("Reading file " + path)
            val reader = new BufferedReader(
              new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))
            var userInput = reader.readLine()
            while (!isStopped && Option(userInput).isDefined) {
              store(userInput)
              userInput = reader.readLine()
            }
            reader.close()
            println("Stopped receiving")
            restart("Trying to connect again")
          } catch {
            case ex: Exception => restart("Error reading file " + path, ex)
          }
        }
  44. Custom Receiver

        object CustomReceiver extends App {
          val sparkConf = new SparkConf().setAppName("CustomReceiver")
          val ssc = new StreamingContext(sparkConf, Seconds(1))
          val lines = ssc.receiverStream(new CustomReceiver(args(0)))
          val words = lines.flatMap(_.split(" "))
          val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
          wordCounts.print()
          ssc.start()
          ssc.awaitTermination()
        }
  45. Failure is everywhere
  46. Fault-tolerance Semantics A fault-tolerant streaming system should guarantee zero data loss despite any kind of failure in the system. Two levels of delivery semantics: ➢ At least once - each record will be processed one or more times. ➢ Exactly once - each record will be processed exactly once; no data will be lost and no data will be processed multiple times.
  47. Kinds of Failure There are two kinds of failures: ➢ Executor failure 1) Data received and replicated 2) Data received but not replicated ➢ Driver failure
  48-52. Executor failure (diagram slides)
  53. Executor failure: would data be lost?
  54. Executor with WAL (diagram slide)
  55. Executor failure (diagram slide)
  56. Enable write ahead logs

        object Streaming2App extends App {
          // Should be a fault-tolerant, reliable file system (e.g. HDFS, S3, etc.)
          val checkpointDirectory = "checkpointDir"
          val sparkConf = new SparkConf().setMaster("local[*]").setAppName("StreamingApp")
          sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true") // enable write ahead logs
          val streamingContext = new StreamingContext(sparkConf, Seconds(5))
          streamingContext.checkpoint(checkpointDirectory)                       // enable checkpointing
          val lines: ReceiverInputDStream[String] = streamingContext.socketTextStream("localhost", 9000)
          val words: DStream[String] = lines.flatMap(_.split(" "))
          val filteredWords: DStream[String] = words.filter(!_.trim.isEmpty)
          val pairs: DStream[(String, Int)] = filteredWords.map(word => (word, 1))
          val wordCounts: DStream[(String, Int)] = pairs.reduceByKey(_ + _)
          wordCounts.print(20)
          streamingContext.start()
          streamingContext.awaitTermination()
        }

  57. Enable write ahead logs (the same program, highlighting the write-ahead-log setting and checkpointing)
  58. Enable write ahead logs 1) For the WAL, first enable checkpointing - streamingContext.checkpoint(checkpointDirectory) 2) Enable the WAL in the Spark configuration - sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true") 3) The receiver should be reliable - acknowledge the source only after data is saved to the WAL - unacknowledged data will be replayed from the source by the restarted receiver 4) Disable in-memory replication (already replicated by HDFS) - use StorageLevel.MEMORY_AND_DISK_SER for the input DStream (see the sketch below)
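     A minimal sketch of point 4, assuming the WAL settings above are in place: pass a single-copy, serialized storage level to the receiver instead of the default MEMORY_AND_DISK_SER_2.

        import org.apache.spark.storage.StorageLevel

        val lines = streamingContext.socketTextStream(
          "localhost", 9000, StorageLevel.MEMORY_AND_DISK_SER)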
  59-62. Driver failure (diagram slides)
  63. Driver failure: how to recover from this failure?
  64. Driver with checkpointing DStream checkpointing: periodically save the DAG of the DStream to fault-tolerant storage.
  65. Driver failure (diagram slide)
  66-67. Recover from Driver failure (diagram slides)
  68. Recover from Driver failure 1) Configure automatic driver restart - all cluster managers support this 2) Set a checkpoint directory - the directory should be on a fault-tolerant, reliable file system (e.g., HDFS, S3, etc.) - streamingContext.checkpoint(checkpointDirectory) 3) Restart the driver using checkpointing
  69. Configure automatic driver restart ➢ Spark Standalone - use spark-submit in "cluster" mode with the "--supervise" flag ➢ YARN - use spark-submit in "cluster" mode ➢ Mesos - Marathon can restart applications, or use the "--supervise" flag
  70. Configure Checkpointing

        object RecoverableWordCount {
          // Should be a fault-tolerant, reliable file system (e.g. HDFS, S3, etc.)
          val checkpointDirectory = "checkpointDir"

          def createContext() = {
            val sparkConf = new SparkConf().setAppName("StreamingApp")
            val streamingContext = new StreamingContext(sparkConf, Seconds(1))
            streamingContext.checkpoint(checkpointDirectory)
            val lines = streamingContext.socketTextStream("localhost", 9000)
            val words: DStream[String] = lines.flatMap(_.split(" "))
            val filteredWords: DStream[String] = words.filter(!_.trim.isEmpty)
            val pairs: DStream[(String, Int)] = filteredWords.map(word => (word, 1))
            val wordCounts: DStream[(String, Int)] = pairs.reduceByKey(_ + _)
            wordCounts.print(20)
            streamingContext
          }
        }
  71-72. Restart the driver using checkpointing

        object StreamingApp extends App {
          import RecoverableWordCount._
          val streamingContext = StreamingContext.getOrCreate(checkpointDirectory, createContext _)
          // do other operations
          streamingContext.start()
          streamingContext.awaitTermination()
        }
  73. Checkpointing There are two types of data that are checkpointed: 1) Metadata checkpointing - configuration - DStream operations - incomplete batches 2) Data checkpointing - saving of the generated RDDs to reliable storage. This is necessary in some stateful transformations that combine data across multiple batches.
  74. Checkpointing Latency ➔ Checkpointing of RDDs incurs the cost of saving to reliable storage, so the checkpoint interval needs to be set carefully: dstream.checkpoint(Seconds((batch interval) * 10)) ➔ A checkpoint interval of 5 - 10 sliding intervals of a DStream is a good setting to try.
  75-80. Fault-tolerance Semantics (diagram slides)
  81. Spark Streaming & Kafka Integration
  82. Why Kafka? ➢ Velocity & volume of streaming data ➢ Reprocessing of streams ➢ Reliable receiver complexity ➢ Checkpoint complexity ➢ Upgrading application code
  83. Kafka Integration There are two approaches to integrate Kafka with Spark Streaming: ➢ Receiver-based Approach ➢ Direct Approach
  84. Receiver-based Approach https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html
  85. Receiver-based Approach

        import org.apache.spark.SparkConf
        import org.apache.spark.streaming.kafka.KafkaUtils
        import org.apache.spark.streaming.{Seconds, StreamingContext}

        object ReceiverBasedStreaming extends App {
          val group = "streaming-test-group"
          val zkQuorum = "localhost:2181"
          val topics = Map("streaming_queue" -> 1)

          val sparkConf = new SparkConf().setAppName("ReceiverBasedStreamingApp")
          sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
          val ssc = new StreamingContext(sparkConf, Seconds(2))

          val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topics)
            .map { case (key, message) => message }
          val words = lines.flatMap(_.split(" "))
          val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
          wordCounts.print()

          ssc.start()
          ssc.awaitTermination()
        }
  86. Direct Approach https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html
  87. Direct Approach

        import kafka.serializer.StringDecoder
        import org.apache.spark.SparkConf
        import org.apache.spark.streaming._
        import org.apache.spark.streaming.dstream.InputDStream
        import org.apache.spark.streaming.kafka._

        object KafkaDirectStreaming extends App {
          val brokers = "localhost:9092"
          val sparkConf = new SparkConf().setAppName("KafkaDirectStreaming")
          val ssc = new StreamingContext(sparkConf, Seconds(2))
          ssc.checkpoint("checkpointDir") // offset recovery

          val topics = Set("streaming_queue")
          val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
          val messages: InputDStream[(String, String)] =
            KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

          val lines = messages.map { case (key, message) => message }
          val words = lines.flatMap(_.split(" "))
          val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
          wordCounts.print()

          ssc.start()
          ssc.awaitTermination()
        }
  88. Direct Approach The Direct Approach has the following advantages over the receiver-based approach: ➢ Simplified parallelism ➢ Efficiency ➢ Exactly-once semantics
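     A sketch of how exactly-once output is usually built on the direct approach, using the spark-streaming-kafka 0.8 API and the `messages` stream from the previous slide: read the Kafka offset ranges of each batch so they can be stored atomically together with the results.

        import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

        messages.foreachRDD { rdd =>
          val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          offsetRanges.foreach { o =>
            // store o.fromOffset / o.untilOffset together with the batch output for exactly-once delivery
            println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
          }
        }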
  89. Performance Tuning For the best performance of a Spark Streaming application, consider two things: ➢ Reducing the batch processing times ➢ Setting the right batch interval
  90. Reducing the Batch Processing Times ➢ Level of parallelism in data receiving (see the sketch below) ➢ Level of parallelism in data processing ➢ Data serialization - input data - persisted RDDs generated by streaming operations ➢ Task launching overheads - running Spark in standalone mode or coarse-grained Mesos mode leads to better task launch times.
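     A sketch of the first point (level of parallelism in data receiving), reusing the ssc, zkQuorum, group and topics values from the receiver-based example: several receiver streams are created and unioned, then repartitioned before processing. The stream count and partition count are arbitrary assumptions.

        val numStreams = 3
        val kafkaStreams = (1 to numStreams).map { _ =>
          KafkaUtils.createStream(ssc, zkQuorum, group, topics)
        }
        val unifiedStream = ssc.union(kafkaStreams).repartition(8)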
  91. Setting the Right Batch Interval ➢ The batch processing time should be less than the batch interval. ➢ Memory tuning - persistence level of DStreams - clearing old data - CMS garbage collector
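     An illustrative configuration sketch for these tuning knobs; the property names are standard Spark settings, but the values are assumptions, not recommendations.

        import org.apache.spark.SparkConf

        val tunedConf = new SparkConf()
          .setAppName("TunedStreamingApp")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // data serialization
          .set("spark.streaming.unpersist", "true")                              // clear old data generated by streaming
          .set("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC")     // CMS garbage collector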
  92. Code samples https://github.com/knoldus/spark-streaming-meetup https://github.com/knoldus/real-time-stream-processing-engine https://github.com/knoldus/kafka-tweet-producer
  93. Questions & DStream[Answer]
  94. References http://spark.apache.org/docs/latest/streaming-programming-guide.html http://spark.apache.org/docs/latest/configuration.html#spark-streaming http://spark.apache.org/docs/latest/streaming-kafka-integration.html http://spark.apache.org/docs/latest/tuning.html https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html
  95. Thanks Presenter: @_satendrakumar Organizer: @knolspeak http://www.knoldus.com
