Stream Processing with Spark and Storm: Couchbase Connect 2015

The problem with analyzing large volumes of data is latency. First, it takes time to load data. Next, it takes time to analyze it. The results are always out of date because there is always more data. The solution is to perform a continuous analysis of data in motion, or stream processing. In this session, we’ll evaluate two popular, open-source stream processors, Spark and Storm, and discuss how they can be integrated with Couchbase Server as a streaming input source or as a destination for the output.

Published in: Technology

  1. STREAM PROCESSING WITH STORM AND SPARK STREAMING
       Shane Johnson, Couchbase
  2. Agenda (©2015 Couchbase Inc., Stream Processing with Storm and Spark)
       • MapReduce
       • Directed Acyclic Graphs
       • Storm
       • Spark Streaming
       • Enter Couchbase Server
       • Customer Example
  3. MapReduce
       Diagram labels: Log Message, Log Message, Log Message; Map; Warn, Info, Info; Shuffle; Warn, Error, Info, Info; Info - 2, Warn - 1; Reduce
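
The map, shuffle, and reduce stages named on the MapReduce slide can be sketched in plain Python. This is only an illustration of the idea, not Hadoop code: the function names, the `Level: text` message format, and the sample log lines are all assumptions made for the example.

```python
from collections import defaultdict


def map_phase(messages):
    """Map: emit a (level, 1) pair for every log message."""
    for message in messages:
        # Assumed message format: "<Level>: <text>"
        level = message.split(":", 1)[0].strip()
        yield level, 1


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key (the log level)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each log level."""
    return {level: sum(values) for level, values in grouped.items()}


# Hypothetical input: one Warn and two Info messages
logs = ["Warn: disk almost full", "Info: user logged in", "Info: request served"]
counts = reduce_phase(shuffle_phase(map_phase(logs)))
print(counts)
```

The whole input has to be available before the reduce step can report a count, which is the latency problem the talk attributes to batch analysis.
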
  4. Directed Acyclic Graph
       Diagram labels: Log Message; Extract Level; Count Info, Count Warn, Count Error; Get Counts
  5. Batch versus Stream
       Diagram labels: Log Message (repeated); Analyze
  6. Batch versus Stream
       Diagram labels: Analyze; Log Message (repeated)
  7. Storm
       • Apache Software Foundation
       • Open Sourced by Twitter
       • Analyze Tweets
       • “Trends”
       • Distributed
       • Real-Time
       • Continuous
  8. Terminology
       • Tuple – An immutable set of key/value pairs
           • name=shane, company=couchbase
       • Stream – An unbounded sequence of tuples
           • person, person, person, person, person…
  9. Terminology
       • Spout – A source of data for the stream
           • Pulls data from somewhere (e.g. message queue)
           • Pushes tuples into a stream
       • Bolt – Processes a stream of tuples
           • Consume Multiple Streams, Produce Multiple Streams
           • Do something (e.g. filter, aggregate, persist)
  10. Topology
       A set of spouts and bolts
       Diagram labels: Src; Spout; Bolt, Bolt, Bolt; Dest
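
To make the spout, bolt, and topology terms concrete, here is a minimal sketch that models them with Python generators. It deliberately does not use the Storm API: `LogTuple`, `log_spout`, `filter_bolt`, and `count_bolt` are hypothetical names, and the input lines are invented for the example.

```python
from typing import NamedTuple


class LogTuple(NamedTuple):
    """An immutable set of named values, standing in for a tuple."""
    level: str
    text: str


def log_spout(source):
    """Spout: pull raw lines from a source and push tuples into a stream."""
    for line in source:
        level, _, text = line.partition(":")
        yield LogTuple(level.strip(), text.strip())


def filter_bolt(stream, level):
    """Bolt: pass through only the tuples with the requested level."""
    for tup in stream:
        if tup.level == level:
            yield tup


def count_bolt(stream):
    """Bolt: emit a running count each time a tuple arrives."""
    count = 0
    for _ in stream:
        count += 1
        yield count


# Topology: spout -> filter bolt -> count bolt
source = ["Warn: disk almost full", "Info: user logged in", "Warn: retrying"]
topology = count_bolt(filter_bolt(log_spout(source), "Warn"))
running_counts = list(topology)
print(running_counts)
```

Because each stage is a generator, a tuple flows through the whole chain as soon as the spout emits it, and the count is updated continuously instead of once at the end of a batch.
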
  11. Grouping
       A bolt / spout is executed as parallel tasks
       Which task processes the tuple?
       Diagram labels: Bolt; Task, Task, Task
  12. Grouping
       • Shuffle – Send tuples to random tasks
       • Field – Send tuples to tasks based on the value of a field
           • All tuples with the same field value are sent to the same task
       • All – Send tuples to all tasks
       • Global – Send tuples to the same task
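
The four groupings can be read as four ways of answering "which task processes the tuple?". The sketch below expresses each one as a function that returns the indexes of the receiving tasks. It is an illustration only, not Storm's implementation: the function names are hypothetical, CRC32 is an assumed stand-in for whatever hash a real system uses, and picking task 0 for the global grouping is an arbitrary choice for the example.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    """Shuffle: send the tuple to one randomly chosen task."""
    return [rng.randrange(num_tasks)]


def field_grouping(tup, num_tasks, field):
    """Field: hash one field so equal values always reach the same task."""
    key = str(tup[field]).encode("utf-8")
    # CRC32 is deterministic across runs, unlike Python's built-in str hash
    return [zlib.crc32(key) % num_tasks]


def all_grouping(tup, num_tasks):
    """All: send a copy of the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """Global: send every tuple to one fixed task (task 0 in this sketch)."""
    return [0]


shane = {"name": "shane", "company": "couchbase"}
alex = {"name": "alex", "company": "couchbase"}

# Both tuples share a company, so a field grouping on "company" agrees
same_task = field_grouping(shane, 3, "company") == field_grouping(alex, 3, "company")
print(same_task, all_grouping(shane, 3), global_grouping(shane, 3))
```

A field grouping is what makes per-key aggregation possible in parallel, since every tuple for a given key is seen by the same task.
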
  13. Streams & Events
       Diagram labels: Topology; Log Message, Tuple, Output; Log Message, Output
  14. Spark Streaming
       • Apache Software Foundation
       • Open Sourced by UC Berkeley AMPLab
       • Spark
           • Spark Core
           • Spark Streaming
           • Spark SQL
       • Distributed
       • Real-Time
       • Continuous
  15. Resilient Distributed Datasets (RDD)
       “an immutable, partitioned collection of elements that can be operated on in parallel”
       • Read-Only, Partitioned
       • Create RDDs by Transforming Them
       • Lineage – Rebuild RDD from Previous RDDs
  16. Terminology
       • Input DStream – Source of Input Data
           • Receiver – Pulls data from somewhere (e.g. message queue)
           • Creates a stream of really small RDDs
       • Discretized Stream (DStream) – Stream of RDDs
           • Transform RDDs
           • Do something (e.g. filter, aggregate, persist)
           • Streams Create Streams
  17. Streams & RDDs
       Micro-Batching
       Diagram labels: Log, Log, Log, Log, Log; RDD; Log, Log; RDD; Input Stream; DStream
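
Micro-batching can be sketched as cutting a timestamped input stream into fixed-width time slices, each of which becomes one small batch (the role an RDD plays in a DStream). This is plain Python, not the Spark Streaming API: `micro_batches`, the one-second interval, and the sample timestamps are assumptions chosen to mirror the five-log and two-log batches in the diagram.

```python
def micro_batches(events, batch_interval):
    """Group (timestamp, record) pairs into consecutive fixed-width batches.

    Assumes non-negative timestamps measured from the start of the stream.
    Intervals with no events yield empty batches.
    """
    batches = []
    for timestamp, record in events:
        index = int(timestamp // batch_interval)
        # Grow the list so that every interval up to `index` has a batch
        while len(batches) <= index:
            batches.append([])
        batches[index].append(record)
    return batches


# Hypothetical input stream: five logs in the first second, two in the next
events = [
    (0.1, "log1"), (0.3, "log2"), (0.5, "log3"), (0.7, "log4"), (0.9, "log5"),
    (1.2, "log6"), (1.6, "log7"),
]
dstream = micro_batches(events, batch_interval=1.0)
print([len(batch) for batch in dstream])
```

The batch interval is the trade-off to note: a shorter interval lowers latency but produces more, smaller batches to schedule.
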
  18. Transformations
       • map, flatMap
       • filter
       • repartition, union
       • join
       • cogroup, transform
       • window
  19. Streams Create Streams
       Diagram labels: RDD – T1, RDD – T2, RDD – T3; RDDX – T1, RDDX – T2, RDDX – T3; RDDY – T1, RDDY – T2, RDDY – T3; Filter; Count
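
"Streams create streams" can be illustrated by applying a transformation to every batch of one stream to obtain a new stream, leaving the original untouched. The sketch below uses plain Python lists in place of RDDs and is not the Spark Streaming API: `transform`, the sample batches, and the choice of filtering on "Warn" are assumptions for the example.

```python
def transform(dstream, fn):
    """Apply fn to every batch, producing a new stream of batches."""
    return [fn(batch) for batch in dstream]


def keep_warnings(batch):
    """Filter: keep only the Warn entries of one batch."""
    return [level for level in batch if level == "Warn"]


# Hypothetical input stream: one batch per interval T1, T2, T3
levels = [["Info", "Warn", "Info"], ["Warn"], ["Error", "Warn", "Warn"]]

warn_stream = transform(levels, keep_warnings)  # Filter: a stream of filtered batches
count_stream = transform(warn_stream, len)      # Count: a stream of per-batch counts
print(count_stream)
```

Each step returns a new stream with one result per interval, so the filtered stream and the count stream both stay aligned with T1, T2, and T3 of the input.
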
  20. Enter Couchbase Server
       Diagram labels: Pipeline; Source, Stream Processor, Dest; ?, ?
  21. Enter Couchbase Server
       Diagram labels: Pipeline; Source, Stream Processor, Dest; Kafka, Couchbase Server
  22. Enter Couchbase Server
       Diagram labels: Pipeline; Source, Stream Processor, Dest; Couchbase Server, Couchbase Server
  23. Stream Processing at LivePerson
  24. LiveEngage
       • Chat
       • Personalized Messages
       • Personalized Offers
  25. • Identify visitor behaviors and patterns
       • Predict likelihood to buy
       • Identify intent
       • Provide targeted, personalized content
       • Provide satisfaction and conversion metrics
       • Engage visitors when necessary
       • Showing hesitation or signs of abandonment
  26. Real-Time Web Analytics
  27. 22+ M Interactions, 2+ B Sessions, 13+ TB Data Per Month
  28. Diagram labels: Customer, Agent; Clickstream / Chat; Visitor Feed; Ingest, Process, Access
  29. Diagram labels: HADOOP, STORM, COUCHBASE SERVER, KAFKA
  30. Diagram labels: PROCESS, ACCESS, STORE, ANALYZE, REPORT, MONITOR, CHAT, BATCH, REAL TIME, DASHBOARD
  32. Thank you.
