Real Time Data Streaming using Kafka & Storm
This presentation describes three real use cases of real-time data streaming and how they were implemented at LivePerson using Kafka and Storm.

Presentation Transcript

  • LivePerson Case Study: Real-Time Data Streaming. March 20th, 2014. Ran Silberman
  • About me ● Technical leader of the Data Platform group at LivePerson ● Bird watcher and amateur bird photographer [Photo: Pharaoh Eagle-Owl / Bubo ascalaphus, by Amir Silberman; this is what the people in the previous slide were looking at…]
  • Agenda ● Why we chose Kafka + Storm ● How the implementation was done ● Measures of success ● Two examples of use ● Tips from our experience
  • Data in LivePerson [diagram]: a visitor on the site, the chat window, and the agent console all talk to the LivePerson SaaS server (login, monitor, rules/intelligence/decision, chat, chat invite); each interaction emits data, and together it adds up to Big Data.
  • Legacy data flow in LivePerson [diagram]: real-time servers feed an ETL pipeline (sessionize, modeling, schema, view) into the BI DWH (Oracle), which serves both real-time and historical data.
  • Why Kafka + Storm? ● Need to scale out and plan for future scale ○ The limit for scale should not be the technology ○ Let the limit be the cost of (commodity) hardware ● Which data platforms can be implemented quickly? ○ Open source: fast-evolving, with a community ○ Micro-services: do only what you ought to do! ● Are there risks in this choice? ○ Yes! The technology is not mature enough ○ But no other mature technology can address our needs!
  • [Photo: Long-eared Owl / Asio otus, by Amir Silberman]
  • Legacy data flow in LivePerson [diagram]: real-time servers → ETL (sessionize, modeling, schema, view) → BI DWH (Oracle) → customers.
  • 1st phase: move to Hadoop [diagram]: real-time servers write to HDFS; a Hadoop MR job runs the ETL (sessionize, modeling, schema, view) and transfers the data to the BI DWH (Vertica), which serves customers.
  • 2nd phase: move to Kafka [diagram]: real-time servers produce to Kafka (Topic-1); the data lands in HDFS, and the Hadoop MR job transfers it to the BI DWH (Vertica) for customers.
  • 3rd phase: integrate with new producers [diagram]: new real-time servers produce to a second Kafka topic (Topic-2) alongside Topic-1; the HDFS → MR job → BI DWH (Vertica) path is unchanged.
  • 4th phase: add real-time BI [diagram]: a Storm topology consumes the Kafka topics and writes to an analytics DB, in parallel to the existing HDFS → MR job → BI DWH (Vertica) path.
  • Architecture [diagram]: real-time servers → Kafka → Storm → Cassandra/Couchbase → dashboards; four topologies read all events. Some numbers (Cyber Monday 2013): flow rate into Kafka 33 MB/sec, flow rate out of Kafka 20 MB/sec, total daily data in Kafka 17 billion events.
  • [Photo: Eurasian Wryneck / Jynx torquilla, by Amir Silberman]
  • Two use cases 1. Visitor list 2. Agent State
  • 1st Storm Use Case: “Visitors List”. Use case: ● Show the list of visitors in the “Agent Console” ● Collect data about each visitor in real time ● Keep visitor stickiness in the streaming process
  • Visitors List Topology [diagram]
  • 1st Storm Use Case: “Visitors List”. Selected analytics DB: Couchbase ● Document store, fits complex documents ● Searchable: can query by different attributes ● High throughput for both reads and writes (see the sketch below)
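To make “document store” concrete, here is a minimal sketch of writing a visitor document with the 2014-era Couchbase Java client (1.x); the bucket name, key scheme, and document fields are illustrative assumptions, not LivePerson's actual schema.

```java
// Minimal sketch, assuming the couchbase-client 1.x API and an existing
// "visitors" bucket; the key scheme and JSON fields are hypothetical.
import java.net.URI;
import java.util.Arrays;

import com.couchbase.client.CouchbaseClient;

public class VisitorDocExample {
    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://couchbase-host:8091/pools")),
                "visitors", "");
        // Upsert the visitor document; 0 means "no expiry".
        client.set("visitor::12345", 0,
                   "{\"visitorId\":\"12345\",\"pageViews\":3,\"chatting\":true}").get();
        System.out.println(client.get("visitor::12345"));
        client.shutdown();
    }
}
```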
  • “Visitor List” topology [diagram]: a Kafka spout reads the Kafka events stream → parse Avro into a tuple (emit) → analyze relevant events (emit) → write the event to the visitor document (emit), adding/updating it in the analytics DB, Couchbase (document store).
  • Visitors List, Storm considerations: ● Complex calculations before sending to the DB ○ Ignore delayed events ○ Reorder events before storing ● Document cached in memory ● Fields grouping to the bolt that writes to Couchbase ● High parallelism in the bolt that writes to Couchbase (see the wiring sketch below)
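A minimal wiring sketch of how such a topology could be assembled with the 2014-era storm-kafka API, condensed to two bolts; the topic name, ZooKeeper address, field names, and stub bolt classes are hypothetical placeholders, not LivePerson's actual code.

```java
// Hedged sketch of the "Visitor List" topology wiring (storm-kafka 0.9.x-era API).
// The stubs stand in for LivePerson's real parse/analyze/Couchbase-write logic.
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class VisitorListTopology {

    /** Stub: would deserialize the Avro payload into a (visitorId, event) tuple. */
    public static class ParseAvroBolt extends BaseBasicBolt {
        public void execute(Tuple t, BasicOutputCollector c) { /* parse + emit */ }
        public void declareOutputFields(OutputFieldsDeclarer d) {
            d.declare(new Fields("visitorId", "event"));
        }
    }

    /** Stub: would filter/reorder events and update the cached visitor document. */
    public static class CouchbaseWriterBolt extends BaseBasicBolt {
        public void execute(Tuple t, BasicOutputCollector c) { /* write to Couchbase */ }
        public void declareOutputFields(OutputFieldsDeclarer d) { /* terminal bolt */ }
    }

    public static TopologyBuilder build() {
        SpoutConfig spoutConfig = new SpoutConfig(
                new ZkHosts("zk-host:2181"),  // ZooKeeper used by 0.8-era Kafka consumers
                "Topic-1",                    // events topic from the slides
                "/kafka-offsets",             // ZK root where consumer offsets are stored
                "visitor-list");              // consumer id
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("events", new KafkaSpout(spoutConfig), 4);
        builder.setBolt("parse", new ParseAvroBolt(), 4).shuffleGrouping("events");
        // Fields grouping keeps each visitor sticky to one writer task, so its
        // document can stay cached in memory; the writer gets high parallelism.
        builder.setBolt("couchbase-writer", new CouchbaseWriterBolt(), 16)
               .fieldsGrouping("parse", new Fields("visitorId"));
        return builder;
    }
}
```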
  • Visitors List Topology [diagram]
  • [Photo: European Roller / Coracias garrulus, by Amir Silberman]
  • 2nd Storm Use Case: “Agent State”. Use case: ● Show agent activity in the “Agent Console” ● Count agent statistics ● Display graphs
  • Agent Status Topology [diagram]
  • 2nd Storm Use Case: “Agent State”. Selected analytics DB: Cassandra ● Wide-column store ● Highly available, with no single point of failure ● High throughput ● Optimized for counters
  • “Agent Status” topology [diagram]: a Kafka spout reads the Kafka events stream → parse Avro into a tuple (emit) → analyze relevant events (emit) → send events (emit) to the analytics DB, Cassandra (wide-column store); data visualization using Highcharts.
  • Agent Status, Storm considerations: ● Counters are stored by the topology ● Calculations are done after reading from the DB ● Delayed events should not be ignored ● Order of events does not matter ● Highcharts is used for data visualization (see the counter-bolt sketch below)
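Because Cassandra counter increments are commutative, the write path can stay a simple “bump a counter” bolt: order does not matter, and late events still count. A minimal sketch, assuming the DataStax Java driver 2.x and a hypothetical agent_stats counter table; host, keyspace, and field names are illustrative.

```java
// Hedged sketch: a Storm bolt that bumps a Cassandra counter per agent event.
// The agent_stats table, keyspace, host, and tuple field names are illustrative.
import java.util.Map;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class AgentCounterBolt extends BaseBasicBolt {
    private transient Session session;

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        // One session per bolt instance; sessions are thread-safe and pooled.
        session = Cluster.builder()
                         .addContactPoint("cassandra-host")
                         .build()
                         .connect("analytics");
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // Counter columns make "+1" cheap and order-independent.
        // (Binding values in a simple statement needs driver 2.0+.)
        session.execute("UPDATE agent_stats SET events = events + 1 WHERE agent_id = ?",
                        tuple.getStringByField("agentId"));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: no downstream streams.
    }
}
```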
  • [Photo: Spur-winged Lapwing / Vanellus spinosus, by Amir Silberman]
  • 3rd Storm Use Case: Data Auditing. Use case: ● Need to be able to tell whether events arrived ○ Were there any missing events? ○ Were there any duplicated events? ○ How long did it take for events to arrive? ● The data itself is not important, only the count of events
  • 3rd Storm Use Case: Data Auditing [diagram]: real-time servers produce to the Kafka topics; a Storm “sync” topology copies audit messages to a dedicated auditing topic; an “audit-loader” topology loads counts into MySQL; a Hadoop audit job counts the events that reached HDFS; the Auditor compares the counts.
  • “Sync Audit” topology [diagram]: a Kafka spout reads the Kafka events stream → parse Avro into a tuple (emit) → analyze relevant events (emit) → send events (emit) to the Kafka audit topic; it syncs messages between the two topics.
  • “Load Audit” topology [diagram]: a Kafka spout reads the Kafka audit topic → parse Avro into a tuple (emit) → analyze relevant events (emit) → send events (emit) to the analytics DB, MySQL (RDBMS), which feeds the auditing report.
  • “Load Audit” topology: ● Stores statistics of event counts ● SQL-type DB ● Used for auditing and other statistics ● Requires metadata in the event headers (see the JDBC sketch below)
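As a rough illustration of what the loader might persist, here is a plain-JDBC sketch that upserts per-topic, per-minute event counts; the table schema, connection details, and the producer/time-bucket metadata fields are assumptions, not LivePerson's actual schema.

```java
// Hedged sketch: persisting per-topic, per-minute event counts to MySQL so the
// Auditor can compare counts across pipeline stages. Schema and names are assumed.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AuditCountWriter {
    public static void writeCount(String topic, String producer,
                                  long minuteBucket, long count) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://mysql-host/audit", "audit_user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO event_counts (topic, producer, time_bucket, cnt) " +
                 "VALUES (?, ?, ?, ?) " +
                 "ON DUPLICATE KEY UPDATE cnt = cnt + VALUES(cnt)")) {
            ps.setString(1, topic);
            ps.setString(2, producer);
            ps.setLong(3, minuteBucket);   // e.g. epochMillis / 60000
            ps.setLong(4, count);
            ps.executeUpdate();
        }
    }
}
```

Comparing these per-bucket counts at each stage (producer, audit topic, HDFS) is what reveals missing, duplicated, or delayed events.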
  • Challenges: ● High network traffic ● Writing to Kafka is faster than reading ● All topologies read all events ● How to avoid resource starvation in Storm? [Photo: Subalpine Warbler / Sylvia cantillans, by Amir Silberman]
  • Optimizations of Kafka ● Increase the Kafka consuming rate by adding partitions ● Run on physical machines with RAID ● Set retention to match actual needs ● Monitor the data flow!
  • Optimizations of Storm ● Number of Kafka spout tasks = total number of partitions ● Set “isolation mode” for important topologies ● Validate that the network cards can carry the traffic ● Run the Storm cluster on high-CPU machines ● Monitor server CPU & memory (Graphite) ● Assess the minimum number of cores a topology needs ○ Use “top” → “load” to find the server load (see the parallelism sketch below)
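A minimal sketch of the first two points, again assuming the 2014-era storm-kafka API; the topic, ZooKeeper address, and counts are illustrative. Spout parallelism is pinned to the partition count so each partition has exactly one consumer task, none idle and none shared.

```java
// Hedged sketch: match KafkaSpout parallelism to the topic's partition count.
// All names and values are illustrative.
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class SubmitWithSpoutParallelism {
    public static void main(String[] args) throws Exception {
        int numPartitions = 12;  // must equal the Kafka topic's partition count
        SpoutConfig spoutConfig = new SpoutConfig(
                new ZkHosts("zk-host:2181"), "Topic-1", "/kafka-offsets", "events");
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("events", new KafkaSpout(spoutConfig), numPartitions);
        // ... bolts omitted ...
        Config conf = new Config();
        conf.setNumWorkers(4);   // spread executors over several worker JVMs
        // Note: "isolation mode" itself is a cluster-side setting; the isolation
        // scheduler in storm.yaml reserves dedicated machines per topology.
        StormSubmitter.submitTopology("events-topology", conf, builder.createTopology());
    }
}
```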
  • Demo ● Agent Console - https://z1.le.liveperson.net/ 71394613 / rans@liveperson.com ● My Site - http://birds-of-israel.weebly.com/
  • Questions? [Photo: Little Owl / Athene noctua, by Amir Silberman]
  • Thank you! [Photo: Ruff / Philomachus pugnax, by Amir Silberman]