Real Time Data Streaming using Kafka & Storm

This presentation describes three real use cases of real-time data streaming and how they were implemented at LivePerson using Kafka and Storm.

Published in: Technology
Transcript of "Real Time Data Streaming using Kafka & Storm"

  1. DATA LivePerson Case Study: Real Time Data Streaming, March 20th 2014, Ran Silberman
  2. About me
     ● Technical Leader of Data Platform in LivePerson
     ● Bird watcher and amateur bird photographer
     Pharaoh Eagle-Owl / Bubo ascalaphus (Amir Silberman). This is what the people from previous slide were looking at…
  3. Agenda
     ● Why we chose Kafka + Storm
     ● How implementation was done
     ● Measures of success
     ● Two examples of use
     ● Tips from our experience
  4. Data in LivePerson Visitor in Site Chat Window Agent console LivePerson SaaS Server LoginMonitor Rules, Intelligence, Decision Chat Chat Invite DATA DATA DATA BIG DATA
  5. Legacy Data flow in LivePerson BI DWH (Oracle) RealTime servers ETL Sessionize Modeling Schema View Real-Time data Historical data
  6. Why Kafka + Storm?
     ● Need to scale out and plan for future scale
       ○ Limit for scale should not be technology
       ○ Let the limit be cost of (commodity) hardware
     ● What Data platforms can be implemented quickly?
       ○ Open source - fast evolving and community
       ○ Micro-services - do only what you ought to do!
     ● Are there risks in this choice?
       ○ Yes! technology is not mature enough
       ○ But, there is no other mature technology that can address our needs!
  7. Long-eared Owl / Asio otus (Amir Silberman)
  8. Legacy Data flow in LivePerson BI DWH (Oracle) RealTime servers Customers ETL Sessionize Modeling Schema View
  9. 1st phase - move to Hadoop ETL Sessionize Modeling Schema View RealTime servers BI DWH (Vertica) HDFS Hadoop MR Job transfers data to BI DWH Customers
  10. 2. move to Kafka RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 Customers
  11. 3. Integrate with new producers RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 Topic-2 New RealTime servers Customers
  12. 4. Add Real-time BI Customers RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 Topic-2 New RealTime servers Storm Topology Analytics DB
  13. Architecture Real-time servers Kafka Storm Cassandra/CouchBase Real Time Processing Dashboards 4 topologies reading all events
      Some Numbers: Cyber Monday 2013
      ● Flow rate into Kafka: 33 MB/sec
      ● Flow rate from Kafka: 20 MB/sec
      ● Total daily data in Kafka: 17 billion events
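To put the Cyber Monday figures in perspective, here is a small back-of-the-envelope calculation. It assumes, which the slide does not state, that the 33 MB/sec inflow is a sustained average over the same day as the 17 billion events; if it is a peak rate, the derived numbers are upper bounds at best.

```python
SECONDS_PER_DAY = 24 * 60 * 60

inflow_mb_per_sec = 33            # from the slide
events_per_day = 17_000_000_000   # from the slide

# Assumption: the inflow rate is a daily average, not a peak.
avg_events_per_sec = events_per_day / SECONDS_PER_DAY

# Daily volume in decimal terabytes (1 TB = 1,000,000 MB).
daily_volume_tb = inflow_mb_per_sec * SECONDS_PER_DAY / 1_000_000

# Implied average event size in bytes under the same assumption.
avg_event_bytes = inflow_mb_per_sec * 1_000_000 * SECONDS_PER_DAY / events_per_day

print(f"average events/sec: {avg_events_per_sec:,.0f}")
print(f"daily volume:       {daily_volume_tb:.2f} TB")
print(f"average event size: {avg_event_bytes:.0f} bytes")
```

Under that assumption the cluster handles roughly two hundred thousand events per second, each well under a kilobyte.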
  14. Eurasian Wryneck / Jynx torquilla (Amir Silberman)
  15. Two use cases
      1. Visitor list
      2. Agent State
  16. 1st Storm Use Case: “Visitors List”
      Use case:
      ● Show list of visitors in the “Agent Console”
      ● Collect data about visitor in real time
      ● Visitor stickiness in streaming process
  17. Visitors List Topology
  18. 1st Storm Use Case: “Visitors List”. Selected Analytics DB - Couchbase
      ● Document Store - for complex documents
      ● Searchable - possible to search by different attributes
      ● High throughput - Read & Write
  19. First Storm Topology – Visitor Feed Storm Topology Kafka Spout Analyze relevant events Write event to Visitor document emit emit Kafka events stream Add/Update Couchbase “Visitor List” Topology: Analytics DB: Couchbase - Document store Parse Avro into tuple emit
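The stages named on this slide (Kafka spout, parse into tuple, analyze relevant events, write to the visitor document) can be pictured as a chain of generators. This is only a single-process illustration, not Storm code: JSON stands in for Avro because the Python standard library has no Avro decoder, a plain dict stands in for Couchbase, and the event types and field names are hypothetical.

```python
import json

RELEVANT_TYPES = {"page_view", "chat_started"}  # hypothetical event types

def kafka_spout(raw_messages):
    """Stand-in for the Kafka spout: emits raw messages one at a time."""
    for message in raw_messages:
        yield message

def parse(raw_stream):
    """Stand-in for 'Parse Avro into tuple' (JSON is used here instead of Avro)."""
    for raw in raw_stream:
        event = json.loads(raw)
        yield event["visitor_id"], event["type"], event["payload"]

def analyze(tuples):
    """Stand-in for 'Analyze relevant events': pass through relevant types only."""
    for visitor_id, event_type, payload in tuples:
        if event_type in RELEVANT_TYPES:
            yield visitor_id, event_type, payload

def write_to_visitor_document(tuples, store):
    """Stand-in for the add/update step; 'store' is a dict, not Couchbase."""
    for visitor_id, event_type, payload in tuples:
        doc = store.setdefault(visitor_id, {"events": []})
        doc["events"].append({"type": event_type, "payload": payload})
    return store

raw = [
    json.dumps({"visitor_id": "v1", "type": "page_view", "payload": "/pricing"}),
    json.dumps({"visitor_id": "v1", "type": "heartbeat", "payload": ""}),
    json.dumps({"visitor_id": "v2", "type": "chat_started", "payload": "sales"}),
]
store = write_to_visitor_document(analyze(parse(kafka_spout(raw))), {})
print(store)
```

In a real topology each stage would be a spout or bolt running in parallel across many tasks; the generator chain only shows the order of the steps.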
  20. Visitors List - Storm considerations
      ● Complex calculations before sending to DB
        ○ Ignore delayed events
        ○ Reorder events before storing
      ● Document cached in memory
      ● Fields Grouping to bolt that writes to CouchBase
      ● High parallelism in bolt that writes to CouchBase
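Two of these considerations lend themselves to a short sketch: routing every event of a visitor to the same writer task (the idea behind fields grouping), and dropping late events while restoring timestamp order before a write. The task count, the 30-second lateness threshold, and the function names are assumptions made for illustration; the slides do not say how LivePerson implemented either step.

```python
import zlib

NUM_WRITER_TASKS = 4   # hypothetical parallelism of the writer bolt
MAX_DELAY = 30         # hypothetical lateness threshold, in seconds

def task_for(visitor_id, num_tasks=NUM_WRITER_TASKS):
    """Route by a stable hash of the grouping field, so one visitor's
    events always land on the same task (and its in-memory document)."""
    return zlib.crc32(visitor_id.encode("utf-8")) % num_tasks

def reorder_and_filter(events, max_delay=MAX_DELAY):
    """events: list of (timestamp, name) in arrival order.
    Drops events older than (newest timestamp seen so far - max_delay)
    and returns the survivors sorted by timestamp."""
    newest = float("-inf")
    kept = []
    for timestamp, name in events:
        newest = max(newest, timestamp)
        if timestamp < newest - max_delay:
            continue  # too late: ignore
        kept.append((timestamp, name))
    return sorted(kept)

arrival_order = [(100, "a"), (105, "c"), (102, "b"), (160, "d"), (110, "late")]
print(reorder_and_filter(arrival_order))
print(task_for("visitor-42"), task_for("visitor-42"))
```

Because the same visitor always maps to the same task, the cached document for that visitor can live in one task's memory without cross-task coordination.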
  21. Visitors List Topology
  22. European Roller / Coracias garrulus (Amir Silberman)
  23. 2nd Storm Use Case: “Agent State”
      Use case:
      ● Show Agent activity on “Agent Console”
      ● Count Agent statistics
      ● Display graphs
  24. Agent Status Topology
  25. 2nd Storm Use Case: “Agent State”. Selected Analytics DB - Cassandra
      ● Wide Column Store DB
      ● Highly Available w/o Single point of failure
      ● High throughput
      ● Optimized for counters
  26. First Storm Topology – Visitor Feed Storm Topology Kafka Spout Analyze relevant events Send events emit emit Kafka events stream Add “Agent Status” Topology: Analytics DB: Cassandra - Wide column store Parse Avro into tuple emit Data visualization using Highcharts
  27. Agent Status - Storm considerations
      ● Counters stored by topology
      ● Calculations done after reading from DB
      ● Delayed events should not be ignored
      ● Order of events does not matter
      ● Using Highcharts for data visualization
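The reason order does not matter here, and why delayed events can still be counted, is that incrementing a counter is commutative. The sketch below illustrates that property with an in-memory `Counter` standing in for Cassandra counter columns; the agent IDs, state names, and the derived ratio are hypothetical.

```python
import random
from collections import Counter

def count_events(events):
    """Increment one counter per (agent, state); increments commute,
    so arrival order cannot change the final totals."""
    counters = Counter()
    for agent_id, state in events:
        counters[(agent_id, state)] += 1
    return counters

def chatting_share(counters, agent_id):
    """A calculation done at read time, after the raw counters are fetched."""
    total = sum(n for (agent, _), n in counters.items() if agent == agent_id)
    return counters[(agent_id, "chatting")] / total if total else 0.0

events = [
    ("agent-1", "online"),
    ("agent-1", "chatting"),
    ("agent-2", "online"),
    ("agent-1", "chatting"),
]
shuffled = events[:]
random.Random(7).shuffle(shuffled)  # simulate out-of-order and delayed arrival

in_order = count_events(events)
out_of_order = count_events(shuffled)
print(in_order == out_of_order, chatting_share(in_order, "agent-1"))
```

This is the opposite trade-off from the Visitors List topology, where the stored document depends on event order and late events are dropped.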
  28. Spur-winged Lapwing / Vanellus spinosus (Amir Silberman)
  29. 3rd Storm Use Case: Data Auditing
      Use case:
      ● Needs to be able to tell whether events arrived
        ○ Were there any missing events?
        ○ Were there any duplicated events?
        ○ How long did it take for events to arrive?
      ● Data not important - only count of events
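A count-based audit of this kind can be sketched as a comparison between the number of events a producer declares for a time bucket and the number that actually arrived, plus a latency figure taken from timestamps in the event header. The bucket keys, tuple layout, and function name are assumptions for illustration. Note that a pure count comparison cannot detect a missing event and a duplicate that cancel each other within the same bucket.

```python
from collections import Counter

def audit(declared, arrivals):
    """declared: {bucket: number of events the producer reports sending}
    arrivals: iterable of (bucket, sent_ts, received_ts) taken from headers.
    Returns per-bucket missing/duplicated counts and the worst latency."""
    received = Counter()
    max_latency = {}
    for bucket, sent_ts, received_ts in arrivals:
        received[bucket] += 1
        latency = received_ts - sent_ts
        max_latency[bucket] = max(max_latency.get(bucket, 0), latency)

    report = {}
    for bucket in sorted(set(declared) | set(received)):
        diff = received[bucket] - declared.get(bucket, 0)
        report[bucket] = {
            "missing": max(-diff, 0),
            "duplicated": max(diff, 0),
            "max_latency": max_latency.get(bucket),
        }
    return report

declared = {"10:00": 3, "10:01": 2}
arrivals = [
    ("10:00", 0, 2), ("10:00", 1, 2),                      # one event never arrived
    ("10:01", 60, 61), ("10:01", 61, 65), ("10:01", 61, 66),  # one arrived twice
]
report = audit(declared, arrivals)
print(report)
```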
  30. 3rd Storm Use Case: Data Auditing Realtime server Kafka Topics Auditing Topic Storm Sync topology Audit-loader topology MySQL Hadoop HDFS audit job kafka 1 3 4 2 Auditor
  31. First Storm Topology – Visitor Feed Storm Topology Kafka Spout Analyze relevant events Send events emit emit Kafka events stream Add “Sync Audit” Topology: Sync messages between two topics Parse Avro into tuple emit Kafka Audit topic
  32. First Storm Topology – Visitor Feed Storm Topology Kafka Spout Analyze relevant events Send events emit emit Kafka Audit topic Add “Load Audit” Topology: Analytics DB: MySQL - RDBMS Parse Avro into tuple emit Auditing Report
  33. “Load Audit” Topology:
      ● Stores statistics of events count
      ● SQL type DB
      ● Used for Auditing and other statistics
      ● Requires metadata in events header
  34. Challenges:
      ● High network traffic
      ● Writing to Kafka is faster than reading
      ● All topologies read all events
      ● How to avoid resource starvation in Storm
      Subalpine Warbler / Sylvia cantillans (Amir Silberman)
  35. Optimizations of Kafka
      ● Increase Kafka consuming rate by adding partitions
      ● Run on physical machines with RAID
      ● Set retention to the proper need
      ● Monitor data flow!
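"Set retention to the proper need" is partly a disk-sizing question. As a rough illustration, the function below estimates the raw disk a retention window implies at a given inflow rate. The 48-hour window and replication factor of 3 are hypothetical values, not figures from the talk, and the estimate ignores compression and index overhead.

```python
def retention_disk_tb(inflow_mb_per_sec, retention_hours, replication_factor):
    """Raw disk needed to hold the retention window, in decimal terabytes.
    Ignores compression, index files, and free-space headroom."""
    total_bytes = (
        inflow_mb_per_sec * 1_000_000   # bytes per second
        * retention_hours * 3600        # seconds retained
        * replication_factor            # copies of each partition
    )
    return total_bytes / 1e12

# 33 MB/sec is the inflow quoted in the talk; 48 h and 3 replicas are assumed.
estimate = retention_disk_tb(33, retention_hours=48, replication_factor=3)
print(f"{estimate:.1f} TB")
```

Halving the retention window halves the estimate, which is why retention is worth tuning to what consumers actually need to replay.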
  36. Optimizations of Storm
      ● # of Kafka-Spouts = number of total partitions
      ● Set “Isolation mode” for important topologies
      ● Validate Network cards can carry network traffic
      ● Set Storm cluster on high CPU machines
      ● Monitor servers CPU & Memory (Graphite)
      ● Assess min. #Cores that topology needs
        ○ Use “top” -> “load” to find server load
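The first tip follows from the fact that a Kafka partition is read by one consumer task at a time: fewer spout tasks than partitions means some tasks read several partitions, and more spout tasks than partitions leaves some idle. The sketch below shows this with a simple round-robin assignment, which is an assumption made for illustration and not necessarily how any particular Kafka spout distributes partitions.

```python
def assign_partitions(num_partitions, num_spout_tasks):
    """Round-robin assignment of partitions to spout tasks (illustrative only)."""
    assignment = {task: [] for task in range(num_spout_tasks)}
    for partition in range(num_partitions):
        assignment[partition % num_spout_tasks].append(partition)
    return assignment

def idle_tasks(assignment):
    """Spout tasks that received no partition and therefore read nothing."""
    return [task for task, partitions in assignment.items() if not partitions]

matched = assign_partitions(num_partitions=8, num_spout_tasks=8)
too_many = assign_partitions(num_partitions=8, num_spout_tasks=12)
too_few = assign_partitions(num_partitions=8, num_spout_tasks=4)

print(idle_tasks(matched))    # no idle tasks
print(idle_tasks(too_many))   # the extra tasks sit idle
print(too_few)                # each task reads two partitions
```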
  37. Demo
      ● Agent Console - https://z1.le.liveperson.net/ 71394613 / rans@liveperson.com
      ● My Site - http://birds-of-israel.weebly.com/
  38. Questions? Little Owl / Athene noctua (Amir Silberman)
  39. Thank you! Ruff / Philomachus pugnax (Amir Silberman)