LivePerson Case Study: Real Time Data Streaming using Storm & Kafka
March 20th 2014
Ran Silberman
Ran Silberman, Big Data Technical Leader at LivePerson, presented this talk at the "Fullstack Developers Israel" meetup "Never-Ending Data Streams" (http://www.meetup.com/full-stack-developer-il/events/166864612/).

At LivePerson we process large amounts of data.
The data is stored in Hadoop and can be used for batch processing and querying.

LivePerson uses Kafka and Storm to complete a big-data solution with real-time processing in addition to batch processing.

In his lecture, Ran provides an overview of the implementation of real-time data streaming at LivePerson and shares tips and tricks from his experience, covering:

1) High availability
2) Data consistency
3) Data format and schema enforcement
4) Auditing data integrity

Transcript
About me
● Technical Leader of the Data Platform at LivePerson
● Bird watcher and amateur bird photographer
Agenda
● Why we chose Kafka + Storm
● How the implementation was done
● Measures of success
● Two examples of use
● Tips from our experience
Data in LivePerson
[Diagram: a visitor in the site, the chat window, and the Agent Console all talk to the LivePerson SaaS server (Login Monitor; Rules, Intelligence, and Decision engines); chats and chat invites flow between them, and every interaction emits data that adds up to Big Data.]
Legacy Data flow in LivePerson
[Diagram: real-time servers feed an ETL pipeline (Sessionize, Modeling, Schema, View) into the BI DWH (Oracle), which serves both real-time and historical data.]
Why Kafka + Storm?
● Need to scale out and plan for future scale
○ The limit on scale should not be the technology
○ Let the limit be the cost of (commodity) hardware
● Which data platforms can be implemented quickly?
○ Open source - fast-evolving, with a community
○ Micro-services - do only what you ought to do!
● Are there risks in this choice?
○ Yes! The technology is not mature enough.
○ But there is no other mature technology that can address our needs!
Legacy Data flow in LivePerson
[Diagram: the legacy flow again - real-time servers → ETL (Sessionize, Modeling, Schema, View) → BI DWH (Oracle) - with the customers shown as its consumers.]
1st phase - move to Hadoop
[Diagram: real-time servers write to HDFS; the ETL (Sessionize, Modeling, Schema, View) runs as Hadoop jobs, and an MR job transfers the data to the BI DWH, now Vertica instead of Oracle, where customers consume it.]
2. move to Kafka
[Diagram: real-time servers now publish events to Kafka (Topic-1); the events land in HDFS, the ETL (Sessionize, Modeling, Schema, View) runs on Hadoop, and an MR job transfers the data to the BI DWH (Vertica) for customers.]
3. Integrate with new producers
[Diagram: new real-time servers publish to a second Kafka topic (Topic-2) alongside the existing Topic-1; both feed HDFS, Hadoop, and the MR job that transfers data to the BI DWH (Vertica).]
4. Add Real-time BI
[Diagram: alongside the batch path (Kafka → HDFS → Hadoop → MR job → BI DWH (Vertica)), a Storm topology now consumes the Kafka topics and writes to an analytics DB in real time.]
Architecture
Real-time servers → Kafka → Storm (real-time processing) → Cassandra/Couchbase → dashboards
Some numbers (Cyber Monday 2013):
● Flow rate into Kafka: 33 MB/sec
● Flow rate from Kafka: 20 MB/sec (4 topologies reading all events)
● Total daily data in Kafka: 17 billion events
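A back-of-the-envelope check (the derived figures below are arithmetic from the slide's numbers, not part of the deck) ties these figures together: 33 MB/sec sustained over a day is roughly 2.9 TB, which across 17 billion events puts the average event in the low hundreds of bytes.

```python
# Back-of-the-envelope check of the Cyber Monday 2013 numbers.
# The derived figures are arithmetic, not stated in the deck.

inflow_mb_per_sec = 33           # flow rate into Kafka
events_per_day = 17_000_000_000  # total daily events in Kafka

seconds_per_day = 24 * 60 * 60
daily_bytes = inflow_mb_per_sec * 1_000_000 * seconds_per_day

daily_tb = daily_bytes / 1_000_000_000_000
avg_event_bytes = daily_bytes / events_per_day

print(f"daily volume ≈ {daily_tb:.1f} TB")        # ≈ 2.9 TB
print(f"average event ≈ {avg_event_bytes:.0f} B") # ≈ 168 B
```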
Two use cases
1. Visitor List
2. Agent State
1st Storm Use Case: "Visitors List"
Use case:
● Show the list of visitors in the "Agent Console"
● Collect data about each visitor in real time
● Visitor stickiness in the streaming process
Visitors List Topology
[Diagram: the Visitors List topology.]
Selected Analytics DB - Couchbase
1st Storm Use Case: "Visitors List"
● Document store - for complex documents
● Searchable - documents can be queried by different attributes
● High throughput - read & write
First Storm Topology – Visitor Feed
"Visitor List" topology (analytics DB: Couchbase, a document store):
Kafka events stream → Kafka Spout → emit → Parse Avro into tuple → emit → Analyze relevant events → emit → Write event to Visitor document → add/update in Couchbase
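The per-event logic of that bolt chain can be sketched in plain Python. This is a stdlib simulation of the parse → filter → update steps, not Storm or Couchbase API code; the event fields, event types, and visitor-document shape are illustrative assumptions, not LivePerson's schema.

```python
# Plain-Python sketch of the Visitor Feed bolt chain: parse -> filter -> update.
# Event fields and the visitor-document shape are illustrative assumptions;
# the real events are Avro-encoded messages on Kafka.

RELEVANT_TYPES = {"page_view", "chat_started", "chat_ended"}  # assumed types

def parse(raw: dict) -> tuple:
    """'Parse Avro into tuple' stand-in: pull the fields the bolts need."""
    return (raw["visitor_id"], raw["type"], raw["ts"])

def is_relevant(event: tuple) -> bool:
    """'Analyze relevant events': drop event types the Visitors List ignores."""
    _, etype, _ = event
    return etype in RELEVANT_TYPES

def update_document(docs: dict, event: tuple) -> None:
    """'Write event to Visitor document': add/update the per-visitor doc."""
    visitor_id, etype, ts = event
    doc = docs.setdefault(visitor_id, {"events": [], "last_seen": 0})
    doc["events"].append(etype)
    doc["last_seen"] = max(doc["last_seen"], ts)

docs = {}
stream = [
    {"visitor_id": "v1", "type": "page_view", "ts": 100},
    {"visitor_id": "v1", "type": "heartbeat", "ts": 101},  # filtered out
    {"visitor_id": "v1", "type": "chat_started", "ts": 105},
]
for raw in stream:
    event = parse(raw)
    if is_relevant(event):
        update_document(docs, event)

print(docs["v1"])  # {'events': ['page_view', 'chat_started'], 'last_seen': 105}
```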
Visitors List - Storm considerations
● Complex calculations before sending to the DB
○ Ignore delayed events
○ Reorder events before storing
● Document cached in memory
● Fields grouping to the bolt that writes to Couchbase
● High parallelism in the bolt that writes to Couchbase
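One simple way to realize "ignore delayed events, reorder before storing" is a small per-visitor buffer that sorts pending events by timestamp and drops anything older than the newest event already written. The cutoff policy below is an illustrative assumption, not the deck's actual algorithm.

```python
# Sketch of the "ignore delayed events / reorder before storing" step.
# Policy (an assumption, not LivePerson's algorithm): drop events older than
# the last flushed timestamp, sort the rest before writing to the DB.
from typing import List, Tuple

class VisitorBuffer:
    def __init__(self) -> None:
        self.pending: List[Tuple[int, str]] = []  # (ts, event_type)
        self.last_flushed_ts = 0

    def add(self, ts: int, event_type: str) -> None:
        if ts < self.last_flushed_ts:
            return  # delayed event: newer data was already stored; ignore it
        self.pending.append((ts, event_type))

    def flush(self) -> List[Tuple[int, str]]:
        """Reorder buffered events by timestamp before the DB write."""
        self.pending.sort(key=lambda e: e[0])
        if self.pending:
            self.last_flushed_ts = self.pending[-1][0]
        out, self.pending = self.pending, []
        return out

buf = VisitorBuffer()
buf.add(105, "chat_started")
buf.add(100, "page_view")  # out of order, but not yet flushed: kept
print(buf.flush())         # [(100, 'page_view'), (105, 'chat_started')]
buf.add(90, "page_view")   # older than anything already stored: ignored
print(buf.flush())         # []
```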
Visitors List Topology
[Diagram: the Visitors List topology.]
2nd Storm Use Case: "Agent State"
Use case:
● Show agent activity in the "Agent Console"
● Count agent statistics
● Display graphs
Agent Status Topology
[Diagram: the Agent Status topology.]
Selected Analytics DB - Cassandra
2nd Storm Use Case: "Agent State"
● Wide-column store DB
● Highly available, with no single point of failure
● High throughput
● Optimized for counters
Second Storm Topology – Agent Status
"Agent Status" topology (analytics DB: Cassandra, a wide-column store):
Kafka events stream → Kafka Spout → emit → Parse Avro into tuple → emit → Analyze relevant events → emit → Send events → add to Cassandra
Data visualization using Highcharts.
Agent Status - Storm considerations
● Counters stored by the topology
● Calculations done after reading from the DB
● Delayed events should not be ignored
● Order of events does not matter
● Using Highcharts for data visualization
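The reason delayed and out-of-order events are harmless here, unlike in the Visitors List case, is that the stored values are plain counters and incrementing is commutative. A minimal sketch (the status names are illustrative, not the real schema):

```python
# Sketch of the Agent Status counter model. Increments are commutative, which
# is why (per the slide) delayed events need not be ignored and the order of
# events does not matter. Status names are illustrative assumptions.
from collections import Counter

def apply_events(events):
    """Tally (agent_id, status) events; derived stats are computed on read."""
    counters = Counter()
    for agent_id, status in events:
        counters[(agent_id, status)] += 1
    return counters

in_order = [("a1", "online"), ("a1", "chatting"), ("a1", "online")]
shuffled = [("a1", "online"), ("a1", "online"), ("a1", "chatting")]

# Same counts either way - reordering or delay does not change the result.
print(apply_events(in_order) == apply_events(shuffled))  # True
```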
Challenges:
● High network traffic
● Writing to Kafka is faster than reading
● All topologies read all events
● How to avoid resource starvation in Storm
Optimizations of Kafka
● Increase the Kafka consuming rate by adding partitions
● Run on physical machines with RAID
● Set retention to match the actual need
● Monitor the data flow!
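Adding partitions helps because a topic's partition count caps consumer parallelism: within a consumer group, each partition is read by at most one consumer (or Kafka spout). A stdlib sketch of the idea, with an illustrative key-hash partitioner (Kafka's own partitioner differs in detail):

```python
# Why adding partitions raises the consuming rate: each partition is read by
# at most one consumer/spout in a group, so partitions cap read parallelism.
# The CRC32 key hash below is illustrative, not Kafka's exact partitioner.
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Assign an event to a partition by hashing its key."""
    return zlib.crc32(key.encode()) % num_partitions

def active_consumers(num_partitions: int, num_consumers: int) -> int:
    """Consumers beyond the partition count sit idle."""
    return min(num_partitions, num_consumers)

# With 4 partitions, only 4 of 8 consumers can do work; doubling the
# partitions lets all 8 read in parallel.
print(active_consumers(4, 8))  # 4
print(active_consumers(8, 8))  # 8
# Events with the same key always land in the same partition, so
# per-key ordering is preserved as partitions are added.
print(partition_for("visitor-42", 8) == partition_for("visitor-42", 8))  # True
```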
Optimizations of Storm
● Number of Kafka spouts = total number of partitions
● Set "isolation mode" for important topologies
● Validate that the network cards can carry the traffic
● Set up the Storm cluster on high-CPU machines
● Monitor server CPU & memory (Graphite)
● Assess the minimum number of cores a topology needs
○ Use "top" → "load" to find the server load
Demo
● Agent Console - https://z1.le.liveperson.net/ (71394613 / rans@liveperson.com)
● My site - http://birds-of-israel.weebly.com/
Questions?

Thank you!
ran.silberman@gmail.com
