LivePerson Case Study: Real Time Data Streaming using Storm & Kafka

Ran Silberman, Big Data Technical Leader at LivePerson, spoke at the "Fullstack Developers Israel" meetup "Never-Ending Data Streams" (http://www.meetup.com/full-stack-developer-il/events/166864612/).

At LivePerson we process large amounts of data.
The data is stored in Hadoop and can be used for batch processing and querying.

LivePerson uses Kafka and Storm to complete its big-data solution with real-time processing in addition to batch processing.

In his lecture, Ran gives an overview of the implementation of real-time data streaming at LivePerson and shares some tips and tricks from his experience, covering:

1) High Availability

2) Data consistency

3) Data format and schema enforcement

4) Auditing data integrity


Transcript

  • 1. LivePerson Case Study: Real Time Data Streaming. March 20th, 2014. Ran Silberman
  • 2. About me ● Technical Leader of the Data Platform at LivePerson ● Bird watcher and amateur bird photographer
  • 3. Agenda ● Why we chose Kafka + Storm ● How implementation was done ● Measures of success ● Two examples of use ● Tips from our experience
  • 4. Data in LivePerson [diagram: Visitor in Site, Chat Window, and Agent Console interact with the LivePerson SaaS Server (Login, Monitor, Rules, Intelligence, Decision, Chat, Chat Invite); every interaction emits DATA, which adds up to BIG DATA]
  • 5. Legacy data flow in LivePerson [diagram: RealTime servers → ETL (Sessionize, Modeling, Schema View) → BI DWH (Oracle); serves both real-time data and historical data]
  • 6. Why Kafka + Storm? ● Need to scale out and plan for future scale ○ Limit for scale should not be technology ○ Let the limit be cost of (commodity) hardware ● What Data platforms can be implemented quickly? ○ Open source - fast evolving and community ○ Micro-services - do only what you ought to do! ● Are there risks in this choice? ○ Yes! technology is not mature enough ○ But, there is no other mature technology that can address our needs!
  • 7. Legacy data flow in LivePerson [diagram: RealTime servers → ETL (Sessionize, Modeling, Schema View) → BI DWH (Oracle) → Customers]
  • 8. 1st phase - move to Hadoop [diagram: RealTime servers → HDFS; a Hadoop MR job transfers data to the BI DWH (Vertica); the legacy ETL (Sessionize, Modeling, Schema View) → BI DWH (Oracle) still serves Customers]
  • 9. 2nd phase - move to Kafka [diagram: RealTime servers → Kafka (Topic-1); ETL (Sessionize, Modeling, Schema View) → HDFS; a Hadoop MR job transfers data to the BI DWH (Vertica) → Customers]
  • 10. 3rd phase - integrate with new producers [diagram: RealTime servers and New RealTime servers → Kafka (Topic-1, Topic-2) → HDFS; a Hadoop MR job transfers data to the BI DWH (Vertica) → Customers]
  • 11. 4th phase - add real-time BI [diagram: RealTime servers and New RealTime servers → Kafka (Topic-1, Topic-2); Kafka → Storm Topology → Analytics DB; Kafka → HDFS → Hadoop MR job → BI DWH (Vertica) → Customers]
  • 12. Architecture [diagram: Real-time servers → Kafka → Storm (Real Time Processing, 4 topologies reading all events) → Cassandra/Couchbase → Dashboards]. Some numbers (Cyber Monday 2013): flow rate into Kafka: 33 MB/sec; flow rate from Kafka: 20 MB/sec; total daily data in Kafka: 17 billion events.
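
    A quick back-of-the-envelope check on those figures: 33 MB/sec sustained is roughly 33 x 86,400 ≈ 2.85 million MB, i.e. about 2.8 TB into Kafka per day; 17 billion events per day averages to about 200,000 events/sec, implying a mean event size on the order of 33 MB / 200K ≈ 170 bytes.
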
  • 13. Two use cases 1. Visitor list 2. Agent State
  • 14. 1st Storm Use Case: “Visitors List” Use case: ● Show list of visitors in the “Agent Console” ● Collect data about visitors in real time ● Visitor stickiness in the streaming process
  • 15. Visitors List Topology
  • 16. 1st Storm Use Case: “Visitors List” - selected analytics DB: Couchbase ● Document store - for complex documents ● Searchable - possible to search by different attributes ● High throughput - read & write
  • 17. First Storm Topology – Visitor Feed [diagram: Kafka events stream → Kafka Spout → (emit) → Parse Avro into tuple → (emit) → Analyze relevant events → (emit) → Write event to Visitor document → Add/Update Couchbase]. “Visitor List” Topology; Analytics DB: Couchbase (document store).
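
    The “Parse Avro into tuple” step is the first bolt after the spout. Below is a minimal sketch of such a bolt in Java, assuming Storm 0.9-era APIs and Avro's GenericDatumReader; the field name "visitorId" and the error handling are illustrative, not LivePerson's actual code:

        import backtype.storm.topology.BasicOutputCollector;
        import backtype.storm.topology.OutputFieldsDeclarer;
        import backtype.storm.topology.base.BaseBasicBolt;
        import backtype.storm.tuple.Fields;
        import backtype.storm.tuple.Tuple;
        import backtype.storm.tuple.Values;
        import org.apache.avro.Schema;
        import org.apache.avro.generic.GenericDatumReader;
        import org.apache.avro.generic.GenericRecord;
        import org.apache.avro.io.DecoderFactory;

        // Illustrative "Parse Avro into tuple" bolt: decodes the raw Kafka
        // message bytes and emits the field downstream bolts group on.
        public class ParseAvroBolt extends BaseBasicBolt {
            private final String schemaJson;  // Avro schema as JSON
            private transient GenericDatumReader<GenericRecord> reader;

            public ParseAvroBolt(String schemaJson) {
                this.schemaJson = schemaJson;
            }

            @Override
            public void execute(Tuple input, BasicOutputCollector collector) {
                if (reader == null) {  // lazy init: the reader is not serializable
                    reader = new GenericDatumReader<GenericRecord>(
                        new Schema.Parser().parse(schemaJson));
                }
                try {
                    byte[] payload = input.getBinary(0);  // raw bytes from the Kafka spout
                    GenericRecord event = reader.read(null,
                        DecoderFactory.get().binaryDecoder(payload, null));
                    // "visitorId" is an assumed field name; emitting it separately lets
                    // the next bolt use fields grouping on it (slide 18). A production
                    // topology would register a Kryo serializer for GenericRecord.
                    collector.emit(new Values(event.get("visitorId").toString(), event));
                } catch (Exception e) {
                    // Malformed event: drop it (or route it to an error stream).
                }
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("visitorId", "event"));
            }
        }
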
  • 18. Visitors List - Storm considerations ● Complex calculations before sending to DB ○ Ignore delayed events ○ Reorder events before storing ● Document cached in memory ● Fields Grouping to bolt that writes to CouchBase ● High parallelism in bolt that writes to CouchBase
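
    Wired together, the topology might look like the sketch below; ParseAvroBolt is the bolt sketched above, WriteVisitorDocBolt is a stand-in for the Couchbase writer, and all names, hosts, and parallelism numbers are assumptions for illustration:

        import backtype.storm.Config;
        import backtype.storm.StormSubmitter;
        import backtype.storm.topology.BasicOutputCollector;
        import backtype.storm.topology.OutputFieldsDeclarer;
        import backtype.storm.topology.TopologyBuilder;
        import backtype.storm.topology.base.BaseBasicBolt;
        import backtype.storm.tuple.Fields;
        import backtype.storm.tuple.Tuple;
        import storm.kafka.KafkaSpout;
        import storm.kafka.SpoutConfig;
        import storm.kafka.ZkHosts;

        public class VisitorListTopology {
            private static final String VISITOR_SCHEMA = "{...}"; // Avro schema JSON (placeholder)

            public static void main(String[] args) throws Exception {
                // Spout reads Topic-1; one spout executor per Kafka partition (slide 27).
                SpoutConfig spoutConfig = new SpoutConfig(
                    new ZkHosts("zk1:2181"), "Topic-1", "/kafka-spouts", "visitor-list");

                TopologyBuilder builder = new TopologyBuilder();
                builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 8);
                builder.setBolt("parse-avro", new ParseAvroBolt(VISITOR_SCHEMA), 8)
                       .shuffleGrouping("kafka-spout");
                // Fields grouping on visitorId: all events for one visitor reach the
                // same writer task, so its in-memory document cache stays consistent,
                // while the writer bolt still gets high parallelism (slide 18).
                builder.setBolt("write-couchbase", new WriteVisitorDocBolt(), 16)
                       .fieldsGrouping("parse-avro", new Fields("visitorId"));

                Config conf = new Config();
                conf.setNumWorkers(4);
                StormSubmitter.submitTopology("visitor-list", conf, builder.createTopology());
            }

            // Stand-in for the bolt that adds/updates the visitor document in Couchbase.
            public static class WriteVisitorDocBolt extends BaseBasicBolt {
                @Override
                public void execute(Tuple input, BasicOutputCollector collector) {
                    // look up the cached visitor document, apply the event, write to Couchbase
                }
                @Override
                public void declareOutputFields(OutputFieldsDeclarer declarer) {
                    // terminal bolt: no output fields
                }
            }
        }
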
  • 19. Visitors List Topology
  • 20. 2nd Storm Use Case: “Agent State” Use case: ● Show Agent activity on “Agent Console” ● Count Agent statistics ● Display graphs
  • 21. Agent Status Topology
  • 22. 2nd Storm Use Case: “Agent State” - selected analytics DB: Cassandra ● Wide-column store DB ● Highly available, with no single point of failure ● High throughput ● Optimized for counters
  • 23. Second Storm Topology – Agent Status [diagram: Kafka events stream → Kafka Spout → (emit) → Parse Avro into tuple → (emit) → Analyze relevant events → (emit) → Send events → Add to “Agent Status”]. Analytics DB: Cassandra (wide-column store); data visualization using Highcharts.
  • 24. Agent Status - Storm considerations ● Counters stored by topology ● Calculations done after reading from DB ● Delayed events should not be ignored ● Order of events does not matter ● Using Highcharts for data visualization
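
    Because Cassandra counter columns are incremented in place, updates are commutative, which is why event order does not matter here and delayed events can still be applied. A minimal sketch of a counter write using the DataStax Java driver (the table, column names, and choice of client library are assumptions; the talk does not say which client LivePerson used):

        import com.datastax.driver.core.Cluster;
        import com.datastax.driver.core.Session;

        // Assumed schema: CREATE TABLE agent_counters (
        //     agent_id text PRIMARY KEY, chats counter, messages counter);
        public class AgentCounterWriter {
            private final Session session;

            public AgentCounterWriter(String contactPoint, String keyspace) {
                this.session = Cluster.builder()
                                      .addContactPoint(contactPoint)
                                      .build()
                                      .connect(keyspace);
            }

            // Counter updates are commutative, so delayed or reordered events
            // can simply be applied as they arrive (slide 24).
            public void recordChatStarted(String agentId) {
                session.execute(
                    "UPDATE agent_counters SET chats = chats + 1 WHERE agent_id = ?",
                    agentId);
            }
        }
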
  • 25. Challenges: ● High network traffic ● Writing to Kafka is faster than reading ● All topologies read all events ● How to avoid resource starvation in Storm
  • 26. Optimizations of Kafka ● Increase Kafka consuming rate by adding partitions ● Run on physical machines with RAID ● Set retention to the proper need ● Monitor data flow!
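
    Retention and the default partition count are broker-side settings; as a sketch, the relevant server.properties keys look like this (values are illustrative, not LivePerson's):

        # server.properties (illustrative values)
        num.partitions=8          # default partitions for new topics; more partitions allow more parallel consumers
        log.retention.hours=48    # keep only as much history as consumers actually need
        log.retention.bytes=-1    # optional per-partition size cap (-1 = unlimited)
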
  • 27. Optimizations of Storm ● #of Kafka-Spouts = number of total partitions ● Set “Isolation mode” for important topologies ● Validate Network cards can carry network traffic ● Set Storm cluster on high CPU machines ● Monitor servers CPU & Memory (Graphite) ● Assess min. #Cores that topology needs ○ Use “top” -> “load” to find server load
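
    “Isolation mode” refers to Storm's isolation scheduler, enabled in storm.yaml on the Nimbus node; a sketch with made-up topology names and machine counts:

        # storm.yaml on Nimbus (illustrative)
        storm.scheduler: "backtype.storm.scheduler.IsolationScheduler"
        isolation.scheduler.machines:
            "visitor-list": 4     # dedicate 4 machines to the visitor-list topology
            "agent-status": 2
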
  • 28. Demo ● Agent Console - https://z1.le.liveperson.net/71394613 / rans@liveperson.com ● My Site - http://birds-of-israel.weebly.com/
  • 29. Questions?
  • 30. Thank you! ran.silberman@gmail.com