Apache Avro and Messaging at Scale in LivePerson
This talk covers the challenges we tackled while building our new service-oriented system. It summarizes which ideas we realized would be bad, which approaches to data consistency work better, how we used Apache Avro, and what other supporting infrastructure we created to help us achieve the goal of a consistent yet flexible system.
Amihay Zer-Kavod is a Senior Software Architect at LivePerson.

Published in: Technology

  • 1. Apache Avro in LivePerson: Collecting and saving data is easy; keeping it consistent is tough. DevCon Tlv, June 2014. Amihay Zer-Kavod, Software Architect
  • 2. Who am I? Amihay Zer-Kavod, Software Architect. In software since 1989
  • 3. LivePerson Echo System M/R
  • 4. Communication & Meaning ● Consistent but decoupled communication between services, such as: o Monitoring, Interaction o Predictive, Sentiment o Reporting & Analysis o History ● Consistent meaning over time o BigData Store (Hadoop) o Reporting (Illustration: the word "event" in several languages: event, evento, 事件, घटना, حدث, ארוע, событие)
  • 5. What can’t we use? Don’t use direct APIs! They are completely wrong for this problem, since: • They produce too much coupling between services • APIs are synchronous by nature • They add irrelevant complexity to the called service
  • 6. So what is needed? The Message is the API! ● A unified event model (schema) for all reported events ● Management tools for the unified schema ● Tools for sending events over the wire ● Tools for reading/writing events in big data ● Backward and forward compatibility
  • 7. The Event model From generic to specific structure with: • Common header - data common to all events • Logical Entities - a header common to all logical entities (such as Visitor) • Dynamic specific headers • Specific event body
  • 8. Apache Avro to the rescue ● Avro - a schema-based serialization/deserialization framework ● Avro IDL - schema definition language ● Avro file - Hadoop integration ● Avro schema resolution ● Apache Avro was created by Doug Cutting
  • 9. Avro JSON schema sample
        {
          "type": "record",
          "name": "Event",
          "namespace": "com.liveperson.example",
          "doc": "Example event",
          "fields": [
            {"name": "version", "type": "string", "default": "1"},
            {"name": "id", "type": "string", "default": "Unknown"},
            {"name": "time", "type": "long", "default": -1},
            {"name": "body", "type": "string", "default": "no body"},
            {"name": "color",
             "type": {
               "type": "enum",
               "name": "Color",
               "symbols": ["NO_COLOR", "BLUE", "BLACK", "WHITE", "PINK"]
             },
             "default": "NO_COLOR"}
          ]
        }
  • 10. Avro IDL - LivePerson Event
        /** Base for all LivePerson Events */
        @namespace("")
        record LPEvent {
          /** Common Header of the event */
          CommonHeader header = null;
          /** Logical entity details participating in this event - Visitor, Agent, etc... */
          array<Participant> participants = null;
          /** Holding specific platform info as node name (machine) cluster Id etc... */
          PlatformHeader platformSpecificHeader = null;
          /** Auditing Header, Optional - adds data for auditing of the events flow in the platform */
          union {null, AuditingHeader} auditingHeader = null;
          /** The event body */
          EventBody eventBody = null;
        }
  • 11. Backward & Forward Compatibility Avro schema evolution ● Avro supports resolution between two schemas (the writer's and the reader's) ● Need to follow a set of rules: o Every field must have a default value o A field can be added (make sure to give it a default value) o Field types cannot be changed (add a new field instead) o Enum symbols can be added but never removed
  • 12. Is that enough? M/R Migdalor
  • 13. How well does it work? ● Cyber Monday 2013 (one day) o More than 320,000 events per second o 7 Storm topologies consuming the events within seconds of real time o 2TB of data saved to Hadoop ● 2014 preparation: o Twice the number of events per second, to ~640,000
  • 14. So how did we do it? 1. Use an event-driven system, don’t use direct APIs 2. Create a unified schema for all events 3. Use Avro to implement the schema 4. Add some supporting infrastructure
  • 15. ???? Questions event evento 事件 घटना ‫حدث‬ ‫ארוע‬ событие
  • 16. Amihay Zer-Kavod You can contact me at: LivePerson is hiring!
  • 17. Thank You
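
The schema evolution rules from slide 11 can be illustrated with a small sketch. This is not the Avro library and not LivePerson's tooling: it is a simplified, standard-library-only Python illustration. The function names (evolution_problems, resolve), the shortened V1 schema, and the sample record are all hypothetical; only the field names and rules come from slides 9 and 11. Real Avro schema resolution covers more cases (type promotion, aliases, unions) than shown here.

```python
import copy
import json

# Hypothetical older version of the slide 9 schema (no "body", fewer colors).
SCHEMA_V1 = json.loads("""
{"type": "record", "name": "Event", "namespace": "com.liveperson.example",
 "fields": [
   {"name": "version", "type": "string", "default": "1"},
   {"name": "id", "type": "string", "default": "Unknown"},
   {"name": "time", "type": "long", "default": -1},
   {"name": "color",
    "type": {"type": "enum", "name": "Color",
             "symbols": ["NO_COLOR", "BLUE", "BLACK"]},
    "default": "NO_COLOR"}
 ]}
""")

# V2 follows the rules: a new field with a default, and added enum symbols.
SCHEMA_V2 = copy.deepcopy(SCHEMA_V1)
SCHEMA_V2["fields"].append({"name": "body", "type": "string", "default": "no body"})
SCHEMA_V2["fields"][3]["type"]["symbols"] += ["WHITE", "PINK"]


def type_name(t):
    """Return a comparable identity for a field type."""
    if isinstance(t, str):   # primitive, e.g. "string"
        return t
    if isinstance(t, list):  # union, e.g. ["null", "string"]
        return tuple(type_name(branch) for branch in t)
    return (t["type"], t.get("name"))  # record, enum, and other complex types


def evolution_problems(old, new):
    """List violations of the slide 11 rules when moving from old to new."""
    problems = []
    old_fields = {f["name"]: f for f in old["fields"]}
    for field in new["fields"]:
        name = field["name"]
        if "default" not in field:
            problems.append(f"field '{name}' has no default value")
        previous = old_fields.get(name)
        if previous is None:
            continue  # newly added field: only the default rule applies
        if type_name(previous["type"]) != type_name(field["type"]):
            problems.append(f"field '{name}' changed type")
        elif isinstance(field["type"], dict) and field["type"]["type"] == "enum":
            removed = set(previous["type"]["symbols"]) - set(field["type"]["symbols"])
            if removed:
                problems.append(f"enum '{name}' lost symbols {sorted(removed)}")
    return problems


def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema.

    Fields the reader does not know are dropped; fields the writer did not
    send are filled from the reader's defaults (hence "every field must have
    a default value").
    """
    return {f["name"]: record.get(f["name"], f["default"])
            for f in reader_schema["fields"]}


# A record written with V1, read by a V2 consumer (backward compatibility).
old_record = {"version": "1", "id": "e-1", "time": 1402000000000, "color": "BLUE"}
print(evolution_problems(SCHEMA_V1, SCHEMA_V2))
print(resolve(old_record, SCHEMA_V2))
```

Reading in the other direction works the same way in this sketch: passing a V2 record and SCHEMA_V1 to resolve drops the unknown "body" field, which is the forward-compatible case.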