
Apache Avro and Messaging at Scale in LivePerson


This talk covers the challenges we tackled while building our new service-oriented system: what we realized were bad ideas, what the better approaches to data consistency are, how we used Apache Avro, and what supporting infrastructure we created to reach the goal of a consistent yet flexible system.
Amihay Zer-Kavod is a Senior Software Architect at LivePerson.

  1. Apache Avro in LivePerson. Collecting and saving data is easy; keeping it consistent is tough. DevCon TLV, June 2014. Amihay Zer-Kavod, Software Architect
  2. Who am I? Amihay Zer-Kavod, Software Architect. Been in software since 1989.
  3. LivePerson Ecosystem M/R
  4. Communication & Meaning: event, evento, 事件, घटना, ‫حدث‬, ‫ארוע‬, событие
     ● Consistent but decoupled communication between services, such as:
       o Monitoring, Interaction
       o Predictive, Sentiment
       o Reporting & Analysis
       o History
     ● Consistent meaning over time
       o Big Data store (Hadoop)
       o Reporting
  5. What can’t we use? Don’t use direct APIs! They are completely wrong for this problem, since:
     • They produce too much coupling between services
     • APIs are synchronous by nature
     • They add irrelevant complexity to the called service
  6. So what is needed? The Message is the API!
     ● A unified event model (schema) for all reported events
     ● Management tools for the unified schema
     ● Tools for sending events over the wire
     ● Tools for reading/writing events in big data
     ● Backward and forward compatibility
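To make "the message is the API" concrete, here is a minimal in-memory publish/subscribe sketch. It is purely illustrative (the class and topic names are hypothetical, not LivePerson's actual infrastructure): producers emit events by topic and never call consumers directly, so any number of services can consume the same stream without coupling to the producer.

```python
from collections import defaultdict

class EventBus:
    """Toy in-memory publish/subscribe bus. Producers publish events by
    topic; they never call consumers directly and never wait on them."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Fire-and-forget: the producer does not depend on consumer results.
        for handler in self._subscribers[topic]:
            handler(event)

# Two independent consumers of the same event stream.
bus = EventBus()
seen_by_monitoring, seen_by_reporting = [], []
bus.subscribe("visitor", seen_by_monitoring.append)
bus.subscribe("visitor", seen_by_reporting.append)
bus.publish("visitor", {"id": "v-1", "time": 1})
```

Adding a new consumer (say, a sentiment service) is just one more `subscribe` call; the producer's code does not change at all, which is the decoupling the previous slide asks for.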
  7. The Event model. From generic to specific structure, with:
     • Common header - data common to all events
     • Logical entities - a common header for all logical entities (such as Visitor)
     • Dynamic specific headers
     • Specific event body
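The layered model above (generic header, then entities, then specifics) can be sketched as plain data classes. This is a minimal sketch; the field names are assumptions chosen to mirror the sample schemas later in the deck, not the actual LivePerson model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class CommonHeader:
    """Data common to every event (generic layer)."""
    version: str = "1"
    event_id: str = "Unknown"
    time: int = -1

@dataclass
class Participant:
    """A logical entity taking part in the event (Visitor, Agent, ...)."""
    entity_type: str = "Visitor"
    entity_id: str = "Unknown"

@dataclass
class LPEvent:
    """Generic-to-specific structure: common header, logical entities,
    a dynamic specific header, and the specific event body."""
    header: CommonHeader = field(default_factory=CommonHeader)
    participants: List[Participant] = field(default_factory=list)
    specific_header: Dict[str, Any] = field(default_factory=dict)
    body: Dict[str, Any] = field(default_factory=dict)

event = LPEvent(header=CommonHeader(event_id="e-42", time=1718000000),
                participants=[Participant("Visitor", "v-7")])
```

Note that every field carries a default - the same discipline the schema-evolution slide below demands of the Avro schemas.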
  8. Apache Avro to the rescue
     ● Avro - a schema-based serialization/deserialization framework
     ● Avro IDL - schema definition language
     ● Avro file - Hadoop integration
     ● Avro schema resolution
     ● Apache Avro was created by Doug Cutting
  9. Avro JSON schema sample
     {
       "type": "record",
       "name": "Event",
       "namespace": "com.liveperson.example",
       "doc": "Example event",
       "fields": [
         { "name": "version", "type": "string", "default": "1" },
         { "name": "id",      "type": "string", "default": "Unknown" },
         { "name": "time",    "type": "long",   "default": -1 },
         { "name": "body",    "type": "string", "default": "no body" },
         { "name": "color",
           "type": { "type": "enum", "name": "Color",
                     "symbols": ["NO_COLOR", "BLUE", "BLACK", "WHITE", "PINK"] },
           "default": "NO_COLOR" }
       ]
     }
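An Avro schema is itself JSON, so simple governance checks can be automated. As a sketch (assuming the sample schema above; the function name is illustrative and in practice you would use an Avro library's compatibility checker), here is a check for the rule the deck states later: every field must have a default value.

```python
import json

SCHEMA = """
{ "type": "record", "name": "Event", "namespace": "com.liveperson.example",
  "fields": [
    {"name": "version", "type": "string", "default": "1"},
    {"name": "id",      "type": "string", "default": "Unknown"},
    {"name": "time",    "type": "long",   "default": -1},
    {"name": "body",    "type": "string", "default": "no body"},
    {"name": "color",
     "type": {"type": "enum", "name": "Color",
              "symbols": ["NO_COLOR", "BLUE", "BLACK", "WHITE", "PINK"]},
     "default": "NO_COLOR"}
  ]}
"""

def fields_missing_defaults(schema_json):
    """Return the names of record fields that have no default value.
    Defaults are what let old readers and writers tolerate new schemas."""
    schema = json.loads(schema_json)
    return [f["name"] for f in schema["fields"] if "default" not in f]

print(fields_missing_defaults(SCHEMA))  # → []
```

A check like this can run in CI against every proposed schema change, turning the compatibility rules from convention into an enforced gate.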
  10. Avro IDL - LivePerson Event
     /** Base for all LivePerson Events */
     @namespace("")
     record LPEvent {
       /** Common header of the event */
       CommonHeader header = null;
       /** Logical entity details participating in this event - Visitor, Agent, etc. */
       array<Participant> participants = null;
       /** Holds platform-specific info such as node name (machine), cluster id, etc. */
       PlatformHeader platformSpecificHeader = null;
       /** Auditing header, optional - adds data for auditing the event's flow through the platform */
       union { null, AuditingHeader } auditingHeader = null;
       /** The event body */
       EventBody eventBody = null;
     }
  11. Backward & Forward Compatibility - Avro schema evolution
     ● Avro resolves data using two schemas: the writer's and the reader's
     ● You need to follow a set of rules:
       o Every field must have a default value
       o A field can be added (make sure to give it a default value)
       o Field types cannot be changed (add a new field instead)
       o Enum symbols can be added but never removed
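To see why these rules work, here is a toy sketch of record resolution between a writer's data and a reader's schema. Real Avro resolution also matches types and handles nested records and promotions; this simplified version matches fields by name only, just to show how defaults make old data readable by a new schema.

```python
def resolve(record, reader_fields):
    """Toy Avro-style record resolution:
    - fields unknown to the reader's schema are dropped
    - fields missing from the writer's data take the reader's default
    (real Avro also checks types; this sketch matches by name only)"""
    return {f["name"]: record.get(f["name"], f.get("default"))
            for f in reader_fields}

# The writer used an older schema without "color"; the reader's newer
# schema added "color" with a default, as the rules above require.
reader_fields = [
    {"name": "id",    "default": "Unknown"},
    {"name": "time",  "default": -1},
    {"name": "color", "default": "NO_COLOR"},  # new field, has a default
]
old_record = {"id": "e-1", "time": 42, "legacy": "reader drops this"}
print(resolve(old_record, reader_fields))
# → {'id': 'e-1', 'time': 42, 'color': 'NO_COLOR'}
```

The same mechanism read in the other direction gives forward compatibility: an old reader simply ignores fields it does not know and falls back to its own defaults, which is why a field without a default would break the scheme.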
  12. Is that enough? M/R Migdalor
  13. How well does it work?
     ● Cyber Monday 2013 (one day):
       o More than 320,000 events per second
       o 7 Storm topologies consuming the events within seconds of real time
       o 2TB of data saved to Hadoop
     ● 2014 preparation:
       o Double the events per second, to ~640,000
  14. So how did we do it?
     1. Use an event-driven system; don't use direct APIs
     2. Create a unified schema for all events
     3. Use Avro to implement the schema
     4. Add some supporting infrastructure
  15. Questions? event, evento, 事件, घटना, ‫حدث‬, ‫ארוע‬, событие
  16. Amihay Zer-Kavod. You can contact me at: LivePerson is hiring!
  17. Thank You