
Apache Avro in LivePerson [Hebrew]

If you are building a service-oriented system and want it to scale while staying flexible, there are a few questions you need to ask and answer about data interchange between services and the offline persistence of service data, such as:
- How can I change a service API without breaking other services?
- How do I keep data from services consistent over time?

This talk covers the challenges we tackled while building our new service-oriented system. It summarizes which approaches we found to be bad ideas and which approaches to data consistency work better.
It includes a dive into Apache Avro and how we used it.
It also covers the supporting infrastructure we created to help us reach the goal of a consistent yet flexible system.

Published in: Technology

  1. 1. Apache Avro in LivePerson: Collecting and saving data is easy; keeping it consistent is tough. Sandwich club, Sep 2014. Amihay Zer-Kavod, Software Architect
  2. 2. Who am I? Amihay Zer-Kavod, Software Architect. In software since 1989
  3. 3. LivePerson Ecosystem [architecture diagram; surviving label: M/R]
  4. 4. Communication & Meaning
       ● Consistent but decoupled communication between services, such as:
           o Monitoring, Interaction
           o Predictive, Sentiment
           o RT Reporting & Analysis
           o Visitor History
       ● Consistent meaning over time
           o BigData Store (Hadoop)
           o Offline Reporting & Analysis
       [Slide graphic: event, evento, 事件, घटना, حدث, ארוע, событие]
  5. 5. What shouldn’t we use? Don’t use direct APIs! They are completely wrong for this purpose:
       • They produce too much coupling between services
       • APIs are synchronous by nature
       • They add irrelevant complexity to the called service
  6. 6. What is needed? The Message is the API!
       ● A unified event model (schema) for all reported events
       ● Management tools for the unified schema
       ● Tools for sending events over the wire
       ● Tools for reading/writing events in big data
       ● Backward and forward compatibility
  7. 7. The Event model: from generic to specific structure, with:
       • Common header - data common to all events
       • Logical Entities - a header common to all logical entities (such as Visitor)
       • Dynamic Specific headers
       • Specific Event body
  8. 8. Apache Avro to the rescue
       ● Avro - a schema-based serialization/deserialization framework
       ● Avro IDL - schema definition language
       ● Avro file - Hadoop integration
       ● Avro schema resolution
       ● Apache Avro was created by Doug Cutting
  9. 9. Avro 101 - Data Structures
       ● Rich data structures
           ○ Primitives
               ■ null, int, long, boolean, float, double, bytes, string
           ○ Records
           ○ Map (string, Schema)
           ○ Arrays (Schema)
           ○ Enums
           ○ Unions
  10. 10. Avro 101 - JSON Schema
        {
          "type": "record",
          "name": "Event",
          "namespace": "com.liveperson.example",
          "doc": "Example event",
          "fields": [
            { "name": "id", "type": "string", "default": "Unknown" },
            { "name": "time", "type": "long", "default": -1 },
            { "name": "color",
              "type": { "type": "enum", "name": "Color",
                        "symbols": ["NO_COLOR", "BLUE", "BLACK", "WHITE", "PINK"] },
              "default": "NO_COLOR" }
          ]
        }
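Because the schema on slide 10 is plain JSON, it can be inspected with any JSON parser. The following is a minimal sketch using only Python's standard library, not the Avro library itself; `default_record` is a hypothetical helper introduced here for illustration. It builds a record populated entirely from the declared defaults.

```python
import json

# Schema text copied from slide 10.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Event",
  "namespace": "com.liveperson.example",
  "doc": "Example event",
  "fields": [
    {"name": "id", "type": "string", "default": "Unknown"},
    {"name": "time", "type": "long", "default": -1},
    {"name": "color",
     "type": {"type": "enum", "name": "Color",
              "symbols": ["NO_COLOR", "BLUE", "BLACK", "WHITE", "PINK"]},
     "default": "NO_COLOR"}
  ]
}
"""


def default_record(schema):
    """Hypothetical helper: return a dict holding each field's declared default."""
    missing = [f["name"] for f in schema["fields"] if "default" not in f]
    if missing:
        raise ValueError(f"fields without a default: {missing}")
    return {f["name"]: f["default"] for f in schema["fields"]}


schema = json.loads(SCHEMA_JSON)
# An Avro full name is the namespace and the name joined by a dot.
full_name = f'{schema["namespace"]}.{schema["name"]}'
event = default_record(schema)
```

Here `event` holds the three defaults from the slide, which is the same fallback a reader would get for any field the writer did not supply.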
  11. 11. Avro 101 - Avro IDL Schema
        @namespace("com.liveperson.example")
        enum Color {
          NO_COLOR, BLUE, BLACK, WHITE, PINK
        }

        /** Example event */
        @namespace("com.liveperson.example")
        record Event {
          string id = "Unknown";
          long time = -1;
          Color color = "NO_COLOR";
        }
  12. 12. Avro 101 - Serialization
        ● JSON serialization
        ● Binary serialization
            ○ int, long - variable length, zig-zag encoding
            ○ float, double - 4 and 8 bytes respectively
            ○ string - long (byte length) followed by UTF-8 bytes
            ○ map, array - unlimited size, use blocks
            ○ Unions - long index of the type
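The integer and string rules from slide 12 can be sketched in a few lines. This is an illustrative standard-library implementation of zig-zag plus variable-length encoding, not code from the talk or from the Avro library; the function names are made up for this example.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_long(n: int) -> bytes:
    """Zig-zag the value, then emit 7 bits per byte, least significant group first.

    The high bit of each byte is set when more bytes follow.
    """
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (encoded as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data
```

Small magnitudes of either sign fit in one byte, which is why this encoding suits fields such as counters and enum or union indexes.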
  13. 13. Avro 101 - Generic vs. Specific
        ● SpecificDatumReader/Writer <T>
            ○ Static types
            ○ Code generation: Java, C, C++, C#, Python, Ruby...
        ● GenericDatumReader/Writer <GenericRecord>
            ○ Dynamic types & access
  14. 14. Avro 101 - Schema Resolution
        ● The writer schema must always be provided for decoding
        ● The reader can use its own schema
        ● This allows the reader and writer schemas to evolve independently
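The idea on slide 14 can be illustrated with a simplified sketch. It assumes the record has already been decoded into a dict using the writer schema, and it handles only flat records; real Avro resolution works on the binary data and covers more cases. `resolve` and the two field lists are hypothetical names for this example.

```python
def resolve(record, writer_fields, reader_fields):
    """Project a record written with writer_fields onto reader_fields.

    Fields known to both sides keep the written value, fields only the reader
    knows take the reader's default, and fields only the writer knows are dropped.
    """
    written = {f["name"] for f in writer_fields}
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for reader field {name!r}")
    return resolved


# Two versions of the same event: v2 dropped "time" and added "color".
fields_v1 = [{"name": "id", "type": "string", "default": "Unknown"},
             {"name": "time", "type": "long", "default": -1}]
fields_v2 = [{"name": "id", "type": "string", "default": "Unknown"},
             {"name": "color", "type": "string", "default": "NO_COLOR"}]

# A new reader consuming old data, and an old reader consuming new data.
old_as_new = resolve({"id": "e-1", "time": 1409529600}, fields_v1, fields_v2)
new_as_old = resolve({"id": "e-2", "color": "BLUE"}, fields_v2, fields_v1)
```

Both directions succeed only because every field carries a default, which is what lets producers and consumers upgrade at different times.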
  15. 15. Avro vs...
        Technologies           Protobuf          Thrift                Avro
        Created                2001 (2008)       2007                  2009
        Creator / Maintainer   Google / Google   Facebook / Apache     Doug Cutting / Apache
        Schema evolution       Field Tag         Field Tag             Schema
        Static/Dynamic         Yes/No            Yes/No                Yes/Yes
        Hadoop support         No                No                    Yes
        RPC                    No                Yes                   Yes
        Used by                Google            Facebook, Cassandra   Hadoop, LivePerson
        Lang support           Good              Great                 Good
  16. 16. Backward & Forward Compatibility: Avro schema evolution
        ● Avro supports resolution between two schemas
        ● You need to follow a set of rules:
            ○ Every field must have a default value
            ○ A field can be added (make sure to give it a default value)
            ○ Field types cannot be changed (add a new field instead)
            ○ Enum symbols can be added but never removed
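The rules on slide 16 are mechanical enough to check automatically. Below is a minimal sketch of such a check over two parsed JSON schemas, using only the standard library; `check_evolution` is a hypothetical helper and not part of Avro or of the tooling described in the talk. Note that these house rules are stricter than the Avro specification, which also permits some type promotions such as int to long.

```python
def is_enum(avro_type):
    """True when the type is an inline enum definition."""
    return isinstance(avro_type, dict) and avro_type.get("type") == "enum"


def check_evolution(old, new):
    """Return a list of rule violations when moving from schema `old` to `new`."""
    problems = []
    old_fields = {f["name"]: f for f in old["fields"]}
    for field in new["fields"]:
        name = field["name"]
        if "default" not in field:
            problems.append(f"{name}: missing default")
        before = old_fields.get(name)
        if before is None:
            continue  # a newly added field; only the default rule applies
        old_type, new_type = before["type"], field["type"]
        if is_enum(old_type) and is_enum(new_type):
            removed = set(old_type["symbols"]) - set(new_type["symbols"])
            if removed:
                problems.append(f"{name}: enum symbols removed: {sorted(removed)}")
        elif old_type != new_type:
            problems.append(f"{name}: type changed")
    return problems


v1 = {"fields": [
    {"name": "id", "type": "string", "default": "Unknown"},
    {"name": "color", "default": "NO_COLOR",
     "type": {"type": "enum", "name": "Color", "symbols": ["NO_COLOR", "BLUE"]}},
]}
v2 = {"fields": [
    {"name": "id", "type": "long", "default": 0},             # type changed
    {"name": "color", "default": "NO_COLOR",
     "type": {"type": "enum", "name": "Color", "symbols": ["NO_COLOR"]}},  # symbol removed
    {"name": "source", "type": "string"},                     # added without a default
]}
violations = check_evolution(v1, v2)
```

A check like this could run in a build step so that an incompatible schema change fails before it reaches any service.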
  17. 17. Avro IDL - LivePerson Event
        /** Base for all LivePerson Events */
        @namespace("com.liveperson.global")
        record LPEvent {
          /** Common Header of the event */
          CommonHeader header = null;
          /** Logical entity details participating in this event - Visitor, Agent, etc... */
          array<Participant> participants = null;
          /** Holding specific platform info as node name (machine) cluster Id etc... */
          PlatformHeader platformSpecificHeader = null;
          /** Auditing Header, Optional - adds data for auditing of the events flow in the platform */
          union { null, AuditingHeader } auditingHeader = null;
          /** The event body */
          EventBody eventBody = null;
        }
  18. 18. Wait, there is (much) more! [architecture diagram; surviving labels: M/R, Migdalor]
  19. 19. How well does it work?
        ● Cyber Monday 2013 (one day)
            o More than 320,000 events per second
            o 7 Storm topologies consuming the events within seconds of real time
            o 2TB of data saved to Hadoop
        ● 2014 preparation:
            o Double the event rate, to ~640,000 events per second
  20. 20. So how did we do it?
        1. Use an event-driven system; don’t use direct APIs
        2. Create a unified schema for all events
        3. Use Avro to implement the schema
        4. Add some supporting infrastructure
  21. 21. Questions ???? event evento 事件 घटना حدث ארוע событие
  22. 22. Amihay Zer-Kavod You can contact me at: amihayz@liveperson.com LivePerson is hiring!
  23. 23. Thank You
