Apache Avro in LivePerson [Hebrew]

Apache Avro in LivePerson
Collecting and saving data is easy;
keeping it consistent is tough
Sandwich club, Sep 2014
Amihay Zer-Kavod, Software Architect

Who am I?
Amihay Zer-Kavod
Software Architect
In software since 1989

Communication & Meaning
● Consistent but decoupled communication
between services, such as:
o Monitoring, Interaction
o Predictive, Sentiment
o RT Reporting & Analysis
o Visitor History
● Consistent meaning over time
o BigData Store (Hadoop)
o Offline Reporting & Analysis
[Graphic: the word "event" in several languages: event, evento, 事件, घटना, حدث, ארוע, событие]

What shouldn’t we use?
Don’t use direct APIs!
They are the wrong tool for this purpose:
• They produce too much coupling between services
• They are synchronous by nature
• They add irrelevant complexity to the called service

What is needed?
The Message is the API!
● A unified event model (schema) for all reported events
● Management tools for the unified schema
● Tools for sending events over the wire
● Tools for reading/writing events in big data storage
● Backward and forward compatibility

The Event model
From generic to specific structure with:
• Common header - data common to all events
• Logical Entities - a common header for all logical entities
(such as Visitor)
• Dynamic Specific headers
• Specific Event body

Apache Avro to the rescue
● Avro - a schema-based serialization/deserialization
framework
● Avro IDL - schema definition language
● Avro data files - Hadoop integration
● Avro schema resolution
● Apache Avro was created by Doug Cutting

Avro 101 - Data Structures
● Rich data structures
○ Primitives
■ null, int, long, boolean, float, double, bytes, string
○ Records
○ Maps (string keys, Schema values)
○ Arrays (Schema items)
○ Enums
○ Unions

Avro 101 - JSON Schema
{
  "type": "record",
  "name": "Event",
  "namespace": "com.liveperson.example",
  "doc": "Example event",
  "fields": [
    { "name": "id", "type": "string", "default": "Unknown" },
    { "name": "time", "type": "long", "default": -1 },
    { "name": "color",
      "type": {
        "type": "enum", "name": "Color",
        "symbols": ["NO_COLOR", "BLUE", "BLACK", "WHITE", "PINK"]
      },
      "default": "NO_COLOR" }
  ]
}
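As a rough illustration of what the schema above carries, the following Python sketch parses it with the standard `json` module and builds a record populated with each field's declared default. It does not use the Avro library; `default_record` is a hypothetical helper introduced only for this example.

```python
import json

# The Event schema from the slide, embedded as a JSON string.
EVENT_SCHEMA_JSON = """
{
  "type": "record",
  "name": "Event",
  "namespace": "com.liveperson.example",
  "doc": "Example event",
  "fields": [
    {"name": "id", "type": "string", "default": "Unknown"},
    {"name": "time", "type": "long", "default": -1},
    {"name": "color",
     "type": {"type": "enum", "name": "Color",
              "symbols": ["NO_COLOR", "BLUE", "BLACK", "WHITE", "PINK"]},
     "default": "NO_COLOR"}
  ]
}
"""


def default_record(schema):
    """Hypothetical helper: map each field name to its declared default."""
    return {field["name"]: field["default"] for field in schema["fields"]}


schema = json.loads(EVENT_SCHEMA_JSON)
# An Avro full name is the namespace and the record name joined by a dot.
full_name = schema["namespace"] + "." + schema["name"]
record = default_record(schema)
print(full_name, record)
```

Because every field declares a default, a complete record can be built without any input data, which is what makes the evolution rules later in the deck workable.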

Avro 101 - Avro IDL Schema
@namespace("com.liveperson.example")
enum Color { NO_COLOR, BLUE, BLACK, WHITE, PINK }
/**
 Example event
*/
@namespace("com.liveperson.example")
record Event {
  string id = "Unknown";
  long time = -1;
  Color color = "NO_COLOR";
}

Avro 101 - Serialization
● JSON Serialization
● Binary serialization
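○ int, long - variable length, zig-zag encoding
○ float, double - fixed 4 and 8 bytes respectively
○ string - length (as a long) followed by UTF-8 bytes
○ map, array - encoded as a series of blocks, so the total size need not be known up front
○ Unions - long index of the selected branch, followed by the value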
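The binary rules for int/long, string, and union values can be sketched in a few lines of plain Python. This is a hand-written illustration of the encoding, not the Avro library; the function names are chosen for this example only.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit value to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_long(n: int) -> bytes:
    """Zig-zag the value, then emit 7-bit groups, least significant first.

    The high bit of each byte signals that another byte follows.
    """
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(data: bytes) -> int:
    """Inverse of encode_long for a buffer that starts with one encoded long."""
    z = 0
    shift = 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_union_string(branch_index: int, s: str) -> bytes:
    """A union value is the branch index (as a long) followed by the value."""
    return encode_long(branch_index) + encode_string(s)


print(encode_long(-1).hex(), encode_long(64).hex(), encode_string("foo").hex())
```

Small magnitudes, positive or negative, fit in a single byte, which is why the zig-zag step is applied before the variable-length encoding.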

Avro 101 - Generic vs. Specific
● SpecificDatumReader/Writer <T>
○ Static types
○ Code Generation: Java, C, C++, C#, Python, Ruby...
● GenericDatumReader/Writer <GenericRecord>
○ Dynamic types & access

Avro 101 - Schema Resolution
● The writer’s schema must always be available when decoding
● The reader can use its own schema
● This allows the reader and writer schemas to evolve
independently
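The idea behind resolution can be sketched on already-decoded records. The code below is a simplified illustration in plain Python, assuming fields are matched by name only; real Avro resolution also handles aliases, type promotion, and nested types, none of which appear here. `resolve_record` and the two schema versions are hypothetical names for this example.

```python
def resolve_record(writer_schema, reader_schema, written):
    """Project a record written with writer_schema onto reader_schema.

    Simplified sketch: fields present in both schemas are copied, fields
    known only to the reader take the reader's default, and fields known
    only to the writer are dropped.
    """
    writer_fields = {field["name"] for field in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = written[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value or default for reader field {name!r}")
    return result


# Two illustrative versions of the Event record; v2 adds the color field.
event_v1 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "string", "default": "Unknown"},
    {"name": "time", "type": "long", "default": -1},
]}
event_v2 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "string", "default": "Unknown"},
    {"name": "time", "type": "long", "default": -1},
    {"name": "color", "type": "string", "default": "NO_COLOR"},
]}

# A new reader consuming old data fills the missing field from its default.
upgraded = resolve_record(event_v1, event_v2, {"id": "a1", "time": 5})
# An old reader consuming new data ignores the field it does not know.
downgraded = resolve_record(
    event_v2, event_v1, {"id": "a1", "time": 5, "color": "BLUE"})
print(upgraded, downgraded)
```

The first call corresponds to backward compatibility and the second to forward compatibility.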

Avro vs...
Technologies          Protobuf         Thrift               Avro
Created               2001 (2008)      2007                 2009
Creator / Maintainer  Google / Google  Facebook / Apache    Doug Cutting / Apache
Schema evolution      Field Tag        Field Tag            Schema
Static/Dynamic        Yes/No           Yes/No               Yes/Yes
Hadoop support        No               No                   Yes
RPC                   No               Yes                  Yes
Used by               Google           Facebook, Cassandra  Hadoop, LivePerson
Lang support          Good             Great                Good

Backward & Forward Compatibility
Avro schema evolution
● Avro supports resolution between two schemas
● We need to follow a set of rules:
○ Every field must have a default value
○ A field can be added (make sure to give it a default value)
○ Field types cannot be changed (add a new field
instead)
○ Enum symbols can be added but never removed
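The rules above lend themselves to an automated check. The following Python sketch compares two versions of a record schema, given as parsed JSON dictionaries, and reports violations. It is an illustration under simple assumptions (flat records, types compared by equality), not a description of LivePerson's actual tooling; `check_evolution` is a hypothetical name.

```python
def is_enum(avro_type):
    """True when the type is an inline enum definition."""
    return isinstance(avro_type, dict) and avro_type.get("type") == "enum"


def check_evolution(old, new):
    """Return a list of rule violations when evolving record old -> new."""
    problems = []
    old_fields = {field["name"]: field for field in old["fields"]}
    for field in new["fields"]:
        name = field["name"]
        # Rule: every field must have a default value.
        if "default" not in field:
            problems.append(f"field {name!r} has no default value")
        previous = old_fields.get(name)
        if previous is None:
            continue  # a newly added field; only the default rule applies
        old_type, new_type = previous["type"], field["type"]
        if is_enum(old_type) and is_enum(new_type):
            # Rule: enum symbols can be added but never removed.
            removed = sorted(set(old_type["symbols"]) - set(new_type["symbols"]))
            if removed:
                problems.append(f"enum field {name!r} removed symbols {removed}")
        elif old_type != new_type:
            # Rule: field types cannot be changed.
            problems.append(f"field {name!r} changed type")
    return problems


def color(symbols):
    return {"type": "enum", "name": "Color", "symbols": symbols}


old = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "string", "default": "Unknown"},
    {"name": "time", "type": "long", "default": -1},
    {"name": "color", "type": color(["NO_COLOR", "BLUE", "PINK"]),
     "default": "NO_COLOR"},
]}
good = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "string", "default": "Unknown"},
    {"name": "time", "type": "long", "default": -1},
    {"name": "color", "type": color(["NO_COLOR", "BLUE", "PINK", "GREEN"]),
     "default": "NO_COLOR"},
    {"name": "source", "type": "string", "default": ""},
]}
bad = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "string", "default": "Unknown"},
    {"name": "time", "type": "string", "default": ""},
    {"name": "color", "type": color(["NO_COLOR", "BLUE"]),
     "default": "NO_COLOR"},
    {"name": "source", "type": "string"},
]}

good_problems = check_evolution(old, good)
bad_problems = check_evolution(old, bad)
print(good_problems, bad_problems)
```

A check of this kind could run whenever the unified schema changes, so that an incompatible edit is rejected before any service starts producing events with it.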

Avro IDL - LivePerson Event
/** Base for all LivePerson Events */
@namespace("com.liveperson.global")
record LPEvent {
  /** Common Header of the event */
  CommonHeader header = null;
  /** Logical entity details participating in this event - Visitor, Agent, etc... */
  array<Participant> participants = null;
  /** Holding specific platform info as node name (machine) cluster Id etc... */
  PlatformHeader platformSpecificHeader = null;
  /** Auditing Header, Optional - adds data for auditing of the events flow in the platform */
  union { null, AuditingHeader } auditingHeader = null;
  /** The event body */
  EventBody eventBody = null;
}

Wait, there is (much) more!
M/R
Migdalor

How well does it work?
● Cyber Monday 2013 (one day)
o More than 320,000 events per second
o 7 Storm topologies consuming the events within seconds of
real time
o 2TB of data saved to Hadoop
● 2014 preparation:
o Double the number of events per second, to ~640,000

So how did we do it?
1. Use an event-driven system; don’t use direct APIs
2. Create a unified schema for all events
3. Use Avro to implement the schema
4. Add some supporting infrastructure

Questions
????
[Graphic: the word "event" in several languages: event, evento, 事件, घटना, حدث, ארוע, событие]

Amihay Zer-Kavod
You can contact me at:
amihayz@liveperson.com
LivePerson is hiring!
