SlideShare a Scribd company logo
Apache Avro in LivePerson 
Collecting and saving data is easy 
keeping it consistent is tough 
Sandwich club, Sep 2014 
Amihay Zer-Kavod, Software Architect
Who am I? 
Amihay Zer-Kavod 
Software Architect 
Been in software Since 1989
LivePerson Echo System 
M/R
Communication & Meaning 
● Consistent but decoupled communication 
between services, such as: 
o Monitoring, Interaction 
o Predictive, Sentiment 
o RT Reporting & Analysis 
o Visitor History 
event 
evento 
事件 
घटना 
حدث 
ארוע 
событие 
● Consistent meaning over time 
o BigData Store (Hadoop) 
o Offline Reporting & Analysis
What shouldn’t we use? 
Don’t use Direct APIs! 
They are completely wrong for this subject: 
• They produce too much coupling between services 
• APIs are synchronous by nature 
• Adds irrelevant complexity to the called service
What is needed? 
The Message is the API! 
● A unified event model (schema) for all reported events 
● Management tools for the unified schema 
● Tools for sending events over the wire 
● Tools for reading/writing event in big data 
● Backward and forward compatibility
The Event model 
From generic to specific structure with: 
• Common header - all common data to all events 
• Logical Entities - common header to all logical entities 
(such as Visitor) 
• Dynamic Specific headers 
• Specific Event body
Apache Avro to the rescue 
● Avro - a schema based serialization/deserialization 
framework 
● Avro idl - schema definition language 
● Avro file - Hadoop integration 
● Avro schema resolution 
● Apache Avro created by Doug Cutting
Avro 101 - Data Structures 
● Rich data structures 
○ Primitives 
■ null, int, long, boolean, float, double, bytes, string 
○ Records 
○ Map (string, Schema) 
○ Arrays (Schema) 
○ Enums 
○ Unions
Avro 101 - JSON Schema 
{ 
"type": "record", 
"name": "Event", 
"namespace": "com.liveperson.example", 
"doc": "Example event", 
"fields":[ 
{ "name": "id", "type": "string", "default": "Unknown"}, 
{ "name": "time", "type": "long", "default": -1}, 
{ "name": "color", "type": 
{ "type": "enum", "name": "Color", 
"symbols": ["NO_COLOR", "BLUE", "BLACK", "WHITE", "PINK"] 
}, 
"default": "NO_COLOR" } 
] 
}
Avro 101 - Avro IDL Schema 
@namespace("com.liveperson.example") 
enum Color { NO_COLOR, BLUE, BLACK, WHITE, PINK } 
/** 
Example event 
*/ 
@namespace("com.liveperson.example") 
record Event { 
string id = “Unknown”; 
long time = -1; 
Color color = "NO_COLOR"; 
}
Avro 101 - Serialization 
● JSON Serialization 
● Binary serialization 
○ int, long - variable length, Zig-zag encoding 
○ float, double - 4,8 bytes respectively 
○ string - long followed by UTF-8 bytes 
○ map, array - unlimited size, use blocks 
○ Unions - long index of the type
Avro 101 - Generic vs. Specific 
● SpecificDatumReader/Writer <T> 
○ Static types 
○ Code Generation: Java, C, C++, C#, Python, Ruby... 
● GenericDatumReader/Writer <GenericRecord> 
○ Dynamic types & access
Avro 101 - Schema Resolution 
● Writer schema must be always provided for decoding 
● Reader can use its own schema 
● Allows the reader and writer schema to evolve 
independently
Avro vs... 
Technologies Protobuf Thrift Avro 
Created 2001 (2008) 2007 2009 
Creator / Maintainer Google / Google Facebook / Apache 
Doug cutting / 
Apache 
Schema evolution Field Tag Field Tag Schema 
Static/Dynamic Yes/No Yes/No Yes/Yes 
Hadoop support No No Yes 
RPC No Yes Yes 
Used by Google Facebook, Cassandra Hadoop, Liveperson 
Lang support Good Great Good
Backward & Forward Compatibility 
Avro schema evolution 
● Avro supports resolution between two schemes 
● Need to follow a set of rules: 
● Every field must have a default value 
● A field can be added (make sure to put a default value) 
● Field types can not be changed (add a new field 
instead) 
● enum symbols can be added but never removed
Avro IDL - LivePerson Event 
/** Base for all LivePerson Events 
*/ 
@namespace("com.liveperson.global") 
record LPEvent { 
/** Common Header of the event */ 
CommonHeader header = null; 
/** Logical entity details participating in this event - Visitor, Agent, etc... */ 
array<Participant> participants = null; 
/** Holding specific platform info as node name (machine) cluster Id etc... */ 
PlatformHeader platformSpecificHeader = null; 
/** Auditing Header, Optional - adds data for auditing of the events flow in the platform*/ 
union {null, AuditingHeader } auditingHeader = null; 
/** The event body */ 
EventBody eventBody = null; 
}
Wait there is (much) more! 
M/R 
Migdalor
How good does it work? 
● Cyber Monday 2013 (one day) 
o More than 320,000 events per second 
o 7 Storm topologies consuming the events seconds from 
real time 
o 2TB of data saved to Hadoop 
● 2014 preparation: 
o x2 number of events per second to ~640,000
So how did we do it? 
1. Use an event driven system, don’t use direct APIs 
2. Create a unified schema for all events 
3. Use Avro to implement the schema 
4. Add some supporting infrastructure
Questions 
???? 
event 
evento 
事件 
घटना 
حدث 
ארוע 
событие
Amihay Zer-Kavod 
You can contact me at: 
amihayz@liveperson.com 
LivePerson is hiring!
Thank You

More Related Content

What's hot

Google Protocol Buffers
Google Protocol BuffersGoogle Protocol Buffers
Google Protocol Buffers
Sergey Podolsky
 
F# Type Provider for R Statistical Platform
F# Type Provider for R Statistical PlatformF# Type Provider for R Statistical Platform
F# Type Provider for R Statistical PlatformHoward Mansell
 
Serialization (Avro, Message Pack, Kryo)
Serialization (Avro, Message Pack, Kryo)Serialization (Avro, Message Pack, Kryo)
Serialization (Avro, Message Pack, Kryo)
오석 한
 
Dart programming language
Dart programming languageDart programming language
Dart programming language
Aniruddha Chakrabarti
 
The D Programming Language - Why I love it!
The D Programming Language - Why I love it!The D Programming Language - Why I love it!
The D Programming Language - Why I love it!
ryutenchi
 
Extending the Xbase Typesystem
Extending the Xbase TypesystemExtending the Xbase Typesystem
Extending the Xbase Typesystem
Sebastian Zarnekow
 
D programming language
D programming languageD programming language
D programming language
Jordan Open Source Association
 
Introduction to the rust programming language
Introduction to the rust programming languageIntroduction to the rust programming language
Introduction to the rust programming language
Nikolay Denev
 
Go Programming Language (Golang)
Go Programming Language (Golang)Go Programming Language (Golang)
Go Programming Language (Golang)
Ishin Vin
 
Introduction to D programming language at Weka.IO
Introduction to D programming language at Weka.IOIntroduction to D programming language at Weka.IO
Introduction to D programming language at Weka.IO
Liran Zvibel
 
Dart the better Javascript 2015
Dart the better Javascript 2015Dart the better Javascript 2015
Dart the better Javascript 2015
Jorg Janke
 
Serialization and performance by Sergey Morenets
Serialization and performance by Sergey MorenetsSerialization and performance by Sergey Morenets
Serialization and performance by Sergey MorenetsAlex Tumanoff
 
Introduction To Scala
Introduction To ScalaIntroduction To Scala
Introduction To Scala
Peter Maas
 
IO Streams, Files and Directories
IO Streams, Files and DirectoriesIO Streams, Files and Directories
IO Streams, Files and Directories
Krasimir Berov (Красимир Беров)
 
Dart ppt
Dart pptDart ppt
Dart ppt
Krishna Teja
 
Scala : language of the future
Scala : language of the futureScala : language of the future
Scala : language of the future
AnsviaLab
 
Few simple-type-tricks in scala
Few simple-type-tricks in scalaFew simple-type-tricks in scala
Few simple-type-tricks in scala
Ruslan Shevchenko
 
Experience protocol buffer on android
Experience protocol buffer on androidExperience protocol buffer on android
Experience protocol buffer on android
Richard Chang
 

What's hot (20)

Google Protocol Buffers
Google Protocol BuffersGoogle Protocol Buffers
Google Protocol Buffers
 
F# Type Provider for R Statistical Platform
F# Type Provider for R Statistical PlatformF# Type Provider for R Statistical Platform
F# Type Provider for R Statistical Platform
 
Serialization (Avro, Message Pack, Kryo)
Serialization (Avro, Message Pack, Kryo)Serialization (Avro, Message Pack, Kryo)
Serialization (Avro, Message Pack, Kryo)
 
Dart programming language
Dart programming languageDart programming language
Dart programming language
 
Sax Dom Tutorial
Sax Dom TutorialSax Dom Tutorial
Sax Dom Tutorial
 
The D Programming Language - Why I love it!
The D Programming Language - Why I love it!The D Programming Language - Why I love it!
The D Programming Language - Why I love it!
 
Extending the Xbase Typesystem
Extending the Xbase TypesystemExtending the Xbase Typesystem
Extending the Xbase Typesystem
 
D programming language
D programming languageD programming language
D programming language
 
Introduction to the rust programming language
Introduction to the rust programming languageIntroduction to the rust programming language
Introduction to the rust programming language
 
Go Programming Language (Golang)
Go Programming Language (Golang)Go Programming Language (Golang)
Go Programming Language (Golang)
 
Introduction to D programming language at Weka.IO
Introduction to D programming language at Weka.IOIntroduction to D programming language at Weka.IO
Introduction to D programming language at Weka.IO
 
Dart the better Javascript 2015
Dart the better Javascript 2015Dart the better Javascript 2015
Dart the better Javascript 2015
 
Serialization and performance by Sergey Morenets
Serialization and performance by Sergey MorenetsSerialization and performance by Sergey Morenets
Serialization and performance by Sergey Morenets
 
Introduction To Scala
Introduction To ScalaIntroduction To Scala
Introduction To Scala
 
IO Streams, Files and Directories
IO Streams, Files and DirectoriesIO Streams, Files and Directories
IO Streams, Files and Directories
 
Dart ppt
Dart pptDart ppt
Dart ppt
 
Scala : language of the future
Scala : language of the futureScala : language of the future
Scala : language of the future
 
DSLs in JavaScript
DSLs in JavaScriptDSLs in JavaScript
DSLs in JavaScript
 
Few simple-type-tricks in scala
Few simple-type-tricks in scalaFew simple-type-tricks in scala
Few simple-type-tricks in scala
 
Experience protocol buffer on android
Experience protocol buffer on androidExperience protocol buffer on android
Experience protocol buffer on android
 

Viewers also liked

Avro Data | Washington DC HUG
Avro Data | Washington DC HUGAvro Data | Washington DC HUG
Avro Data | Washington DC HUGCloudera, Inc.
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Hisham Mardam-Bey
 
Apache Avro and You
Apache Avro and YouApache Avro and You
Apache Avro and You
Eric Wendelin
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
GetInData
 
Scala Days NYC 2016
Scala Days NYC 2016Scala Days NYC 2016
Scala Days NYC 2016
Martin Odersky
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
StampedeCon
 
Internet Filtering In South Korea
Internet Filtering In South KoreaInternet Filtering In South Korea
Internet Filtering In South Koreamichroeder
 
Stress Management!
Stress Management!Stress Management!
Stress Management!
Andeel Ali
 
Brochure gjav.indd
Brochure gjav.inddBrochure gjav.indd
Brochure gjav.inddGJAV
 
D.condicion juridica procesal de los extranjeros
D.condicion juridica procesal de los extranjerosD.condicion juridica procesal de los extranjeros
D.condicion juridica procesal de los extranjeros
Universidad de Sonora
 
Fast Food/Casual QSR Restaurant Tour of Sydney Australia 2014
Fast Food/Casual QSR Restaurant Tour of Sydney Australia 2014Fast Food/Casual QSR Restaurant Tour of Sydney Australia 2014
Fast Food/Casual QSR Restaurant Tour of Sydney Australia 2014
Food Technical Consulting
 
Screaming Brain Studio Art Samples set1
Screaming Brain Studio Art Samples set1Screaming Brain Studio Art Samples set1
Screaming Brain Studio Art Samples set1screamingbrain
 
Cis 512 Week 4 Assignment
Cis 512 Week 4 AssignmentCis 512 Week 4 Assignment
Cis 512 Week 4 AssignmentVwilliams621
 
Pm glassy blue_earth
Pm glassy blue_earthPm glassy blue_earth
Pm glassy blue_earthThomas Jensen
 

Viewers also liked (20)

3 apache-avro
3 apache-avro3 apache-avro
3 apache-avro
 
Avro Data | Washington DC HUG
Avro Data | Washington DC HUGAvro Data | Washington DC HUG
Avro Data | Washington DC HUG
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
 
Avro
AvroAvro
Avro
 
Avro introduction
Avro introductionAvro introduction
Avro introduction
 
Apache Avro and You
Apache Avro and YouApache Avro and You
Apache Avro and You
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Scala Days NYC 2016
Scala Days NYC 2016Scala Days NYC 2016
Scala Days NYC 2016
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Internet Filtering In South Korea
Internet Filtering In South KoreaInternet Filtering In South Korea
Internet Filtering In South Korea
 
Jiuzhou
JiuzhouJiuzhou
Jiuzhou
 
Opensat
OpensatOpensat
Opensat
 
Stress Management!
Stress Management!Stress Management!
Stress Management!
 
Brochure gjav.indd
Brochure gjav.inddBrochure gjav.indd
Brochure gjav.indd
 
D.condicion juridica procesal de los extranjeros
D.condicion juridica procesal de los extranjerosD.condicion juridica procesal de los extranjeros
D.condicion juridica procesal de los extranjeros
 
Elnet
ElnetElnet
Elnet
 
Fast Food/Casual QSR Restaurant Tour of Sydney Australia 2014
Fast Food/Casual QSR Restaurant Tour of Sydney Australia 2014Fast Food/Casual QSR Restaurant Tour of Sydney Australia 2014
Fast Food/Casual QSR Restaurant Tour of Sydney Australia 2014
 
Screaming Brain Studio Art Samples set1
Screaming Brain Studio Art Samples set1Screaming Brain Studio Art Samples set1
Screaming Brain Studio Art Samples set1
 
Cis 512 Week 4 Assignment
Cis 512 Week 4 AssignmentCis 512 Week 4 Assignment
Cis 512 Week 4 Assignment
 
Pm glassy blue_earth
Pm glassy blue_earthPm glassy blue_earth
Pm glassy blue_earth
 

Similar to Apache Avro in LivePerson [Hebrew]

Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'
BDPA Education and Technology Foundation
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
Jean-Baptiste Onofré
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
Steve Caron
 
Enforcing API Design Rules for High Quality Code Generation
Enforcing API Design Rules for High Quality Code GenerationEnforcing API Design Rules for High Quality Code Generation
Enforcing API Design Rules for High Quality Code Generation
Tim Burks
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user group
Adam Doyle
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Wes McKinney
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Guido Schmutz
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
Splunk
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
Ralf Gommers
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
C4Media
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
Splunk
 
Runtime Environment Of .Net Divya Rathore
Runtime Environment Of .Net Divya RathoreRuntime Environment Of .Net Divya Rathore
Runtime Environment Of .Net Divya RathoreEsha Yadav
 
Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)
Itzik Kotler
 
MacSysAdmin Conference 2019 - Logging
MacSysAdmin Conference 2019 - Logging MacSysAdmin Conference 2019 - Logging
MacSysAdmin Conference 2019 - Logging
Henry Stamerjohann
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
Wes McKinney
 
MongoDB Use Cases: Healthcare, CMS, Analytics
MongoDB Use Cases: Healthcare, CMS, AnalyticsMongoDB Use Cases: Healthcare, CMS, Analytics
MongoDB Use Cases: Healthcare, CMS, AnalyticsMongoDB
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013
Gera Shegalov
 
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow FlightThe Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
Databricks
 
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2
 
HUG France - Apache Drill
HUG France - Apache DrillHUG France - Apache Drill
HUG France - Apache Drill
MapR Technologies
 

Similar to Apache Avro in LivePerson [Hebrew] (20)

Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Enforcing API Design Rules for High Quality Code Generation
Enforcing API Design Rules for High Quality Code GenerationEnforcing API Design Rules for High Quality Code Generation
Enforcing API Design Rules for High Quality Code Generation
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user group
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
 
Runtime Environment Of .Net Divya Rathore
Runtime Environment Of .Net Divya RathoreRuntime Environment Of .Net Divya Rathore
Runtime Environment Of .Net Divya Rathore
 
Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)Hack Like It's 2013 (The Workshop)
Hack Like It's 2013 (The Workshop)
 
MacSysAdmin Conference 2019 - Logging
MacSysAdmin Conference 2019 - Logging MacSysAdmin Conference 2019 - Logging
MacSysAdmin Conference 2019 - Logging
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
 
MongoDB Use Cases: Healthcare, CMS, Analytics
MongoDB Use Cases: Healthcare, CMS, AnalyticsMongoDB Use Cases: Healthcare, CMS, Analytics
MongoDB Use Cases: Healthcare, CMS, Analytics
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013
 
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow FlightThe Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
 
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
 
HUG France - Apache Drill
HUG France - Apache DrillHUG France - Apache Drill
HUG France - Apache Drill
 

More from LivePerson

Microservices on top of kafka
Microservices on top of kafkaMicroservices on top of kafka
Microservices on top of kafka
LivePerson
 
Graph QL Introduction
Graph QL IntroductionGraph QL Introduction
Graph QL Introduction
LivePerson
 
Kubernetes your tests! automation with docker on google cloud platform
Kubernetes your tests! automation with docker on google cloud platformKubernetes your tests! automation with docker on google cloud platform
Kubernetes your tests! automation with docker on google cloud platform
LivePerson
 
Growing into a proactive Data Platform
Growing into a proactive Data PlatformGrowing into a proactive Data Platform
Growing into a proactive Data Platform
LivePerson
 
Measure() or die()
Measure() or die() Measure() or die()
Measure() or die()
LivePerson
 
Resilience from Theory to Practice
Resilience from Theory to PracticeResilience from Theory to Practice
Resilience from Theory to Practice
LivePerson
 
System Revolution- How We Did It
System Revolution- How We Did It System Revolution- How We Did It
System Revolution- How We Did It
LivePerson
 
Liveperson DLD 2015
Liveperson DLD 2015 Liveperson DLD 2015
Liveperson DLD 2015
LivePerson
 
Http 2: Should I care?
Http 2: Should I care?Http 2: Should I care?
Http 2: Should I care?
LivePerson
 
Mobile app real-time content modifications using websockets
Mobile app real-time content modifications using websocketsMobile app real-time content modifications using websockets
Mobile app real-time content modifications using websockets
LivePerson
 
Mobile SDK: Considerations & Best Practices
Mobile SDK: Considerations & Best Practices Mobile SDK: Considerations & Best Practices
Mobile SDK: Considerations & Best Practices
LivePerson
 
Functional programming with Java 8
Functional programming with Java 8Functional programming with Java 8
Functional programming with Java 8
LivePerson
 
Data compression in Modern Application
Data compression in Modern ApplicationData compression in Modern Application
Data compression in Modern Application
LivePerson
 
Support Office Hour Webinar - LivePerson API
Support Office Hour Webinar - LivePerson API Support Office Hour Webinar - LivePerson API
Support Office Hour Webinar - LivePerson API
LivePerson
 
SIP - Introduction to SIP Protocol
SIP - Introduction to SIP ProtocolSIP - Introduction to SIP Protocol
SIP - Introduction to SIP Protocol
LivePerson
 
Scalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceScalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduce
LivePerson
 
Building Enterprise Level End-To-End Monitor System with Open Source Solution...
Building Enterprise Level End-To-End Monitor System with Open Source Solution...Building Enterprise Level End-To-End Monitor System with Open Source Solution...
Building Enterprise Level End-To-End Monitor System with Open Source Solution...
LivePerson
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
LivePerson
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
LivePerson
 
How can A/B testing go wrong?
How can A/B testing go wrong?How can A/B testing go wrong?
How can A/B testing go wrong?
LivePerson
 

More from LivePerson (20)

Microservices on top of kafka
Microservices on top of kafkaMicroservices on top of kafka
Microservices on top of kafka
 
Graph QL Introduction
Graph QL IntroductionGraph QL Introduction
Graph QL Introduction
 
Kubernetes your tests! automation with docker on google cloud platform
Kubernetes your tests! automation with docker on google cloud platformKubernetes your tests! automation with docker on google cloud platform
Kubernetes your tests! automation with docker on google cloud platform
 
Growing into a proactive Data Platform
Growing into a proactive Data PlatformGrowing into a proactive Data Platform
Growing into a proactive Data Platform
 
Measure() or die()
Measure() or die() Measure() or die()
Measure() or die()
 
Resilience from Theory to Practice
Resilience from Theory to PracticeResilience from Theory to Practice
Resilience from Theory to Practice
 
System Revolution- How We Did It
System Revolution- How We Did It System Revolution- How We Did It
System Revolution- How We Did It
 
Liveperson DLD 2015
Liveperson DLD 2015 Liveperson DLD 2015
Liveperson DLD 2015
 
Http 2: Should I care?
Http 2: Should I care?Http 2: Should I care?
Http 2: Should I care?
 
Mobile app real-time content modifications using websockets
Mobile app real-time content modifications using websocketsMobile app real-time content modifications using websockets
Mobile app real-time content modifications using websockets
 
Mobile SDK: Considerations & Best Practices
Mobile SDK: Considerations & Best Practices Mobile SDK: Considerations & Best Practices
Mobile SDK: Considerations & Best Practices
 
Functional programming with Java 8
Functional programming with Java 8Functional programming with Java 8
Functional programming with Java 8
 
Data compression in Modern Application
Data compression in Modern ApplicationData compression in Modern Application
Data compression in Modern Application
 
Support Office Hour Webinar - LivePerson API
Support Office Hour Webinar - LivePerson API Support Office Hour Webinar - LivePerson API
Support Office Hour Webinar - LivePerson API
 
SIP - Introduction to SIP Protocol
SIP - Introduction to SIP ProtocolSIP - Introduction to SIP Protocol
SIP - Introduction to SIP Protocol
 
Scalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceScalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduce
 
Building Enterprise Level End-To-End Monitor System with Open Source Solution...
Building Enterprise Level End-To-End Monitor System with Open Source Solution...Building Enterprise Level End-To-End Monitor System with Open Source Solution...
Building Enterprise Level End-To-End Monitor System with Open Source Solution...
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
 
How can A/B testing go wrong?
How can A/B testing go wrong?How can A/B testing go wrong?
How can A/B testing go wrong?
 

Recently uploaded

Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 

Recently uploaded (20)

Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 

Apache Avro in LivePerson [Hebrew]

  • 1. Apache Avro in LivePerson Collecting and saving data is easy keeping it consistent is tough Sandwich club, Sep 2014 Amihay Zer-Kavod, Software Architect
  • 2. Who am I? Amihay Zer-Kavod Software Architect Been in software Since 1989
  • 4. Communication & Meaning ● Consistent but decoupled communication between services, such as: o Monitoring, Interaction o Predictive, Sentiment o RT Reporting & Analysis o Visitor History event evento 事件 घटना حدث ארוע событие ● Consistent meaning over time o BigData Store (Hadoop) o Offline Reporting & Analysis
  • 5. What shouldn’t we use? Don’t use Direct APIs! They are completely wrong for this subject: • They produce too much coupling between services • APIs are synchronous by nature • Adds irrelevant complexity to the called service
  • 6. What is needed? The Message is the API! ● A unified event model (schema) for all reported events ● Management tools for the unified schema ● Tools for sending events over the wire ● Tools for reading/writing event in big data ● Backward and forward compatibility
  • 7. The Event model From generic to specific structure with: • Common header - all common data to all events • Logical Entities - common header to all logical entities (such as Visitor) • Dynamic Specific headers • Specific Event body
  • 8. Apache Avro to the rescue ● Avro - a schema based serialization/deserialization framework ● Avro idl - schema definition language ● Avro file - Hadoop integration ● Avro schema resolution ● Apache Avro created by Doug Cutting
  • 9. Avro 101 - Data Structures ● Rich data structures ○ Primitives ■ null, int, long, boolean, float, double, bytes, string ○ Records ○ Map (string, Schema) ○ Arrays (Schema) ○ Enums ○ Unions
  • 10. Avro 101 - JSON Schema { "type": "record", "name": "Event", "namespace": "com.liveperson.example", "doc": "Example event", "fields":[ { "name": "id", "type": "string", "default": "Unknown"}, { "name": "time", "type": "long", "default": -1}, { "name": "color", "type": { "type": "enum", "name": "Color", "symbols": ["NO_COLOR", "BLUE", "BLACK", "WHITE", "PINK"] }, "default": "NO_COLOR" } ] }
  • 11. Avro 101 - Avro IDL Schema @namespace("com.liveperson.example") enum Color { NO_COLOR, BLUE, BLACK, WHITE, PINK } /** Example event */ @namespace("com.liveperson.example") record Event { string id = “Unknown”; long time = -1; Color color = "NO_COLOR"; }
  • 12. Avro 101 - Serialization ● JSON Serialization ● Binary serialization ○ int, long - variable length, Zig-zag encoding ○ float, double - 4,8 bytes respectively ○ string - long followed by UTF-8 bytes ○ map, array - unlimited size, use blocks ○ Unions - long index of the type
  • 13. Avro 101 - Generic vs. Specific ● SpecificDatumReader/Writer <T> ○ Static types ○ Code Generation: Java, C, C++, C#, Python, Ruby... ● GenericDatumReader/Writer <GenericRecord> ○ Dynamic types & access
  • 14. Avro 101 - Schema Resolution ● Writer schema must be always provided for decoding ● Reader can use its own schema ● Allows the reader and writer schema to evolve independently
  • 15. Avro vs... Technologies Protobuf Thrift Avro Created 2001 (2008) 2007 2009 Creator / Maintainer Google / Google Facebook / Apache Doug cutting / Apache Schema evolution Field Tag Field Tag Schema Static/Dynamic Yes/No Yes/No Yes/Yes Hadoop support No No Yes RPC No Yes Yes Used by Google Facebook, Cassandra Hadoop, Liveperson Lang support Good Great Good
  • 16. Backward & Forward Compatibility Avro schema evolution ● Avro supports resolution between two schemes ● Need to follow a set of rules: ● Every field must have a default value ● A field can be added (make sure to put a default value) ● Field types can not be changed (add a new field instead) ● enum symbols can be added but never removed
  • 17. Avro IDL - LivePerson Event /** Base for all LivePerson Events */ @namespace("com.liveperson.global") record LPEvent { /** Common Header of the event */ CommonHeader header = null; /** Logical entity details participating in this event - Visitor, Agent, etc... */ array<Participant> participants = null; /** Holding specific platform info as node name (machine) cluster Id etc... */ PlatformHeader platformSpecificHeader = null; /** Auditing Header, Optional - adds data for auditing of the events flow in the platform*/ union {null, AuditingHeader } auditingHeader = null; /** The event body */ EventBody eventBody = null; }
  • 18. Wait there is (much) more! M/R Migdalor
  • 19. How good does it work? ● Cyber Monday 2013 (one day) o More than 320,000 events per second o 7 Storm topologies consuming the events seconds from real time o 2TB of data saved to Hadoop ● 2014 preparation: o x2 number of events per second to ~640,000
  • 20. So how did we do it? 1. Use an event driven system, don’t use direct APIs 2. Create a unified schema for all events 3. Use Avro to implement the schema 4. Add some supporting infrastructure
  • 21. Questions ???? event evento 事件 घटना حدث ארוע событие
  • 22. Amihay Zer-Kavod You can contact me at: amihayz@liveperson.com LivePerson is hiring!