http://avro.apache.org
                                            Apache Avro
                         More Than Just A Serialization Framework

                                                        Jim Scott
                                         Lead Engineer / Architect




                                                             A ValueClick Company
Agenda

     • History / Overview

     • Serialization Framework

              • Supported Languages

              • Performance

     • Implementing Avro (Including Code Examples)

     • Avro with Maven

     • RPC (Including Code Examples)

     • Resources

     • Questions?




2   Not to be distributed without prior consent. Confidential. Copyright © 2011, Dotomi a ValueClick Company. All rights reserved.
History / Overview

     Existing Serialization Frameworks

              • protobuf, thrift, avro, kryo, hessian, activemq-protobuf, scala, sbinary,
                google-gson, jackson/JSON, javolution, protostuff, woodstox, aalto,
                fast-infoset, xstream, java serialization, etc.

     Most popular frameworks

              • JAXB, Protocol Buffers, Thrift

     Avro

              Created by Doug Cutting, the creator of Hadoop

              • Data is always accompanied by a schema:

                             Supports dynamic typing: code generation is not required
                             Supports schema evolution
                             Data is not tagged with field identifiers, resulting in a smaller serialized size
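To make the "not tagged" point concrete, here is a minimal hand-written sketch of Avro's binary encoding for an int and a string. It follows the rules in the Avro specification: zigzag variable-length integers, and strings written as a length followed by UTF-8 bytes. The record {"id": int, "city": string} and the class name are illustrative assumptions. This is not the Avro library's encoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hand-rolled sketch of Avro binary encoding for a few types (not the Avro library). */
final class AvroBinarySketch {

    /** Zigzag-encodes a long, then writes it as a base-128 varint (7 bits per byte, low bits first). */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);            // zigzag: 0->0, -1->1, 1->2, -2->3, ...
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // high bit set: more bytes follow
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Avro strings: the byte length as a long, followed by the UTF-8 bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /**
     * Encodes a hypothetical record {"id": int, "city": string}.
     * Fields are written back to back in schema order, with no names or tags,
     * so the bytes can only be decoded with the schema in hand.
     */
    static byte[] encodeRecord(int id, String city) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, id);      // Avro ints use the same zigzag varint encoding as longs
        writeString(out, city);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeRecord(1, "Chicago");
        System.out.println(bytes.length + " bytes"); // 1 (id) + 1 (length) + 7 (UTF-8)
    }
}
```

Encoding id = 1 and city = "Chicago" takes 9 bytes. A tagged format would also spend bytes identifying each field. Avro can omit those bytes because the reader always has the writer's schema.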




Serialization Framework

     Avro Limitations

              • Map keys can only be Strings

     Avro Benefits

              • Interoperability

                            Can serialize into Avro/Binary or Avro/JSON
                            Supports reading and writing Protocol Buffers and Thrift objects

              • Supports multiple languages

              • Rich data structures with a schema described via JSON

                            A compact, fast, binary data format.
                            A container file, to store persistent data (Schema ALWAYS available)
                            Remote procedure call (RPC).

              • Simple integration with dynamic languages (via the generic type)

                        Unlike most other frameworks, data whose schema was not known at compile time can be processed at runtime

              • Data files are compressible and splittable by Hadoop MapReduce
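Avro data files can be split because of their block structure. Records are written in blocks, and each block is compressed on its own. Every block is followed by the same 16-byte sync marker, which is chosen when the file is created and stored in the file header. A MapReduce task handed an arbitrary byte range can therefore skip ahead to the next marker and start decoding at a block boundary. The sketch below shows only that scan step, over an in-memory byte array. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Sketch of the split step: find where the next block starts after an arbitrary offset. */
final class SyncMarkerScan {

    /**
     * Returns the offset just past the first sync marker found at or after {@code from}
     * (that is, the start of the next block), or -1 if there is none.
     * A linear scan over an in-memory array keeps the sketch short; a real reader scans a stream.
     */
    static int nextBlockStart(byte[] file, byte[] sync, int from) {
        for (int i = Math.max(from, 0); i + sync.length <= file.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(file, i, i + sync.length), sync)) {
                return i + sync.length;
            }
        }
        return -1;
    }
}
```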


Supported Languages

       Implementation   Core   Data file   Codec             RPC
       C                yes    yes         deflate           yes
       C++              yes    yes         ?                 yes
       C#               yes    no          n/a               no
       Java             yes    yes         deflate, snappy   yes
       Perl             yes    yes         deflate           no
       Python           yes    yes         deflate, snappy   yes
       Ruby             yes    yes         deflate           yes
       PHP              yes    yes         ?                 no


       Core: parse JSON schemas; read / write binary data
       Data file: read / write Avro data files
       Codec: compression codecs available for data files
       RPC: over HTTP

       Source: https://cwiki.apache.org/confluence/display/AVRO/Supported+Languages



Framework - Performance
     Comparison Metrics


     Time to Serialize / Deserialize

              • Avro is not the fastest, but is in the top half of all frameworks

     Object Creation

              • Avro ranks at the bottom because it always uses UTF-8 for strings. In
                normal use this is not a problem: the test compares creating new objects,
                not reusing them.

     Size of Serialized Objects (compressed with deflate, or uncompressed)

              • Only Kryo produces smaller output, by about 1 byte




     Source: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2




Framework - Performance
         Comparison Charts



                [Charts omitted: "Size of serialized data" and "Total time to serialize data", with Avro labeled]

    Source: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2


Implementing Avro




Framework - Types

      Generic

               • All Avro records are represented by a generic attribute/value data structure. This
                 style is most useful for systems which dynamically process datasets based on
                 user-provided scripts. For example, a program may be passed a data file whose
                 schema has not been previously seen by the program and told to sort it by the
                 field named "city".

      Specific

               • Each Avro record corresponds to a different kind of object in the programming
                 language. For example, in Java, C and C++, a specific API would generate a
                 distinct class or struct definition for each record definition. This style is used for
                 programs written to process a specific schema. RPC systems typically use this.

      Reflect

               • Avro schemas are generated via reflection to correspond to existing
                 programming language data structures. This may be useful when converting an
                 existing codebase to use Avro with minimal modifications.



      Source: https://cwiki.apache.org/confluence/display/AVRO/Glossary
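The generic style can be sketched in plain Java: treat each record as a map from field name to value, and take the sort field from the caller at runtime, as in the "city" example above. Here java.util.Map stands in for Avro's GenericRecord, whose get(String) likewise looks fields up by name. The helper names are hypothetical, and values are compared as strings for simplicity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrates generic-style processing; plain maps stand in for Avro generic records. */
final class GenericSortSketch {

    /** Sorts records by a field whose name is only known at runtime. */
    static List<Map<String, Object>> sortByField(List<Map<String, Object>> records, String field) {
        List<Map<String, Object>> sorted = new ArrayList<>(records);
        // Compare as strings; a real tool would compare according to the field's schema type.
        sorted.sort(Comparator.comparing((Map<String, Object> r) -> String.valueOf(r.get(field))));
        return sorted;
    }

    /** Builds a hypothetical record with "name" and "city" fields. */
    static Map<String, Object> record(String name, String city) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("city", city);
        return r;
    }
}
```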



Using Reflect Type

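      import org.apache.avro.Schema;
      import org.apache.avro.file.DataFileWriter;
      import org.apache.avro.reflect.ReflectData;
      import org.apache.avro.reflect.ReflectDatumWriter;

      // SomeObject stands for any existing Java class
      Class<SomeObject> type = SomeObject.class;

      // AllowNull: the generated schema permits null field values
      Schema schema = ReflectData.AllowNull.get().getSchema(type);

      DataFileWriter<SomeObject> writer =
              new DataFileWriter<SomeObject>(new ReflectDatumWriter<SomeObject>(schema));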
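Roughly, the reflect approach inspects a class's fields and maps each Java type to an Avro type. The toy sketch below does this for a handful of types using only java.lang.reflect, producing a record schema as JSON. It simplifies heavily: no nested records, arrays, maps, enums, or namespace. It is not ReflectData's implementation. The allowNull flag imitates the intent of ReflectData.AllowNull by wrapping reference-typed fields in a union with null. Check the exact behavior of the Avro version you use. SomeObject here is a hypothetical example class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

/** Toy reflection-based schema generator; NOT Avro's ReflectData, which handles far more. */
final class ReflectSchemaSketch {

    /** Maps a few Java field types to Avro primitive type names (JSON-quoted). */
    static String avroType(Class<?> t) {
        if (t == int.class || t == Integer.class) return "\"int\"";
        if (t == long.class || t == Long.class) return "\"long\"";
        if (t == boolean.class || t == Boolean.class) return "\"boolean\"";
        if (t == double.class || t == Double.class) return "\"double\"";
        if (t == String.class) return "\"string\"";
        throw new IllegalArgumentException("type not handled by this sketch: " + t);
    }

    /** Builds a record schema from instance fields; order follows getDeclaredFields(), which the JVM does not guarantee. */
    static String schemaFor(Class<?> c, boolean allowNull) {
        StringJoiner fields = new StringJoiner(",");
        for (Field f : c.getDeclaredFields()) {
            int mods = f.getModifiers();
            // static, transient and compiler-generated fields are not part of the data
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) continue;
            String type = avroType(f.getType());
            // allowNull: reference-typed fields may hold null (a Java primitive never can)
            if (allowNull && !f.getType().isPrimitive()) type = "[\"null\"," + type + "]";
            fields.add("{\"name\":\"" + f.getName() + "\",\"type\":" + type + "}");
        }
        return "{\"type\":\"record\",\"name\":\"" + c.getSimpleName() + "\",\"fields\":[" + fields + "]}";
    }
}

/** Hypothetical existing class, standing in for SomeObject on the slide. */
final class SomeObject {
    static final int VERSION = 1; // static: skipped
    int id;
    String city;
    transient int cachedHash;     // transient: skipped
}
```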




Using Specific Type

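      import org.apache.avro.Schema;
      import org.apache.avro.file.DataFileWriter;
      import org.apache.avro.specific.SpecificData;
      import org.apache.avro.specific.SpecificDatumWriter;

      // SomeObject stands for a class generated from an Avro schema
      Class<SomeObject> type = SomeObject.class;

      Schema schema = SpecificData.get().getSchema(type);

      DataFileWriter<SomeObject> writer =
              new DataFileWriter<SomeObject>(new SpecificDatumWriter<SomeObject>(schema));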




Using the DataFileWriter

      One more step: tell the writer where to write.

           writer.create(schema, outputStream);   // an OutputStream, or a File

      To append to an existing data file instead of creating a new one:

           writer.appendTo(new File("path/to/existing.avro"));   // instead of create()

      Then write records, and close the writer when done so buffered data is flushed:

           writer.append(record);   // record is a SomeObject
           writer.close();




Don’t Forget About Reading
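      Class<SomeObject> type = SomeObject.class;

      Schema schema =
                                 ReflectData.AllowNull.get().getSchema(type);   // reflect
                                 SpecificData.get().getSchema(type);            // or specific

      DatumReader<SomeObject> datumReader =
                                 new ReflectDatumReader<SomeObject>(schema);    // reflect
                                 new SpecificDatumReader<SomeObject>(schema);   // or specific

      DataFileStream<SomeObject> reader =
                                 new DataFileStream<SomeObject>(inputStream, datumReader);

      for (SomeObject record : reader) {   // DataFileStream is Iterable
          // process record
      }
      reader.close();

      Compressed data needs no extra code: the reader finds the codec in the file header
      and decompresses each block automatically.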
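The reader can decompress transparently because two things hold. The codec name is recorded in the data file's header metadata (the avro.codec key), and each block of records is compressed on its own. On the writing side, the Java API selects the codec with writer.setCodec(CodecFactory.deflateCodec(level)), called before create(). The Avro specification defines the "deflate" codec as RFC 1951 deflate, with no zlib header. The sketch below shows that per-block round trip using only java.util.zip. The class name is hypothetical. It covers only one block's payload, not the block's record count, size prefix, or sync marker.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Sketch of a "deflate" block codec: raw RFC 1951 deflate (no zlib header or checksum). */
final class DeflateBlockSketch {

    static byte[] compress(byte[] block, int level) {
        Deflater deflater = new Deflater(level, true); // nowrap = true -> raw deflate
        try {
            deflater.setInput(block);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater(true); // must match the raw format used above
        try {
            // Inflater's documentation asks for one extra "dummy" input byte when nowrap is used.
            inflater.setInput(Arrays.copyOf(compressed, compressed.length + 1));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated deflate block");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate block", e);
        } finally {
            inflater.end();
        }
    }
}
```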




Defining a Specific Schema

      Create an Enum type: serverstate.avsc (name is arbitrary, extension is not)
           {"type":"enum",
           "namespace":"com.yourcompany.avro",
           "name":"ServerState",
           "symbols":[
                     "STARTING",
                     "IDLE",
                     "ACTIVE",
                     "STOPPING",
                     "STOPPED"
           ]}

      Create an Exception type: wrongstate.avsc
           { "type":"error",
           "namespace":"com.yourcompany.avro",
           "name":"WrongServerStateException",
           "fields":[
                      {
                             "name":"message",
                             "type":"string"
                      }
           ]}
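On the wire, an Avro enum is just the zero-based position of its symbol, written as an int (a zigzag varint). The sketch below hand-encodes ServerState values that way. The class and method names are illustrative, and only the single-byte case (fewer than 64 symbols) is handled.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: how a ServerState enum value is written in Avro binary (not the Avro library). */
final class EnumEncodingSketch {

    // Symbols in the order declared in serverstate.avsc.
    static final List<String> SERVER_STATE_SYMBOLS =
            Arrays.asList("STARTING", "IDLE", "ACTIVE", "STOPPING", "STOPPED");

    /** Writes the symbol's zero-based index as a zigzag varint; one byte suffices for < 64 symbols. */
    static byte encode(String symbol) {
        int index = SERVER_STATE_SYMBOLS.indexOf(symbol);
        if (index < 0) throw new IllegalArgumentException("not a ServerState symbol: " + symbol);
        return (byte) (index << 1); // zigzag(n) = 2n for non-negative n
    }

    /** Reverses encode(), using the writer's symbol list to turn the index back into a name. */
    static String decode(byte b) {
        return SERVER_STATE_SYMBOLS.get((b & 0xFF) >>> 1);
    }
}
```

So ACTIVE is written as the single byte 0x04. Because symbol order determines the bytes, a reader first maps the index back to a name using the writer's schema, which an Avro data file always carries. It then matches symbols by name.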



Defining a Specific Schema
      Create a regular data object: historical.avsc

      { "type":"record",
         "namespace":"com.yourcompany.avro",
         "name":"NewHistoricalMessage",
         "aliases": ["com.yourcompany.avro.datatypes.HistoricalMessage"],
         "fields":[ {
                   "name":"dataSource",
                   "type":[
                            "null",
                            "string"
                   ]}
         ]
      }

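      Aliases support schema evolution: a reader using this schema can still read data
        written as com.yourcompany.avro.datatypes.HistoricalMessage.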

      All generated data objects are defined in simple JSON, and the documentation is
        very straightforward.

      Source: http://avro.apache.org/docs/current/spec.html
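The dataSource field is declared as the union ["null", "string"], the usual way to make a field optional. In Avro binary, a union value is the zero-based index of the branch in use (as a long), followed by that branch's encoding. A null encodes to no bytes at all. A minimal hand-written sketch, with a hypothetical class name:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch: Avro binary encoding of a ["null", "string"] union value (not the Avro library). */
final class NullableStringSketch {

    static byte[] encodeDataSource(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0);               // branch 0 of the union: "null" (no further bytes)
        } else {
            writeLong(out, 1);               // branch 1 of the union: "string"
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            writeLong(out, utf8.length);     // string = byte length, then UTF-8 bytes
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    /** Zigzag + base-128 varint, the Avro encoding for int and long. */
    private static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }
}
```

Listing "null" first also matters for evolution. A union field's default value must match the union's first branch, so this ordering is what would allow adding "default": null to the field.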




Maven




Avro With Maven
     Maven Plugins

     • This plugin assists with the Maven build lifecycle (may not be necessary in all use cases)

       <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>build-helper-maven-plugin</artifactId>
        </plugin>

     • Compiles *.avdl, *.avpr, *.avsc, and *.genavro (define the goals accordingly)

       <plugin>
                <groupId>org.apache.avro</groupId>
                <artifactId>avro-maven-plugin</artifactId>
        </plugin>

     • Necessary for Avro to introspect generated RPC code (http://paranamer.codehaus.org/)

       <plugin>
                <groupId>com.thoughtworks.paranamer</groupId>
                <artifactId>paranamer-maven-plugin</artifactId>
        </plugin>




RPC

      How to use an Avro RPC server

      • Define the protocol

      • Use specific types for all data passed via RPC

      • Implement the interface generated from the protocol

      • Create and start an Avro RPC server in Java

      • Create a client from the interface generated from the protocol




Define the Protocol

      • Create an AVDL file: historytracker.avdl (name is arbitrary, but the extension
        is not)

           @namespace("com.yourcompany.rpc")
           protocol HistoryTracker {
            import schema "historical.avsc";
            import schema "serverstate.avsc";
            import schema "wrongstate.avsc";
            void somethingHappened(
                   com.yourcompany.avro.NewHistoricalMessage Item) oneway;

               /**
                * You can add comments
                */
               com.yourcompany.avro.ServerState getState() throws
                      com.yourcompany.avro.WrongServerStateException;
           }

Create an RPC Server

      Creating a server is fast and easy:
           import java.net.InetSocketAddress;
           import org.apache.avro.ipc.NettyServer;
           import org.apache.avro.ipc.Responder;
           import org.apache.avro.ipc.Server;
           import org.apache.avro.ipc.specific.SpecificResponder;

           InetSocketAddress address =
                  new InetSocketAddress(hostname, port);
           Responder responder =
                  new SpecificResponder(HistoryTracker.class, new HistoryTrackerImpl());
           Server avroServer =
                  new NettyServer(responder, address);
           avroServer.start();


      • HistoryTracker is the interface generated from the AVDL file

      • HistoryTrackerImpl is your implementation of HistoryTracker

      • Transports other than Netty are available, e.g. HTTP (HttpServer)
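The slides pass a HistoryTrackerImpl to SpecificResponder but do not show it. The sketch below is one possible implementation. To keep it compilable without Avro or the generated code, ServerState, NewHistoricalMessage, WrongServerStateException, and the HistoryTracker interface are hand-written, simplified stand-ins (all hypothetical). The real generated classes differ in detail. For example, generated error classes extend Avro's own exception base class, and generated methods may also declare AvroRemoteException. The STOPPED check is invented purely to show where a declared error would be thrown.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// ---- Hypothetical stand-ins for classes the Avro compiler would generate ----
enum ServerState { STARTING, IDLE, ACTIVE, STOPPING, STOPPED }

final class NewHistoricalMessage {
    String dataSource; // nullable, mirroring the ["null", "string"] union
}

class WrongServerStateException extends Exception {
    WrongServerStateException(String message) { super(message); }
}

interface HistoryTracker {
    void somethingHappened(NewHistoricalMessage item);        // oneway message
    ServerState getState() throws WrongServerStateException;
}

// ---- The part you write: the implementation handed to SpecificResponder ----
final class HistoryTrackerImpl implements HistoryTracker {
    // The server may call handlers from several threads, so keep state thread-safe.
    private final Queue<NewHistoricalMessage> history = new ConcurrentLinkedQueue<>();
    private volatile ServerState state = ServerState.ACTIVE;

    @Override
    public void somethingHappened(NewHistoricalMessage item) {
        history.add(item); // oneway: the caller does not wait for a response
    }

    @Override
    public ServerState getState() throws WrongServerStateException {
        if (state == ServerState.STOPPED) {
            // With the generated error type, Avro returns this to the client as the declared error.
            throw new WrongServerStateException("server is stopped");
        }
        return state;
    }

    void setState(ServerState newState) { state = newState; }

    int historySize() { return history.size(); }
}
```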




Create an RPC Client

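      Creating a client is even easier:
           import java.net.InetSocketAddress;
           import org.apache.avro.ipc.NettyTransceiver;
           import org.apache.avro.ipc.Transceiver;
           import org.apache.avro.ipc.specific.SpecificRequestor;

           InetSocketAddress address =
                  new InetSocketAddress(hostname, port);
           Transceiver transceiver =
                  new NettyTransceiver(address);
           HistoryTracker client =
                  SpecificRequestor.getClient(HistoryTracker.class, transceiver);
           // ... call methods on client, e.g. client.getState() ...
           transceiver.close();


      • HistoryTracker is the interface generated from the AVDL file

      • Transports other than Netty are available, e.g. HTTP (HttpTransceiver)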




Resources

      References
      • Apache Website and Wiki
           http://avro.apache.org
           https://cwiki.apache.org/confluence/display/AVRO/Index

      • Benchmarking Serialization Frameworks
           http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2

      • An Introduction to Avro (Chris Cooper)
           http://files.meetup.com/1634302/CHUG-ApacheAvro.pdf

      Resources
      • Mailing List: user@avro.apache.org




Thanks for Attending
                                                                Questions?
                                                                              jscott@dotomi.com




Avro - More Than Just a Serialization Framework - CHUG - 20120416

  • 1. http://avro.apache.org Apache Avro More Than Just A Serialization Framework Jim Scott Lead Engineer / Architect A ValueClick Company
  • 2. Agenda • History / Overview • Serialization Framework • Supported Languages • Performance • Implementing Avro (Including Code Examples) • Avro with Maven • RPC (Including Code Examples) • Resources • Questions? 2 Not to be distributed without prior consent. Confidential. Copyright © 2011, Dotomi a ValueClick Company. All rights reserved.
  • 3. History / Overview
  • 4. History / Overview Existing Serialization Frameworks • protobuf, thrift, avro, kryo, hessian, activemq-protobuf, scala, sbinary, google-gson, jackson/JSON, javolution, protostuff, woodstox, aalto, fast-infoset, xstream, java serialization, etc. Most popular frameworks • JAXB, Protocol Buffers, Thrift Avro • Created by Doug Cutting, the creator of Hadoop • Data is always accompanied by a schema, which means: dynamic typing is supported (code generation is not required); schema evolution is supported; the data is not tagged, resulting in a smaller serialized size
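To make "the data is not tagged" concrete, here is a minimal Java sketch of Avro's binary encoding for a record. It uses plain java.io rather than the Avro library and follows the spec's rules: ints and longs are written as zig-zag varints, and record fields are written back to back in schema order with no names or tags between them. The Point schema and the class and method names are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;

/**
 * Sketch of Avro binary encoding for a record: fields are written back to back
 * in schema order, with no field names or tags. The Point schema is hypothetical:
 * {"type":"record","name":"Point","fields":[{"name":"x","type":"int"},{"name":"y","type":"int"}]}
 */
class UntaggedRecordSketch {

    /** Zig-zag maps signed to unsigned so that small negative numbers stay small. */
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    /** Writes a zig-zag varint: 7 bits per byte, low-order group first, high bit = "more follows". */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = zigZag(n);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Encodes a Point; an int value produces the same varint bytes as the equal long. */
    static byte[] encodePoint(int x, int y) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, x); // field "x": identified only by its position in the schema
        writeLong(out, y); // field "y": likewise, no tag or name is written
        return out.toByteArray();
    }
}
```

Encoding {x: 1, y: -2} yields just the two bytes 2 and 3. Those bytes mean nothing without the writer's schema, which is why Avro keeps the schema with the data.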
  • 5. Serialization Framework
  • 6. Serialization Framework Avro Limitations • Map keys can only be strings Avro Benefits • Interoperability: can serialize into Avro binary or Avro JSON; supports reading and writing protobufs and Thrift • Supports multiple languages • Rich data structures with a schema described in JSON: a compact, fast, binary data format; a container file to store persistent data (the schema is ALWAYS available); remote procedure call (RPC) • Simple integration with dynamic languages (via the generic type): unlike other frameworks, a schema that is unknown until runtime is supported • Compressible and splittable by Hadoop MapReduce
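The string-key limitation follows from how the spec encodes maps: a map is a series of blocks, each a count followed by that many key/value pairs, ended by a zero-count block, and every key is written as a string. A map schema therefore has a "values" type but no key type. A minimal plain-Java sketch for a hypothetical map of ints (class and method names are illustrative; the optional negative-count block form is not used):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Sketch of the Avro binary encoding of {"type":"map","values":"int"}. */
class StringKeyedMapSketch {

    /** Zig-zag varint, as Avro uses for int and long. */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = (n << 1) ^ (n >> 63);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Avro string: byte length, then UTF-8 bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Writes all entries as one block, then the zero-count block that ends the map. */
    static byte[] encodeIntMap(Map<String, Integer> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!map.isEmpty()) {
            writeLong(out, map.size());          // block item count
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                writeString(out, e.getKey());    // key: always encoded as a string
                writeLong(out, e.getValue());    // value: encoded per the "values" schema
            }
        }
        writeLong(out, 0);                       // end of map
        return out.toByteArray();
    }
}
```

For {"a": 1} this produces 2, 2, 97, 2, 0: one entry, the key "a" as a length-prefixed string, the value 1, and the terminator.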
  • 7. Supported Languages
    Implementation | Core | Data file | Codec           | RPC
    C              | yes  | yes       | deflate         | yes
    C++            | yes  | yes       | ?               | yes
    C#             | yes  | no        | n/a             | no
    Java           | yes  | yes       | deflate, snappy | yes
    Perl           | yes  | yes       | deflate         | no
    Python         | yes  | yes       | deflate, snappy | yes
    Ruby           | yes  | yes       | deflate         | yes
    PHP            | yes  | yes       | ?               | no
    Core: parse JSON schema, read / write binary schema. Data file: read / write Avro data files. RPC: over HTTP.
    Source: https://cwiki.apache.org/confluence/display/AVRO/Supported+Languages
  • 8. Framework - Performance Comparison Metrics Time to Serialize / Deserialize • Avro is not the fastest, but it is in the top half of all frameworks. Object Creation • Avro falls to the bottom because it always uses UTF-8 for strings. In normal use this is not a problem: the test compared object creation only, not object reuse. Size of Serialized Objects (compressed with deflate or uncompressed) • Only Kryo beats Avro, by about 1 byte. Source: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
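The UTF-8 point shows up directly in the spec's string encoding: a string is a zig-zag varint byte length followed by that many UTF-8 bytes. A Java String holds UTF-16 internally, so every write and read in a sketch like the one below transcodes. That is one plausible contributor to the object-creation cost, although the benchmark itself does not break the cost down. Class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch of the Avro binary encoding of "string": varint length + UTF-8 bytes. */
class Utf8StringSketch {

    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = (n << 1) ^ (n >> 63);               // zig-zag
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Transcodes UTF-16 to UTF-8 and prefixes the byte (not char) length. */
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    /** Reads the varint length, undoes zig-zag, then decodes UTF-8 back into a new String. */
    static String decodeString(byte[] buf) {
        long v = 0;
        int shift = 0, pos = 0, b;
        do {
            b = buf[pos++] & 0xFF;
            v |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        long len = (v >>> 1) ^ -(v & 1);
        return new String(buf, pos, (int) len, StandardCharsets.UTF_8);
    }
}
```

"foo" encodes as 6, 0x66, 0x6f, 0x6f (length 3 zig-zags to 6), while a single "\u00e9" needs two UTF-8 bytes after its length byte.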
  • 9. Framework - Performance Comparison Charts [Charts: size of serialized data; total time to serialize data] Source: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
  • 10. Implementing Avro
  • 11. Framework - Types Generic • All Avro records are represented by a generic attribute/value data structure. This style is most useful for systems that dynamically process datasets based on user-provided scripts. For example, a program may be passed a data file whose schema it has never seen and told to sort it by the field named "city". Specific • Each Avro record corresponds to a different kind of object in the programming language. For example, in Java, C and C++, a specific API would generate a distinct class or struct definition for each record definition. This style is used by programs written to process a specific schema; RPC systems typically use it. Reflect • Avro schemas are generated via reflection to correspond to existing programming language data structures. This may be useful when converting an existing codebase to use Avro with minimal modifications. Source: https://cwiki.apache.org/confluence/display/AVRO/Glossary
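The generic style's "sort by the field named city" example can be sketched without generated classes: the program looks fields up by a name it only learns at runtime. In the sketch below, plain Java maps stand in for Avro's GenericRecord (whose get(String) likewise returns a field value by name). The maps, class, and method names are hypothetical, and comparing values by their string form is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Sketch of generic-style processing: the field to sort by is only known at runtime. */
class GenericSortSketch {

    /** Reads a field by name, as generic code must; a missing field yields null. */
    static String fieldAsString(Map<String, Object> record, String field) {
        Object value = record.get(field);
        return value == null ? null : value.toString();
    }

    /** Returns a copy of the records sorted by the named field; records without it sort last. */
    static List<Map<String, Object>> sortByField(List<Map<String, Object>> records, String field) {
        List<Map<String, Object>> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing(
                (Map<String, Object> r) -> fieldAsString(r, field),
                Comparator.nullsLast(Comparator.<String>naturalOrder())));
        return sorted;
    }
}
```

Nothing in sortByField depends on a compiled record class, which is the property the generic type provides.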
  • 12. Using Reflect Type
    Class<SomeObject> type = SomeObject.class;
    Schema schema = ReflectData.AllowNull.get().getSchema(type);
    DataFileWriter<SomeObject> writer = new DataFileWriter<SomeObject>(new ReflectDatumWriter<SomeObject>(schema));
  • 13. Using Specific Type
    Class<SomeObject> type = SomeObject.class;
    Schema schema = SpecificData.get().getSchema(type);
    DataFileWriter<SomeObject> writer = new DataFileWriter<SomeObject>(new SpecificDatumWriter<SomeObject>(schema));
  • 14. Using the DataFileWriter
    Only one more thing to do: tell the writer where to write...
    writer.create(schema, outputStream);
    What if you want to append to an existing file instead of creating a new one?
    writer.appendTo(new File("Some File That exists"));
    Time to write...
    writer.append(object); // object is an instance of SomeObject
    writer.close();        // flushes the last block to the output
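What create(...) and append(...) produce on disk is an object container file as laid out in the Avro spec. It starts with a header: the magic bytes "Obj" plus version byte 1, a metadata map carrying avro.schema (and optionally avro.codec), and a 16-byte sync marker. Blocks follow, each holding an object count, a byte size, the serialized objects, and the sync marker again. The plain-Java sketch below writes that layout for a stream of longs. It is illustrative only: the real DataFileWriter generates a random sync marker, buffers blocks, and supports codecs. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch of an Avro object container file holding longs, with the "null" (uncompressed) codec. */
class ContainerFileSketch {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = (n << 1) ^ (n >> 63);                      // zig-zag varint
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Avro bytes and strings share one layout: length, then raw bytes. */
    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Writes the header and, if there are values, one data block. sync must be 16 bytes. */
    static byte[] write(long[] values, byte[] sync) {
        if (sync.length != 16) {
            throw new IllegalArgumentException("sync marker must be 16 bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Header: magic, metadata map (string keys, bytes values), sync marker.
        out.write(MAGIC, 0, MAGIC.length);
        writeLong(out, 2);                                  // one metadata block with 2 entries
        writeBytes(out, utf8("avro.schema"));
        writeBytes(out, utf8("\"long\""));                  // the writer's schema, as JSON
        writeBytes(out, utf8("avro.codec"));
        writeBytes(out, utf8("null"));                      // no compression
        writeLong(out, 0);                                  // end of metadata map
        out.write(sync, 0, sync.length);
        // Data block: object count, byte size of the objects, the objects, the sync marker.
        if (values.length > 0) {
            ByteArrayOutputStream objects = new ByteArrayOutputStream();
            for (long v : values) {
                writeLong(objects, v);
            }
            byte[] data = objects.toByteArray();
            writeLong(out, values.length);
            writeLong(out, data.length);
            out.write(data, 0, data.length);
            out.write(sync, 0, sync.length);
        }
        return out.toByteArray();
    }
}
```

Because the schema travels in the header, a reader of such a file never needs the schema supplied separately, and the repeated sync marker is what lets a splitter find block boundaries.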
  • 15. Don’t Forget About Reading
    Class<SomeObject> type = SomeObject.class;
    Schema schema = ReflectData.AllowNull.get().getSchema(type);                        // Specific: SpecificData.get().getSchema(type)
    DatumReader<SomeObject> datumReader = new ReflectDatumReader<SomeObject>(schema);   // Specific: new SpecificDatumReader<SomeObject>(schema)
    DataFileStream<SomeObject> reader = new DataFileStream<SomeObject>(inputStream, datumReader);
    Iterator<SomeObject> records = reader.iterator();
    Remember that compressed data? The reader decompresses it automatically!
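Decompression works automatically because the codec name is stored in the file header (avro.codec), so the reader knows how to decompress each block. The spec's deflate codec is raw RFC 1951 deflate without the zlib header or checksum, which in Java corresponds to Deflater and Inflater with nowrap set to true. On the writing side, the Java API selects it with writer.setCodec(CodecFactory.deflateCodec(level)) before create(...). Below is a minimal sketch of compressing and decompressing one block's bytes; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Sketch of the block-level work behind Avro's "deflate" codec. */
class DeflateBlockSketch {

    /** Compresses with raw deflate: nowrap = true omits the zlib header and checksum. */
    static byte[] compress(byte[] block) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(block);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Decompresses a raw deflate block, as a reader does after seeing avro.codec = "deflate". */
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater(true);
        try {
            // The Inflater javadoc asks for one extra "dummy" input byte when nowrap is true.
            inflater.setInput(Arrays.copyOf(compressed, compressed.length + 1));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid deflate data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid deflate data", e);
        } finally {
            inflater.end();
        }
    }
}
```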
  • 16. Defining a Specific Schema
    Create an Enum type: serverstate.avsc (the name is arbitrary, the extension is not)
    {"type":"enum", "namespace":"com.yourcompany.avro", "name":"ServerState",
     "symbols":["STARTING", "IDLE", "ACTIVE", "STOPPING", "STOPPED"]}
    Create an Exception type: wrongstate.avsc
    {"type":"error", "namespace":"com.yourcompany.avro", "name":"WrongServerStateException",
     "fields":[{"name":"message", "type":"string"}]}
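Per the Avro spec, an enum value is written as an int holding the zero-based position of its symbol in the schema. So the symbols list in serverstate.avsc is part of the wire format, and a reader needs the writer's schema (always present in a data file) to turn a position back into a name. A plain-Java sketch for ServerState; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

/** Sketch of Avro's binary encoding for the ServerState enum from serverstate.avsc. */
class EnumEncodingSketch {
    // Symbol order from the schema; only the position is written.
    static final List<String> SYMBOLS =
            List.of("STARTING", "IDLE", "ACTIVE", "STOPPING", "STOPPED");

    /** Writes the symbol's zero-based index as a zig-zag varint int. */
    static byte[] encode(String symbol) {
        int index = SYMBOLS.indexOf(symbol);
        if (index < 0) {
            throw new IllegalArgumentException("not a ServerState symbol: " + symbol);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int v = (index << 1) ^ (index >> 31);
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    /** Reads the index back and maps it to a symbol using the writer's symbol list. */
    static String decode(byte[] bytes) {
        int v = 0, shift = 0, pos = 0, b;
        do {
            b = bytes[pos++] & 0xFF;
            v |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        int index = (v >>> 1) ^ -(v & 1);
        return SYMBOLS.get(index);
    }
}
```

ACTIVE sits at position 2, so it is written as the single byte 4.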
  • 17. Defining a Specific Schema
    Create a regular data object: historical.avsc
    {"type":"record", "namespace":"com.yourcompany.avro", "name":"NewHistoricalMessage",
     "aliases":["com.yourcompany.avro.datatypes.HistoricalMessage"],
     "fields":[{"name":"dataSource", "type":["null", "string"]}]}
    Aliases allow for schema evolution: here they let data written as HistoricalMessage be read as NewHistoricalMessage. All generated data objects are defined with simple JSON, and the documentation is very straightforward.
    Source: http://avro.apache.org/docs/current/spec.html
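The ["null", "string"] type on dataSource is a union, which is how Avro expresses an optional field. Per the spec, a union value is written as the zero-based index of the branch in use, followed by the value encoded with that branch's schema. A plain-Java sketch for this one field (class and method names are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch of the Avro binary encoding of a ["null", "string"] union such as dataSource. */
class NullableStringSketch {

    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = (n << 1) ^ (n >> 63);                // zig-zag varint
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    static byte[] encode(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0);                        // branch 0: "null", which has no payload
        } else {
            writeLong(out, 1);                        // branch 1: "string"
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            writeLong(out, utf8.length);              // string: byte length, then UTF-8 bytes
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }
}
```

Since NewHistoricalMessage has only this field, these bytes are also its complete record encoding: a missing dataSource costs a single zero byte.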
  • 18. Maven
  • 19. Avro With Maven
    Maven Plugins
    • This plugin assists with the Maven build lifecycle (it may not be necessary in all use cases):
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
      </plugin>
    • Compiles *.avdl, *.avpr, *.avsc, and *.genavro files (define the goals accordingly):
      <plugin>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-maven-plugin</artifactId>
      </plugin>
    • Necessary for Avro to introspect generated RPC code (http://paranamer.codehaus.org/):
      <plugin>
        <groupId>com.thoughtworks.paranamer</groupId>
        <artifactId>paranamer-maven-plugin</artifactId>
      </plugin>
  • 20. RPC
  • 21. RPC How to use an Avro RPC server • Define the protocol • Data types passed via RPC must be specific types • Write an implementation of the interface generated from the protocol • Create and start an instance of an Avro RPC server in Java • Create a client based on the interface generated from the protocol
  • 22. Define the Protocol
    • Create an AVDL file: historytracker.avdl (the name is arbitrary, but the extension is not)
    @namespace("com.yourcompany.rpc")
    protocol HistoryTracker {
      import schema "historical.avsc";
      import schema "serverstate.avsc";
      import schema "wrongstate.avsc";

      void somethingHappened(com.yourcompany.avro.NewHistoricalMessage Item) oneway;

      /**
       * You can add comments
       */
      com.yourcompany.avro.ServerState getState() throws com.yourcompany.avro.WrongServerStateException;
    }
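The protocol is just as concrete on the wire. In the Avro spec's call format, a request is a metadata map (values of type bytes), then the message name as a string, then the parameters written like the fields of a record. The sketch below builds request bodies for the two HistoryTracker messages in plain Java. It assumes NewHistoricalMessage has only the dataSource field shown in historical.avsc, and it leaves out the handshake that precedes calls as well as transport framing. Class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch of HistoryTracker request bodies per the spec's call format (no handshake, no framing). */
class CallRequestSketch {

    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = (n << 1) ^ (n >> 63);                // zig-zag varint
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** getState() takes no parameters: empty metadata map, then the message name. */
    static byte[] getStateRequest() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, 0);                            // request metadata: empty map
        writeString(out, "getState");                 // message name
        return out.toByteArray();                     // no parameters follow
    }

    /** somethingHappened(Item): the single parameter is a NewHistoricalMessage record. */
    static byte[] somethingHappenedRequest(String dataSource) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, 0);                            // request metadata: empty map
        writeString(out, "somethingHappened");        // message name
        // Record fields in schema order; dataSource is a ["null", "string"] union.
        if (dataSource == null) {
            writeLong(out, 0);                        // union branch 0: null
        } else {
            writeLong(out, 1);                        // union branch 1: string
            writeString(out, dataSource);
        }
        return out.toByteArray();
    }
}
```

Because somethingHappened is declared oneway, the client does not wait for a response to it; getState, by contrast, returns either a ServerState or a WrongServerStateException.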
  • 23. Create an RPC Server
    Creating a server is fast and easy...
    InetSocketAddress address = new InetSocketAddress(hostname, port);
    Responder responder = new SpecificResponder(HistoryTracker.class, new HistoryTrackerImpl());
    Server avroServer = new NettyServer(responder, address);
    avroServer.start();
    • HistoryTracker is the interface generated from the AVDL file
    • HistoryTrackerImpl is an implementation of HistoryTracker
    • There are server implementations other than Netty, e.g. HTTP
  • 24. Create an RPC Client
    Creating a client is even easier than creating a server...
    InetSocketAddress address = new InetSocketAddress(hostname, port);
    Transceiver transceiver = new NettyTransceiver(address);
    HistoryTracker client = SpecificRequestor.getClient(HistoryTracker.class, transceiver);
    • HistoryTracker is the interface generated from the AVDL file
    • There are transceiver implementations other than Netty, e.g. HTTP
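Beneath both server and client, the Avro spec defines a generic message framing: a message is sent as a series of buffers, each a four-byte big-endian length followed by that many bytes, and is terminated by a zero-length buffer. The sketch below implements that rule in plain Java (class and method names are hypothetical). Treat it as the spec's framing rather than a byte-exact description of NettyTransceiver traffic, since the Netty transport may frame its messages differently.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Sketch of the Avro spec's message framing. */
class MessageFramingSketch {

    /** Splits a message into buffers of at most maxBufferSize bytes, each prefixed by its length. */
    static byte[] frame(byte[] message, int maxBufferSize) {
        if (maxBufferSize <= 0) {
            throw new IllegalArgumentException("maxBufferSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < message.length; off += maxBufferSize) {
            int len = Math.min(maxBufferSize, message.length - off);
            out.write(ByteBuffer.allocate(4).putInt(len).array(), 0, 4); // ByteBuffer is big-endian by default
            out.write(message, off, len);
        }
        out.write(new byte[4], 0, 4);                                     // zero-length buffer ends the message
        return out.toByteArray();
    }

    /** Concatenates buffers until the zero-length terminator. */
    static byte[] unframe(byte[] framed) {
        ByteBuffer in = ByteBuffer.wrap(framed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int len;
        while ((len = in.getInt()) != 0) {
            byte[] chunk = new byte[len];
            in.get(chunk);
            out.write(chunk, 0, len);
        }
        return out.toByteArray();
    }
}
```

Framing a three-byte message with two-byte buffers gives a 2-byte buffer, a 1-byte buffer, and the four zero bytes of the terminator.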
  • 25. Resources
  • 26. Resources References • Apache website and wiki: http://avro.apache.org, https://cwiki.apache.org/confluence/display/AVRO/Index • Benchmarking Serialization Frameworks: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2 • An Introduction to Avro (Chris Cooper): http://files.meetup.com/1634302/CHUG-ApacheAvro.pdf Resources • Mailing list: user@avro.apache.org
  • 27. Thanks for Attending Questions? jscott@dotomi.com