Avro - More Than Just a Serialization Framework - CHUG - 20120416
View the accompanying video on vimeo: https://vimeo.com/40776630

Published in: Technology, Education

Transcript

  • 1. Apache Avro: More Than Just A Serialization Framework
    http://avro.apache.org
    Jim Scott, Lead Engineer / Architect
    A ValueClick Company
  • 2. Agenda
      • History / Overview
      • Serialization Framework
      • Supported Languages
      • Performance
      • Implementing Avro (Including Code Examples)
      • Avro with Maven
      • RPC (Including Code Examples)
      • Resources
      • Questions?
    Not to be distributed without prior consent. Confidential. Copyright © 2011, Dotomi a ValueClick Company. All rights reserved.
  • 3. History / Overview
  • 4. History / Overview
    Existing Serialization Frameworks
      • protobuf, thrift, avro, kryo, hessian, activemq-protobuf, scala, sbinary, google-gson, jackson/JSON, javolution, protostuff, woodstox, aalto, fast-infoset, xstream, java serialization, etc.
    Most popular frameworks
      • JAXB, Protocol Buffers, Thrift
    Avro: created by Doug Cutting, the creator of Hadoop
      • Data is always accompanied by a schema:
          • Support for dynamic typing; code generation is not required
          • Supports schema evolution
          • The data is not tagged, resulting in a smaller serialization size
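Because the schema always travels with the data, the binary encoding can write values without per-field tags. As a minimal sketch of one building block: the Avro specification encodes int and long values with variable-length zig-zag coding, so small magnitudes take a single byte. The class and method names below are illustrative only and are not part of the Avro API.

```java
import java.io.ByteArrayOutputStream;

// Illustrative helper (not an Avro class): zig-zag varint coding for longs.
class ZigZagVarint {

    // Encodes n as a zig-zag varint: sign folded into the low bit, 7 bits per byte.
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);           // zig-zag: 0,-1,1,-2 -> 0,1,2,3
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {              // more than 7 bits remain
            out.write((int) ((z & 0x7F) | 0x80)); // set the continuation bit
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    // Decodes a single zig-zag varint from the start of the array.
    static long decodeLong(byte[] bytes) {
        long z = 0;
        int shift = 0;
        for (byte b : bytes) {
            z |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;                            // last byte has no continuation bit
            }
            shift += 7;
        }
        return (z >>> 1) ^ -(z & 1);              // undo the zig-zag mapping
    }
}
```

With this sketch, `encodeLong(-1)` yields the single byte `01` and `encodeLong(64)` yields `80 01`, matching the examples in the Avro specification.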
  • 5. Serialization Framework
  • 6. Serialization Framework
    Avro Limitations
      • Map keys can only be Strings
    Avro Benefits
      • Interoperability
          • Can serialize into Avro/Binary or Avro/JSON
          • Supports reading and writing protobufs and thrift
      • Supports multiple languages
      • Rich data structures with a schema described via JSON
          • A compact, fast, binary data format
          • A container file, to store persistent data (schema ALWAYS available)
          • Remote procedure call (RPC)
      • Simple integration with dynamic languages (via the generic type)
          • Unlike other frameworks, an unknown schema is supported at runtime
      • Compressible and splittable by Hadoop MapReduce
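To make "compact binary data format" concrete, here is a minimal sketch of how the Avro specification lays out a record: field values are written back to back in schema order, with no field names or tags. It assumes a hypothetical record with a long field `a` and a string field `b`; the class is a hand-rolled illustration and does not use the Avro library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: hand-encodes a record {a: long, b: string} the way the
// Avro binary encoding lays it out (values in schema order, no tags).
class UntaggedRecordSketch {

    // Zig-zag varint, as used by Avro for int and long values.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 byte length (as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Encodes the two fields in declaration order; nothing else is written.
    static byte[] encode(long a, String b) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, a);
        writeString(out, b);
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex for inspection.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

`hex(encode(27, "foo"))` gives `36 06 66 6f 6f`, the five-byte example from the Avro specification. A reader can only interpret those bytes with the schema in hand, which is why the container file always stores it.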
  • 7. Supported Languages
      Implementation  Core  Data file  Codec            RPC
      C               yes   yes        deflate          yes
      C++             yes   yes        ?                yes
      C#              yes   no         n/a              no
      Java            yes   yes        deflate, snappy  yes
      Perl            yes   yes        deflate          no
      Python          yes   yes        deflate, snappy  yes
      Ruby            yes   yes        deflate          yes
      PHP             yes   yes        ?                no
    Core: parse JSON schemas, read / write binary data
    Data file: read / write Avro data files
    RPC: over HTTP
    Source: https://cwiki.apache.org/confluence/display/AVRO/Supported+Languages
  • 8. Framework - Performance Comparison Metrics
    Time to Serialize / Deserialize
      • Avro is not the fastest, but is in the top half of all frameworks
    Object Creation
      • Avro falls to the bottom, because it always uses UTF-8 for Strings. In normal use cases this is not a problem, as this test only compared object creation, not object reuse.
    Size of Serialized Objects (compressed with deflate, or uncompressed)
      • Avro is only bested by Kryo, by about 1 byte
    Source: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
  • 9. Framework - Performance Comparison Charts
    [Charts: size of serialized data; total time to serialize data. Avro is labeled in the charts.]
    Source: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
  • 10. Implementing Avro
  • 11. Framework - Types
    Generic
      • All Avro records are represented by a generic attribute/value data structure. This style is most useful for systems which dynamically process datasets based on user-provided scripts. For example, a program may be passed a data file whose schema has not been previously seen by the program and told to sort it by the field named "city".
    Specific
      • Each Avro record corresponds to a different kind of object in the programming language. For example, in Java, C and C++, a specific API would generate a distinct class or struct definition for each record definition. This style is used for programs written to process a specific schema. RPC systems typically use this.
    Reflect
      • Avro schemas are generated via reflection to correspond to existing programming language data structures. This may be useful when converting an existing codebase to use Avro with minimal modifications.
    Source: https://cwiki.apache.org/confluence/display/AVRO/Glossary
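The "sort by the field named city" case from the Generic description can be sketched without any generated classes. The example below is an assumption-laden stand-in: it uses a plain `Map<String, Object>` in place of Avro's generic record type, and `sortByField` is a hypothetical helper, so it only illustrates the attribute/value style rather than the Avro API itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative only: a Map stands in for a generic attribute/value record.
class GenericStyleSketch {

    // Returns a sorted copy; the field name is only known at runtime,
    // so no record-specific class is needed.
    static List<Map<String, Object>> sortByField(
            List<Map<String, Object>> records, final String field) {
        List<Map<String, Object>> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing(
                (Map<String, Object> r) -> r.get(field).toString()));
        return sorted;
    }
}
```

The Specific and Reflect styles trade this flexibility for compile-time types: the field would be reached through a generated or existing getter instead of a string lookup.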
  • 12. Using Reflect Type
      import org.apache.avro.Schema;
      import org.apache.avro.file.DataFileWriter;
      import org.apache.avro.reflect.ReflectData;
      import org.apache.avro.reflect.ReflectDatumWriter;
      // SomeObject stands for any existing Java class
      Class<SomeObject> type = SomeObject.class;
      Schema schema = ReflectData.AllowNull.get().getSchema(type);
      DataFileWriter<SomeObject> writer =
          new DataFileWriter<SomeObject>(new ReflectDatumWriter<SomeObject>(schema));
  • 13. Using Specific Type
      import org.apache.avro.Schema;
      import org.apache.avro.file.DataFileWriter;
      import org.apache.avro.specific.SpecificData;
      import org.apache.avro.specific.SpecificDatumWriter;
      // SomeObject stands for a class generated from an Avro schema
      Class<SomeObject> type = SomeObject.class;
      Schema schema = SpecificData.get().getSchema(type);
      DataFileWriter<SomeObject> writer =
          new DataFileWriter<SomeObject>(new SpecificDatumWriter<SomeObject>(schema));
  • 14. Using the DataFileWriter
    Only one more thing to do: tell the writer where to write.
      writer.create(schema, outputStream);  // outputStream is any java.io.OutputStream
    What if you want to append to an existing file instead of creating a new one?
      writer.appendTo(new File("Some File That exists"));
    Time to write:
      writer.append(datum);  // datum is an object of type T
      writer.close();        // flush buffered records when done
  • 15. Don’t Forget About Reading
      Class<SomeObject> type = SomeObject.class;
      // Use the schema source and reader that match how the data was written
      Schema schema = ReflectData.AllowNull.get().getSchema(type);
      //         or:  SpecificData.get().getSchema(type);
      DatumReader<SomeObject> datumReader = new ReflectDatumReader<SomeObject>(schema);
      //                               or:  new SpecificDatumReader<SomeObject>(schema);
      DataFileStream<SomeObject> reader =
          new DataFileStream<SomeObject>(inputStream, datumReader);
      Iterator<SomeObject> records = reader.iterator();
    Remember that compressed data? The reader decompresses it automatically!
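Automatic decompression is possible because an Avro data file records its codec name in the file header, so the reader knows how to unpack each block. The Avro specification defines the "deflate" codec as raw RFC 1951 deflate, without the zlib header or checksum. The sketch below shows that round trip using only the JDK; the class and method names are illustrative and this is not the code Avro itself runs.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative only: raw-deflate round trip for one block of bytes.
class RawDeflateSketch {

    static byte[] compress(byte[] block) {
        // nowrap = true selects raw deflate (no zlib header or checksum)
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(block);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater(true); // must match the writer's nowrap setting
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // input exhausted; avoid spinning on truncated data
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("block is not valid raw deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Compression is applied per block rather than to the whole file, which is what keeps compressed data files splittable for MapReduce.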
  • 16. Defining a Specific Schema
    Create an Enum type: serverstate.avsc (the name is arbitrary, the extension is not)
      {
        "type": "enum",
        "namespace": "com.yourcompany.avro",
        "name": "ServerState",
        "symbols": ["STARTING", "IDLE", "ACTIVE", "STOPPING", "STOPPED"]
      }
    Create an Exception type: wrongstate.avsc
      {
        "type": "error",
        "namespace": "com.yourcompany.avro",
        "name": "WrongServerStateException",
        "fields": [
          { "name": "message", "type": "string" }
        ]
      }
  • 17. Defining a Specific Schema
    Create a regular data object: historical.avsc
      {
        "type": "record",
        "namespace": "com.yourcompany.avro",
        "name": "NewHistoricalMessage",
        "aliases": ["com.yourcompany.avro.datatypes.HistoricalMessage"],
        "fields": [
          { "name": "dataSource", "type": ["null", "string"] }
        ]
      }
    Aliases allow for schema evolution: data written under the old name can still be read with the new schema. All generated data objects are defined with simple JSON, and the documentation is very straightforward.
    Source: http://avro.apache.org/docs/current/spec.html
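The ServerState enum and the nullable dataSource field defined above have very small binary forms. Per the Avro specification, an enum is written as the zero-based index of its symbol, and a union is written as the index of the chosen branch followed by that branch's value. The following is a hand-rolled sketch of those two rules; the class and its methods are hypothetical and do not use the Avro library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hand-encodes an enum symbol and a ["null", "string"] union.
class EnumAndUnionSketch {

    // Symbol order must match the "symbols" array in serverstate.avsc.
    static final List<String> SERVER_STATE_SYMBOLS =
            Arrays.asList("STARTING", "IDLE", "ACTIVE", "STOPPING", "STOPPED");

    // Zig-zag varint, as used by Avro for int and long values.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // An enum value is just the position of its symbol.
    static byte[] encodeServerState(String symbol) {
        int index = SERVER_STATE_SYMBOLS.indexOf(symbol);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown symbol: " + symbol);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, index);
        return out.toByteArray();
    }

    // Branch 0 is "null" (no payload); branch 1 is "string" (length + UTF-8).
    static byte[] encodeNullableString(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0);
        } else {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            writeLong(out, 1);
            writeLong(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }
}
```

Here `encodeServerState("ACTIVE")` is the single byte `04`, a null dataSource is the single byte `00`, and the string "a" becomes `02 02 61`. Because symbols are stored by position, reordering them in the schema changes the meaning of previously written data.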
  • 18. Maven
  • 19. Avro With Maven
    Maven Plugins
      • This plugin assists with the Maven build lifecycle (may not be necessary in all use cases)
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>build-helper-maven-plugin</artifactId>
          </plugin>
      • Compiles *.avdl, *.avpr, *.avsc, and *.genavro (define the goals accordingly)
          <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
          </plugin>
      • Necessary for Avro to introspect generated RPC code (http://paranamer.codehaus.org/)
          <plugin>
            <groupId>com.thoughtworks.paranamer</groupId>
            <artifactId>paranamer-maven-plugin</artifactId>
          </plugin>
  • 20. RPC
  • 21. RPC
    How to use an Avro RPC Server
      • Define the protocol
      • Use specific types for the datatypes passed via RPC
      • Implement the interface generated from the protocol
      • Create and start an instance of an Avro RPC Server in Java
      • Create a client based on the interface generated from the protocol
  • 22. Define the Protocol
      • Create an AVDL file: historytracker.avdl (the name is arbitrary, but the extension is not)
          @namespace("com.yourcompany.rpc")
          protocol HistoryTracker {
            import schema "historical.avsc";
            import schema "serverstate.avsc";
            import schema "wrongstate.avsc";
            void somethingHappened(com.yourcompany.avro.NewHistoricalMessage Item) oneway;
            /**
             * You can add comments
             */
            com.yourcompany.avro.ServerState getState()
                throws com.yourcompany.avro.WrongServerStateException;
          }
  • 23. Create an RPC Server
    Creating a server is fast and easy:
      InetSocketAddress address = new InetSocketAddress(hostname, port);
      // The responder needs an instance of the implementation, not the class name
      Responder responder =
          new SpecificResponder(HistoryTracker.class, new HistoryTrackerImpl());
      Server avroServer = new NettyServer(responder, address);
      avroServer.start();
      • HistoryTracker is the interface generated from the AVDL file
      • HistoryTrackerImpl is an implementation of HistoryTracker
      • There are other server implementations beyond Netty, e.g. HTTP
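To show what "an implementation of HistoryTracker" involves, here is a self-contained sketch. Everything in it is a hand-written stand-in: in a real project the enum, the exception, the message type and the interface would be generated by Avro from the schema and AVDL files, and their exact shape may differ. The state rules (moving to ACTIVE on the first event, refusing getState once stopped) are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: stand-ins for Avro-generated types plus one implementation.
class HistoryTrackerSketch {

    enum ServerState { STARTING, IDLE, ACTIVE, STOPPING, STOPPED }

    static class WrongServerStateException extends Exception {
        WrongServerStateException(String message) {
            super(message);
        }
    }

    static class NewHistoricalMessage {
        final String dataSource; // may be null, like the ["null", "string"] union

        NewHistoricalMessage(String dataSource) {
            this.dataSource = dataSource;
        }
    }

    // Mirrors the two messages declared in the protocol.
    interface HistoryTracker {
        void somethingHappened(NewHistoricalMessage item);

        ServerState getState() throws WrongServerStateException;
    }

    static class HistoryTrackerImpl implements HistoryTracker {
        private final List<NewHistoricalMessage> received = new ArrayList<>();
        private ServerState state = ServerState.IDLE;

        @Override
        public synchronized void somethingHappened(NewHistoricalMessage item) {
            received.add(item);           // one-way call: nothing is returned
            state = ServerState.ACTIVE;
        }

        @Override
        public synchronized ServerState getState() throws WrongServerStateException {
            if (state == ServerState.STOPPED) {
                throw new WrongServerStateException("server is stopped");
            }
            return state;
        }

        synchronized void stop() {
            state = ServerState.STOPPED;
        }

        synchronized int receivedCount() {
            return received.size();
        }
    }
}
```

The methods are synchronized because a network server may invoke the same implementation instance from several threads at once.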
  • 24. Create an RPC Client
    Creating a client is easier than creating a server:
      InetSocketAddress address = new InetSocketAddress(hostname, port);
      Transceiver transceiver = new NettyTransceiver(address);
      // The client is typed as the generated interface itself
      HistoryTracker client =
          SpecificRequestor.getClient(HistoryTracker.class, transceiver);
      • HistoryTracker is the interface generated from the AVDL file
      • There are other transceiver implementations beyond Netty, e.g. HTTP
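A client can be handed back as the interface type because, to my understanding, the requestor sits behind a Java dynamic proxy that turns each method call into a request. The sketch below shows only that general mechanism using the JDK: `Tracker`, `LoopbackHandler` and this `getClient` are hypothetical names, and the handler answers locally instead of sending anything over a transceiver.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: an interface-typed client built with a dynamic proxy.
class ProxyClientSketch {

    // Hypothetical stand-in for a generated protocol interface.
    interface Tracker {
        void somethingHappened(String item);

        String getState();
    }

    // Records each call and answers locally; a real requestor would serialize
    // the method name and arguments and send them to the server.
    static class LoopbackHandler implements InvocationHandler {
        final List<String> calls = new ArrayList<>();

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            calls.add(method.getName());
            if (method.getName().equals("getState")) {
                return "IDLE";
            }
            return null; // void methods return nothing
        }
    }

    // Builds an object that implements iface and routes every call to handler.
    static <T> T getClient(Class<T> iface, InvocationHandler handler) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }
}
```

Calling `getClient(Tracker.class, new LoopbackHandler())` returns an object on which `getState()` can be invoked like any local method, which is the experience the RPC client aims for.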
  • 25. Resources
  • 26. Resources
    References
      • Apache Website and Wiki
          http://avro.apache.org
          https://cwiki.apache.org/confluence/display/AVRO/Index
      • Benchmarking Serialization Frameworks
          http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
      • An Introduction to Avro (Chris Cooper)
          http://files.meetup.com/1634302/CHUG-ApacheAvro.pdf
    Resources
      • Mailing List: user@avro.apache.org
  • 27. Thanks for Attending
    Questions?
    jscott@dotomi.com