Apache Avro
Zafar Gilani, Muhammad Adnan Khan, Hui Shang

Outline
• Overview
• Comparison
• Specification
• SASL profile and usage
• References

Overview
• A data serialization system.
• An RPC framework.
• Used for storage and communication.
• Purpose:
  – Provide rich data structures.
  – Provide a compact and fast binary data format.
  – Provide simple integration with dynamic languages.
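
The "compact and fast binary data format" is easiest to see in how Avro encodes int and long values: the value is zigzag-mapped (so small negative numbers stay small) and then written 7 bits per byte, low-order group first, with the high bit marking "more bytes follow". The sketch below is a minimal stand-alone illustration of that rule from the Avro specification; the function names are hypothetical and not taken from any Avro library.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit value to unsigned so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Zigzag-encode n, then emit 7 bits per byte, low-order group first."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Inverse of encode_long; returns (value, position after the value)."""
    z = shift = 0
    while True:
        b = data[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo the zigzag mapping


for n in (0, -1, 1, -64, 64):
    encoded = encode_long(n)
    assert decode_long(encoded)[0] == n
    print(n, encoded.hex())
```

These five values print as 00, 01, 02, 7f and 8001, matching the example table in the Avro specification: values between -64 and 63 fit in a single byte.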

Overview
• Avro uses JSON as its Interface Description Language (IDL):
  – to specify data types;
  – to specify protocols.
• Recall: JSON (JavaScript Object Notation) is a lightweight, text-based standard for data interchange.
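
Because a schema is ordinary JSON, any JSON parser can read it. A small sketch using only Python's standard json module; the User record and its fields are hypothetical, chosen for illustration:

```python
import json

# A hypothetical record type, written as an Avro schema in plain JSON.
SCHEMA_TEXT = """
{
  "type": "record",
  "name": "User",
  "namespace": "com.acme",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
"""

schema = json.loads(SCHEMA_TEXT)  # no Avro-specific tooling needed to read it

# A named type's full name is its namespace and name joined by a dot.
full_name = f"{schema['namespace']}.{schema['name']}"
field_types = {field["name"]: field["type"] for field in schema["fields"]}

print(full_name)    # com.acme.User
print(field_types)  # {'name': 'string', 'age': 'int'}
```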

Why the need for Avro?
• Primarily used in Hadoop, where it provides a standard:
  1. Serialization format for persistent data.
  2. Wire format for communication:
     • among Hadoop nodes;
     • from client programs to Hadoop services.
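
On the wire, the Avro specification's "Message Framing" section splits each RPC message into a series of buffers, each preceded by a four-byte big-endian length, and ends the message with a zero-length buffer. The sketch below illustrates that framing with Python's struct module; the max_buffer size of 8192 bytes is an arbitrary assumption, not a value taken from the specification.

```python
import struct


def frame(message: bytes, max_buffer: int = 8192) -> bytes:
    """Split a message into length-prefixed buffers, ending with a zero-length one."""
    out = bytearray()
    for start in range(0, len(message), max_buffer):
        chunk = message[start:start + max_buffer]
        out += struct.pack(">I", len(chunk))  # four-byte big-endian length
        out += chunk
    out += struct.pack(">I", 0)  # zero-length buffer terminates the message
    return bytes(out)


def unframe(data: bytes) -> bytes:
    """Reassemble one framed message from the start of data."""
    parts, pos = [], 0
    while True:
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if length == 0:
            return b"".join(parts)
        parts.append(data[pos:pos + length])
        pos += length


framed = frame(b"hello", max_buffer=2)
print(unframe(framed))  # b'hello'
```

Because the terminator is explicit, the receiver never needs the total message size in advance, which suits streaming transports.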

Overview
• Avro relies on schemas.
  – The schema is stored with the data.
  – Each datum is written with no per-value overhead (no field names or type tags).
    • Thus serialization is fast and the output is small.
• Avro in RPC:
  – Client and server exchange schemas during the handshake.
  – Differences between their fields can then be resolved by field name.
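
Both points above can be shown together in a simplified sketch. A record is written as its field values concatenated in the writer's schema order, with no names or tags; reading therefore requires the writer's schema, after which fields are matched to the reader's schema by name and missing ones are filled from defaults. This is an assumption-laden illustration, not an Avro library: it supports only int, long and string fields, assumes matching field types (real Avro also handles type promotion, unions, aliases and more), and passes the writer schema explicitly instead of reading it from a data file header or RPC handshake.

```python
def write_long(n: int) -> bytes:
    """Avro int/long encoding: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    z = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if b < 0x80:
            return (z >> 1) ^ -(z & 1), pos
        shift += 7


def write_value(avro_type: str, value) -> bytes:
    if avro_type in ("int", "long"):
        return write_long(value)
    if avro_type == "string":
        data = value.encode("utf-8")
        return write_long(len(data)) + data  # length prefix, then UTF-8 bytes
    raise NotImplementedError(avro_type)


def read_value(avro_type: str, buf: bytes, pos: int):
    if avro_type in ("int", "long"):
        return read_long(buf, pos)
    if avro_type == "string":
        n, pos = read_long(buf, pos)
        return buf[pos:pos + n].decode("utf-8"), pos + n
    raise NotImplementedError(avro_type)


def encode_record(schema: dict, record: dict) -> bytes:
    # Field values are concatenated in schema order; no names or tags are written.
    return b"".join(write_value(f["type"], record[f["name"]]) for f in schema["fields"])


def decode_record(writer: dict, reader: dict, buf: bytes) -> dict:
    # The bytes can only be parsed with the writer's schema...
    written, pos = {}, 0
    for f in writer["fields"]:
        written[f["name"]], pos = read_value(f["type"], buf, pos)
    # ...then fields are matched to the reader's schema by name.
    result = {}
    for f in reader["fields"]:
        if f["name"] in written:
            result[f["name"]] = written[f["name"]]
        elif "default" in f:
            result[f["name"]] = f["default"]
        else:
            raise ValueError(f"field {f['name']!r} missing and has no default")
    return result


# Hypothetical writer and reader versions of the same record type.
WRITER = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}]}
READER = {"type": "record", "name": "User", "fields": [
    {"name": "age", "type": "int"},
    {"name": "email", "type": "string", "default": "unknown"}]}

data = encode_record(WRITER, {"name": "Ann", "age": 30})
print(data)                                 # b'\x06Ann<'
print(decode_record(WRITER, READER, data))  # {'age': 30, 'email': 'unknown'}
```

The encoded record is 5 bytes: one length byte and three UTF-8 bytes for "Ann", and one byte for 30. The reader drops "name", which it does not declare, and supplies the default for "email".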

APIs
• APIs are available for:
  – Java
  – C
  – C++
  – C#
  – Python
  – Ruby

Comparison with other systems
• Avro vs. Protobuf and Thrift.
• A quick note about Thrift:
  – Initially developed at Facebook by a Google intern.
  – Closer in design to Google's protobuf.

Comparison with other systems

                  Avro                 Google protobuf              Thrift
Implementation    Hmm..                Cleaner                      Hmm..
Error handling    Complex              Simple                       OK
Extensibility     Hmm..                Richer                       OK
Compatibility     Java, C, C++, C#,    That and much more, such     About the same
                  Python and Ruby      as Adobe ActionScript,       as protobuf
                                       Microsoft Silverlight, etc.

Specification
• A schema is represented as one of:
  – a JSON string, naming a defined type;
  – a JSON object of the form:
    • {"type": "typeName" ...attributes...}
  – a JSON array, representing a union of embedded types.
• Primitive types: null, boolean, int, long, float, double, bytes, string
  – e.g. {"type": "string"}
• Complex types: records, enums, arrays, maps, unions, fixed
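
The three representation forms can be told apart purely by the JSON value's shape. A minimal sketch, assuming the schema has already been parsed with json.loads; the describe function and its output strings are hypothetical, and it does not validate each type's required attributes (for example a record's "fields"). Note that unions, although listed as a complex type, are written as JSON arrays rather than objects.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
# Complex types written as JSON objects; unions are written as JSON arrays instead.
COMPLEX = {"record", "enum", "array", "map", "fixed"}


def describe(schema, named=frozenset()) -> str:
    """Say which of the three schema forms a parsed JSON value uses."""
    if isinstance(schema, str):  # JSON string: names a defined type
        if schema in PRIMITIVES:
            return f"primitive {schema}"
        if schema in named:
            return f"reference to {schema}"
        raise ValueError(f"undefined type name {schema!r}")
    if isinstance(schema, list):  # JSON array: a union of embedded types
        return "union of [" + ", ".join(describe(s, named) for s in schema) + "]"
    if isinstance(schema, dict):  # JSON object: {"type": "typeName", ...attributes...}
        type_name = schema.get("type")
        if type_name in PRIMITIVES:
            return f"primitive {type_name}"
        if type_name in COMPLEX:
            return f"{type_name} {schema.get('name', '')}".rstrip()
        raise ValueError(f"unknown type {type_name!r}")
    raise TypeError("a schema must be a JSON string, object, or array")


for text in ('"string"', '{"type": "string"}', '["null", "string"]',
             '{"type": "record", "name": "Greeting", "fields": []}'):
    print(describe(json.loads(text)))
```

The first two inputs are equivalent: a primitive type name given as a bare string means the same as the object form.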

Specification, example protocol
{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",
  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],
  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}
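
A protocol is just JSON, so its messages can be inspected directly. The sketch below loads the HelloWorld protocol from this slide and renders each message as a method-like signature, checking along the way that every named type a message uses is declared under "types". The signatures function and its "Response name(params) throws Errors" rendering are hypothetical, and only type names given as strings are checked.

```python
import json

# The HelloWorld protocol from the slide.
PROTOCOL = json.loads("""
{"namespace": "com.acme", "protocol": "HelloWorld", "doc": "Protocol Greetings",
 "types": [
   {"name": "Greeting", "type": "record", "fields": [{"name": "message", "type": "string"}]},
   {"name": "Curse", "type": "error", "fields": [{"name": "message", "type": "string"}]}],
 "messages": {
   "hello": {"doc": "Say hello.",
             "request": [{"name": "greeting", "type": "Greeting"}],
             "response": "Greeting",
             "errors": ["Curse"]}}}
""")

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def signatures(protocol: dict) -> dict:
    """Render each message as a signature, checking that the named types it uses are declared."""
    declared = {t["name"] for t in protocol["types"]}
    result = {}
    for name, msg in protocol["messages"].items():
        errors = msg.get("errors", [])
        used = [p["type"] for p in msg["request"]] + [msg["response"]] + errors
        undefined = [u for u in used if isinstance(u, str) and u not in declared | PRIMITIVES]
        if undefined:
            raise ValueError(f"message {name!r} uses undefined types {undefined}")
        params = ", ".join(f"{p['name']}: {p['type']}" for p in msg["request"])
        throws = f" throws {', '.join(errors)}" if errors else ""
        result[name] = f"{msg['response']} {name}({params}){throws}"
    return result


print(f"{PROTOCOL['namespace']}.{PROTOCOL['protocol']}")  # com.acme.HelloWorld
print(signatures(PROTOCOL))  # {'hello': 'Greeting hello(greeting: Greeting) throws Curse'}
```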

SASL profile
• Simple Authentication and Security Layer.
• Provides a framework for:
  – authentication;
  – security of network protocols.

SASL usage
• Negotiation procedure for connection-oriented Avro RPC:
  – 0: START Used in a client's initial message.
  – 1: CONTINUE Used while negotiation is ongoing.
  – 2: FAIL Terminates negotiation unsuccessfully.
  – 3: COMPLETE Terminates negotiation successfully.
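
The four status codes can be sketched as a tiny framing and negotiation loop. The frame layout used here (one status byte, a four-byte big-endian body length, then the body) is my reading of the SASL profile page listed in the references and should be checked against it. For brevity, negotiate treats the client's and server's frames as one interleaved stream, and the mechanism name ANONYMOUS in the START body is purely illustrative.

```python
import struct
from enum import IntEnum


class Status(IntEnum):
    """Status codes of the Avro SASL profile, as listed on the slide."""
    START = 0      # client's initial message
    CONTINUE = 1   # negotiation ongoing
    FAIL = 2       # negotiation ends unsuccessfully
    COMPLETE = 3   # negotiation ends successfully


def encode_frame(status: Status, body: bytes = b"") -> bytes:
    # One status byte, a four-byte big-endian body length, then the body.
    return struct.pack(">BI", status, len(body)) + body


def decode_frames(data: bytes):
    """Yield (status, body) pairs from a byte stream of frames."""
    pos = 0
    while pos < len(data):
        status, length = struct.unpack_from(">BI", data, pos)
        pos += 5
        yield Status(status), data[pos:pos + length]
        pos += length


def negotiate(frames) -> bool:
    """Consume frames until a terminating status; True means COMPLETE."""
    for status, _body in frames:
        if status is Status.COMPLETE:
            return True
        if status is Status.FAIL:
            return False
    raise EOFError("stream ended before negotiation terminated")


# Illustrative exchange: the client names a mechanism, the server accepts.
stream = encode_frame(Status.START, b"ANONYMOUS") + encode_frame(Status.COMPLETE)
print(negotiate(decode_frames(stream)))  # True
```

Only FAIL and COMPLETE end the loop; START and CONTINUE frames simply carry negotiation data forward.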

References
1. Apache Avro, http://avro.apache.org/docs/current/
2. Google protocol buffers vs Apache Avro, http://www.sammur.com/?p=36
3. Avro vs Thrift, http://tech.puredanger.com/2011/05/27/serialization-comparison/
4. SASL, http://avro.apache.org/docs/current/sasl.html