3 apache-avro


Published in: Technology

  1. Apache Avro
     Zafar Gilani, Muhammad Adnan Khan, Hui Shang
  2. Outline
     • Overview
     • Comparison
     • Specification
     • SASL profile and usage
     • References
  3. Overview
     • A data serialization system.
     • An RPC framework.
     • For: storage and communication.
     • Purpose:
       – Provide rich data structures.
       – Provide a compact and fast binary data format.
       – Integrate simply with dynamic languages.
  4. Overview
     • Avro uses JSON as its interface description language (IDL):
       – To specify data types.
       – To specify protocols.
     • Review: JavaScript Object Notation (JSON) is a lightweight, text-based standard for data interchange.
  5. Why the need for Avro?
     • Its primary use is in Hadoop, where it provides a standard:
       1. Serialization format for persistent data.
       2. Wire format for communication:
          • among Hadoop nodes.
          • from client programs to Hadoop services.
  6. Overview
     • Avro relies on schemas.
       – The schema is stored with the data.
       – Each datum is written with no per-value overhead.
         • Thus serialization is fast and the output is small.
     • Avro in RPC:
       – Client and server exchange schemas during the handshake.
       – Corresponding fields in the two schemas can then be resolved easily.
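To make "no per-value overhead" concrete, here is a minimal sketch of how a record could be written in Avro's binary format, assuming only `long`/`int` and `string` fields. The helper names (`encode_long`, `encode_string`, `encode_record`) are hypothetical and are not part of any Avro library. Integers are zig-zag encoded as variable-length bytes, strings are a length followed by UTF-8 bytes, and a record is simply its fields in schema order, with no field names or tags in the output.

```python
def encode_long(n: int) -> bytes:
    """Zig-zag the signed value, then emit it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # assumes n fits in a signed 64-bit range
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Hypothetical dispatch table covering only the types used in this sketch.
ENCODERS = {"int": encode_long, "long": encode_long, "string": encode_string}


def encode_record(schema: dict, datum: dict) -> bytes:
    """Concatenate the encoded fields in schema order; nothing else is written."""
    return b"".join(
        ENCODERS[field["type"]](datum[field["name"]]) for field in schema["fields"]
    )


schema = {
    "type": "record",
    "name": "Test",
    "fields": [{"name": "a", "type": "long"}, {"name": "b", "type": "string"}],
}
encoded = encode_record(schema, {"a": 27, "b": "foo"})
print(encoded.hex())  # 3606666f6f
```

The five output bytes carry only the values; a reader needs the schema to know that the first byte is field `a` and the rest is field `b`, which is why the schema travels with the data.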
  7. APIs
     • APIs are available for:
       – Java
       – C
       – C++
       – C#
       – Python
       – Ruby
  8. Comparison with other systems
     • Avro vs. Protobuf and Thrift.
     • A quick note about Thrift:
       – Initially developed at Facebook by a Google intern.
       – Closer to Google's protobuf.
  9. Comparison with other systems

                       Avro                 Google protobuf          Thrift
     Implementation    Hmm..                Cleaner                  Hmm..
     Error handling    Complex              Simple                   OK
     Extensibility     Hmm..                Richer                   OK
     Compatibility     Java, C, C++, C#,    That and much more,      About the same
                       Python and Ruby      such as Adobe            as protobuf
                                            ActionScript,
                                            Microsoft Silverlight,
                                            etc.
  10. Specification
      • A schema is represented in one of three ways:
        – A JSON string, naming a defined type.
        – A JSON object of the form:
          • {"type": "typeName" ...attributes...}
        – A JSON array, representing a union of types.
      • Primitive types: null, boolean, int, long, float, double, bytes, string
        – Example: {"type": "string"}
      • Complex types: records, enums, arrays, maps, unions, fixed
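The three schema forms listed above can be told apart by the JSON value's type alone. The following is a minimal sketch using only Python's standard `json` module; `describe` is a hypothetical helper written for illustration, not an Avro library function, and it does not validate schemas.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def describe(schema) -> str:
    """Classify a parsed schema as a primitive, a named type, a union, or a complex type."""
    if isinstance(schema, str):
        # A JSON string names a type: either a primitive or a previously defined one.
        return f"primitive {schema}" if schema in PRIMITIVES else f"named type {schema}"
    if isinstance(schema, list):
        # A JSON array is a union of its member schemas.
        return "union of [" + ", ".join(describe(s) for s in schema) + "]"
    if isinstance(schema, dict):
        type_name = schema["type"]
        if not isinstance(type_name, str):
            return describe(type_name)  # nested schema in the "type" attribute
        if type_name in PRIMITIVES:
            return f"primitive {type_name}"
        return f"complex type {type_name}"
    raise ValueError(f"not a schema: {schema!r}")


for text in ['"string"', '{"type": "string"}', '["null", "string"]',
             '{"type": "array", "items": "long"}']:
    print(text, "->", describe(json.loads(text)))
```

Note that `"string"` and `{"type": "string"}` describe the same primitive type; the object form only becomes necessary when a type needs attributes.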
  11. Specification, example protocol

      {
        "namespace": "com.acme",
        "protocol": "HelloWorld",
        "doc": "Protocol Greetings",
        "types": [
          {"name": "Greeting", "type": "record", "fields": [
            {"name": "message", "type": "string"}]},
          {"name": "Curse", "type": "error", "fields": [
            {"name": "message", "type": "string"}]}
        ],
        "messages": {
          "hello": {
            "doc": "Say hello.",
            "request": [{"name": "greeting", "type": "Greeting"}],
            "response": "Greeting",
            "errors": ["Curse"]
          }
        }
      }
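Because the protocol is plain JSON, it can be inspected with any JSON parser. As a minimal sketch, the snippet below loads the HelloWorld protocol from the slide with Python's standard `json` module and prints one line per message. The `summarize` helper and its output format are hypothetical, chosen only for illustration.

```python
import json

PROTOCOL_JSON = """
{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",
  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],
  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}
"""


def summarize(protocol: dict) -> list:
    """Return one human-readable signature line per message in the protocol."""
    lines = []
    for name, message in protocol["messages"].items():
        params = ", ".join(f"{p['name']}: {p['type']}" for p in message["request"])
        line = f"{name}({params}) -> {message['response']}"
        errors = message.get("errors", [])
        if errors:
            line += ", errors: " + ", ".join(errors)
        lines.append(line)
    return lines


protocol = json.loads(PROTOCOL_JSON)
full_name = f"{protocol['namespace']}.{protocol['protocol']}"
print(full_name)  # com.acme.HelloWorld
for line in summarize(protocol):
    print(line)  # hello(greeting: Greeting) -> Greeting, errors: Curse
```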
  12. SASL profile
      • Simple Authentication and Security Layer.
      • Provides a framework for:
        – Authentication.
        – Security of network protocols.
  13. SASL usage
      • Negotiation procedure for connection-oriented Avro RPC. Each message carries one of four status codes:
        – 0: START. Used in a client's initial message.
        – 1: CONTINUE. Used while negotiation is ongoing.
        – 2: FAIL. Terminates negotiation unsuccessfully.
        – 3: COMPLETE. Terminates negotiation successfully.
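The four status codes above suggest a simple shape for a negotiation: one START, any number of CONTINUE messages, then exactly one FAIL or COMPLETE. The sketch below checks a sequence of status codes against that shape. It is a simplification under stated assumptions: `SaslStatus` and `negotiation_outcome` are hypothetical names, and the code ignores which side sent each message as well as the mechanism name and payload that accompany the status on the wire.

```python
from enum import IntEnum


class SaslStatus(IntEnum):
    START = 0     # client's initial message
    CONTINUE = 1  # negotiation is ongoing
    FAIL = 2      # negotiation ended unsuccessfully
    COMPLETE = 3  # negotiation ended successfully


def negotiation_outcome(statuses) -> bool:
    """Return True if the sequence ends in COMPLETE, False if it ends in FAIL.

    Raises ValueError if the sequence does not begin with START, contains
    anything other than CONTINUE in the middle, or has not terminated yet.
    """
    seq = [SaslStatus(s) for s in statuses]
    if not seq or seq[0] is not SaslStatus.START:
        raise ValueError("negotiation must begin with START")
    for status in seq[1:-1]:
        if status is not SaslStatus.CONTINUE:
            raise ValueError(f"unexpected {status.name} before the final message")
    last = seq[-1]
    if last is SaslStatus.COMPLETE:
        return True
    if last is SaslStatus.FAIL:
        return False
    raise ValueError("negotiation has not terminated")


print(negotiation_outcome([0, 1, 1, 3]))  # True
print(negotiation_outcome([0, 2]))        # False
```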
  14. References
      1. Apache Avro, http://avro.apache.org/docs/current/
      2. Google protocol buffers vs Apache Avro, http://www.sammur.com/?p=36
      3. Avro vs Thrift, http://tech.puredanger.com/2011/05/27/serialization-comparison/
      4. SASL, http://avro.apache.org/docs/current/sasl.html
  15. Apache Avro
      Zafar Gilani, Muhammad Adnan Khan, Hui Shang