Apache AVRO
  What's new?
Philip Zeyliger, Cloudera
   (AVRO committer)

     Boston HUG
   January 19, 2010
What's AVRO?

 A data serialization system
 Includes:
     A schema language
     A compact serialized form
     An RPC framework
     A handful of APIs, in a handful of languages
 Goals:
     Cross-language
     Support for dynamic access
     Simple but expressive schema evolution

 Same "space" as Apache Thrift, Google Protocol Buffers,
 Binary JSON, and XDR. Subtle differences with all of them.
AVRO Protocols & Schemas
@namespace("org.apache.avro.demo")
protocol CurrencyConversion {
  enum Currency {
    USD,
    GBP,
    EUR,
    JPY
  }
  record Money {
    Currency currency;
    int amount;
  }
  error UnknownRateError {
    Currency currency;
  }
  Money convert(Money input, Currency targetCurrency)
    throws UnknownRateError;
  double rate(Currency input, Currency output) throws UnknownRateError;
}

"genavro" IDL (AVRO-258)
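
To make the contract in the IDL above concrete, here is a minimal Python sketch of what a service implementing CurrencyConversion might look like. It is not generated Avro code: the `PER_USD` table and its values are made up for illustration, and the Python class and function names simply mirror the IDL.

```python
from dataclasses import dataclass
from enum import Enum


class Currency(Enum):
    USD = "USD"
    GBP = "GBP"
    EUR = "EUR"
    JPY = "JPY"


@dataclass(frozen=True)
class Money:
    currency: Currency
    amount: int


class UnknownRateError(Exception):
    """Mirrors the IDL error type: it carries the offending currency."""

    def __init__(self, currency: Currency):
        super().__init__(f"no rate known for {currency.value}")
        self.currency = currency


# Hypothetical rate table (units of each currency per 1 USD); values are made up.
# JPY is left out on purpose so that the error path can be exercised.
PER_USD = {Currency.USD: 1.0, Currency.GBP: 0.5, Currency.EUR: 0.8}


def rate(input: Currency, output: Currency) -> float:
    """double rate(Currency input, Currency output) throws UnknownRateError"""
    for currency in (input, output):
        if currency not in PER_USD:
            raise UnknownRateError(currency)
    return PER_USD[output] / PER_USD[input]


def convert(input: Money, target_currency: Currency) -> Money:
    """Money convert(Money input, Currency targetCurrency) throws UnknownRateError"""
    factor = rate(input.currency, target_currency)
    return Money(target_currency, round(input.amount * factor))
```

Calling `convert(Money(Currency.USD, 100), Currency.GBP)` goes through `rate` and returns a `Money` in GBP; asking for JPY raises `UnknownRateError`, which is the case the IDL's `throws` clause declares.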
$ java -jar avro-tools-1.2.0-dev.jar genavro < demo.genavro
{
  "protocol" : "CurrencyConversion",
  "namespace" : "org.apache.avro.demo",
  "types" : [ {
    "type" : "enum",
    "name" : "Currency",
    "symbols" : [ "USD", "GBP", "EUR", "JPY" ]
  }, {
    "type" : "record",
    "name" : "Money",
    "fields" : [ {
      "name" : "currency",
      "type" : "Currency"
    }, {
      "name" : "amount",
      "type" : "int"
    } ]
  }, {
    "type" : "error",
    "name" : "UnknownRateError",
    "fields" : [ {
      "name" : "currency",
      "type" : "Currency"
    } ]
  } ],
  "messages" : {
    "convert" : {
      "request" : [ {
        "name" : "input",
        "type" : "Money"
      }, {
        "name" : "targetCurrency",
        "type" : "Currency"
      } ],
      "response" : "Money",
      "errors" : [ "UnknownRateError" ]
    },
    "rate" : {
      "request" : [ {
        "name" : "input",
        "type" : "Currency"
      }, {
        "name" : "output",
        "type" : "Currency"
      } ],
      "response" : "double",
      "errors" : [ "UnknownRateError" ]
    }
  }
}

JSON Representation of Protocol and Schemas
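
Because the protocol is plain JSON, any language with a JSON parser can inspect it without Avro-specific tooling. The sketch below uses only Python's standard `json` module; the helper name `describe_messages` is an assumption made for illustration, and the embedded document is a compacted copy of the protocol shown above.

```python
import json

PROTOCOL_JSON = """
{
  "protocol": "CurrencyConversion",
  "namespace": "org.apache.avro.demo",
  "types": [
    {"type": "enum", "name": "Currency",
     "symbols": ["USD", "GBP", "EUR", "JPY"]},
    {"type": "record", "name": "Money", "fields": [
      {"name": "currency", "type": "Currency"},
      {"name": "amount", "type": "int"}]},
    {"type": "error", "name": "UnknownRateError", "fields": [
      {"name": "currency", "type": "Currency"}]}
  ],
  "messages": {
    "convert": {
      "request": [{"name": "input", "type": "Money"},
                  {"name": "targetCurrency", "type": "Currency"}],
      "response": "Money",
      "errors": ["UnknownRateError"]},
    "rate": {
      "request": [{"name": "input", "type": "Currency"},
                  {"name": "output", "type": "Currency"}],
      "response": "double",
      "errors": ["UnknownRateError"]}
  }
}
"""


def describe_messages(protocol):
    """Render each message as an IDL-like signature string."""
    lines = []
    for name, message in protocol["messages"].items():
        params = ", ".join(
            f'{p["type"]} {p["name"]}' for p in message["request"]
        )
        line = f'{message["response"]} {name}({params})'
        errors = ", ".join(message.get("errors", []))
        if errors:
            line += f" throws {errors}"
        lines.append(line)
    return lines


protocol = json.loads(PROTOCOL_JSON)
signatures = describe_messages(protocol)
for signature in signatures:
    print(signature)
```

The printed signatures line up with the two message declarations in the genavro source, which is a quick way to check that the IDL and its JSON form say the same thing.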
Types

primitive               complex
string                  record
bytes                   array
int & long              map: string -> T
float & double          union
boolean                 fixed<N>
null                    enum
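
As an illustration of how compact the serialized form of these types is, the following Python sketch encodes `int`/`long` and `string` values the way the Avro specification describes them, as far as I understand it: integers use zig-zag, variable-length coding, and a string is its UTF-8 byte length followed by the bytes. The function names are assumptions for this sketch, not part of any Avro library.

```python
def encode_long(n: int) -> bytes:
    """Zig-zag encode a 64-bit signed integer, then emit it as a varint."""
    zigzag = (n << 1) ^ (n >> 63)  # small magnitudes map to small codes
    out = bytearray()
    while zigzag > 0x7F:
        out.append((zigzag & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def decode_long(data: bytes) -> int:
    """Inverse of encode_long; reads from the start of data."""
    zigzag = 0
    shift = 0
    for byte in data:
        zigzag |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (zigzag >> 1) ^ -(zigzag & 1)


def encode_string(s: str) -> bytes:
    """A string is a long byte count followed by that many UTF-8 bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


print(encode_long(1).hex())        # 02
print(encode_long(-1).hex())       # 01
print(encode_long(64).hex())       # 8001
print(encode_string("foo").hex())  # 06666f6f
```

Note that nothing in the output identifies the type or field name: the bytes are only meaningful to a reader that has the schema.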
Schema Evolution & Projection
         AVRO binary data never travels without its schema. This
         allows dynamic tooling.
         Writer's Schema and Reader's Schema may be different.
Writer's schema:
{
  "type" : "record",
  "name" : "Person",
  "fields" : [ {
    "name" : "first",
    "type" : "string"
  }, {
    "name" : "sport",
    "type" : "string"
  } ]
}

Reader's schema:
{
  "type" : "record",
  "name" : "Person",
  "fields" : [ {
    "name" : "first",
    "type" : "string"
  }, {
    "name" : "age",
    "type" : "int",
    "default" : 0
  } ]
}

Serialized data:                  "Alice", "Ultimate Frisbee"
Data presented to application:    "Alice", 0
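
The Person example can be sketched in a few lines of Python. This is a simplified illustration of schema resolution, not the Avro library's implementation: it handles only flat records with `string` and `int` fields, and the helper names (`resolve`, `read_long`, and so on) are assumptions. The bytes are decoded with the writer's schema, then fields are matched by name; writer-only fields are dropped and reader-only fields take their default.

```python
import io


def write_long(n: int) -> bytes:
    zigzag = (n << 1) ^ (n >> 63)
    out = bytearray()
    while zigzag > 0x7F:
        out.append((zigzag & 0x7F) | 0x80)
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def write_string(s: str) -> bytes:
    raw = s.encode("utf-8")
    return write_long(len(raw)) + raw


def read_long(buf: io.BytesIO) -> int:
    zigzag, shift = 0, 0
    while True:
        byte = buf.read(1)[0]
        zigzag |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (zigzag >> 1) ^ -(zigzag & 1)
        shift += 7


def read_string(buf: io.BytesIO) -> str:
    return buf.read(read_long(buf)).decode("utf-8")


READERS = {"string": read_string, "int": read_long}


def resolve(data: bytes, writer: dict, reader: dict) -> dict:
    """Decode with the writer's schema, then project onto the reader's."""
    buf = io.BytesIO(data)
    # Fields are stored back to back in writer order, with no names or tags.
    written = {f["name"]: READERS[f["type"]](buf) for f in writer["fields"]}
    result = {}
    for field in reader["fields"]:
        if field["name"] in written:
            result[field["name"]] = written[field["name"]]
        elif "default" in field:
            result[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value or default for {field['name']!r}")
    return result


WRITER = {"type": "record", "name": "Person", "fields": [
    {"name": "first", "type": "string"},
    {"name": "sport", "type": "string"}]}
READER = {"type": "record", "name": "Person", "fields": [
    {"name": "first", "type": "string"},
    {"name": "age", "type": "int", "default": 0}]}

serialized = write_string("Alice") + write_string("Ultimate Frisbee")
person = resolve(serialized, WRITER, READER)
print(person)
```

The reader could not even find the field boundaries without the writer's schema, which is why the data and the schema always travel together.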
APIs

 Python
    Dynamic
 Java
    Specific (generated code)
    Generic (container-based)
    Reflection (induces schemas from classes)
 C
 C++
 Ruby
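
To give a feel for what "inducing a schema from a class" means in the Reflection API, here is a hypothetical Python analogue built on dataclasses. It is not how the Java reflection API is implemented; the function names and the choice to map Python's `int` to Avro `long` are assumptions made for this sketch.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import get_type_hints

# Assumed mapping from Python types to Avro primitive type names.
PRIMITIVES = {
    str: "string",
    bytes: "bytes",
    int: "long",
    float: "double",
    bool: "boolean",
    type(None): "null",
}


def induce_type(tp):
    """Map a Python type to an Avro type name or nested record schema."""
    if tp in PRIMITIVES:
        return PRIMITIVES[tp]
    if is_dataclass(tp):
        return induce_schema(tp)
    raise TypeError(f"no Avro mapping for {tp!r}")


def induce_schema(cls) -> dict:
    """Build a record schema from a dataclass's field names and types."""
    if not is_dataclass(cls):
        raise TypeError(f"{cls!r} is not a dataclass")
    hints = get_type_hints(cls)
    return {
        "type": "record",
        "name": cls.__name__,
        "fields": [
            {"name": f.name, "type": induce_type(hints[f.name])}
            for f in fields(cls)
        ],
    }


@dataclass
class Person:
    first: str
    age: int


person_schema = induce_schema(Person)
print(person_schema)
```

The Specific API goes the other way (schema to generated classes), while the Generic API skips classes altogether and works with container objects keyed by field name.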
C API

#include <avro.h>

/* Write a string datum into an in-memory buffer. */
char buf[64];
avro_writer_t writer = avro_writer_memory(buf, sizeof(buf));
avro_schema_t writers_schema = avro_schema_string();
avro_datum_t datum = avro_string("Hello, world!");
avro_write_data(writer, writers_schema, datum);

/* Read it back, resolving the writer's schema against the reader's. */
avro_reader_t reader = avro_reader_memory(buf, sizeof(buf));
avro_schema_t readers_schema = avro_schema_string();
avro_datum_t read_datum;
avro_read_data(reader, writers_schema, readers_schema, &read_datum);
Data File Format
(AVRO-160)

Features:
* Splittable
 (important for Hadoop!)
* Append only with same
  schema.
* Compression
* Arbitrary metadata
* Simple
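
The sketch below shows why a block-oriented file with sync markers is splittable and appendable. It is a deliberately simplified, hypothetical layout (a 4-byte length, a zlib-compressed block, then a 16-byte marker) and not the actual Avro data file format; all names in it are assumptions. A reader dropped at an arbitrary byte offset scans forward to the next marker and starts decoding whole blocks from there.

```python
import os
import struct
import zlib

SYNC_SIZE = 16
SYNC = os.urandom(SYNC_SIZE)  # random marker, chosen once per file


def encode_block(payload: bytes) -> bytes:
    """Compress one block and terminate it with the file's sync marker."""
    compressed = zlib.compress(payload)
    return struct.pack(">I", len(compressed)) + compressed + SYNC


def read_blocks(data: bytes, start: int = 0) -> list:
    """Return every block that begins at or after the first sync past start."""
    if start == 0:
        pos = 0
    else:
        marker = data.find(SYNC, start)
        if marker < 0:
            return []  # no complete block begins after this offset
        pos = marker + SYNC_SIZE
    blocks = []
    while pos < len(data):
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        blocks.append(zlib.decompress(data[pos:pos + size]))
        pos += size + SYNC_SIZE
    return blocks


data = b"".join(encode_block(p) for p in [b"alice", b"bob", b"carol"])
data += encode_block(b"dave")  # appending never rewrites existing blocks

print(read_blocks(data))     # all four blocks
print(read_blocks(data, 3))  # a split starting mid-block skips to the next one
```

In a MapReduce setting each split would also stop at the end of its own byte range, so that every block is processed by exactly one task; that bookkeeping is left out here.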
Hadoop Integration
 Users
    AvroInputFormat/AvroOutputFormat (MR-815)
    Using AVRO in the shuffle (MR-1126)
        AVRO schemas let you specify sort order, so hand-written
        RawComparators are a thing of the past; for Streaming,
        you now get fast comparators for free!
    Many Writables can be AVRO+Reflection instead
 Framework
    AVRO for Hadoop RPC (e.g., HDFS-982)
 Goals
    Open up protocols for cross-language use
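
As a rough illustration of schema-driven sort order, the Python sketch below derives a record comparator from the per-field `order` attribute (`ascending` by default, `descending`, or `ignore`). It compares already-decoded records, whereas the comparators mentioned above work directly on serialized bytes; the schema, field names, and `make_comparator` helper are made up for this example.

```python
from functools import cmp_to_key


def make_comparator(schema: dict):
    """Build a cmp-style function that compares records field by field."""
    def compare(a: dict, b: dict) -> int:
        for field in schema["fields"]:
            order = field.get("order", "ascending")
            if order == "ignore":
                continue  # this field never affects the ordering
            x, y = a[field["name"]], b[field["name"]]
            if x != y:
                result = -1 if x < y else 1
                return -result if order == "descending" else result
        return 0
    return compare


# Hypothetical schema: team ascending, score descending, note ignored.
SCORE_SCHEMA = {"type": "record", "name": "Score", "fields": [
    {"name": "team", "type": "string"},
    {"name": "score", "type": "int", "order": "descending"},
    {"name": "note", "type": "string", "order": "ignore"}]}

records = [
    {"team": "b", "score": 1, "note": "x"},
    {"team": "a", "score": 1, "note": "y"},
    {"team": "a", "score": 5, "note": "z"},
]

ordered = sorted(records, key=cmp_to_key(make_comparator(SCORE_SCHEMA)))
for record in ordered:
    print(record)
```

Since the ordering lives in the schema rather than in code, every language binding can sort the same data the same way without a hand-written comparator per type.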
avro-tools

Available tools:
   compile Generates Java code for the given schema.
fragtojson Renders a binary-encoded Avro datum as JSON.
  fromjson Reads JSON records and writes an Avro data file.
   genavro Generates a JSON schema from a GenAvro file
 getschema Prints out schema of an Avro data file.
    induce Induce a schema/protocol from Java class/interface.
jsontofrag Renders a JSON-encoded Avro datum as binary.
rpcreceive Opens an HTTP RPC Server and listens for one message.
   rpcsend Sends a single RPC message.
    tojson Dumps an Avro data file as JSON, one record per line.
1.3 to be released soon...

Good time to try it out!

What's evolving?
  Trying not to evolve the serialized format.
  APIs are evolving.
  Transports are evolving.
Obligatory Links

  Web page:
  http://hadoop.apache.org/avro/
  Mailing list:
  avro-user-subscribe@hadoop.apache.org
  Source repository:
  http://svn.apache.org/repos/asf/hadoop/avro/
Thanks!


    Questions?


    Philip Zeyliger
philip@cloudera.com

Apache AVRO (Boston HUG, Jan 19, 2010)
