8. Serialization is the process of translating the state of data structures or objects into
binary or textual form so the data can be transported over a network or stored on
persistent storage. Once the data has been transported over the network or retrieved from
persistent storage, it needs to be deserialized again.
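The round trip described above can be sketched with the Java standard library alone. This is a minimal illustration, not Avro: the class name EmployeeCodec and the fields id, name and salary are assumptions made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical example class; it is not part of Avro or of the original slides.
class EmployeeCodec {

    // Serialize: translate the field values into a compact binary form.
    static byte[] serialize(int id, String name, double salary) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(id);         // 4 bytes
            out.writeUTF(name);       // 2-byte length prefix + encoded characters
            out.writeDouble(salary);  // 8 bytes
        }
        return buffer.toByteArray();
    }

    // Deserialize: read the values back in exactly the order they were written.
    static Object[] deserialize(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int id = in.readInt();
            String name = in.readUTF();
            double salary = in.readDouble();
            return new Object[] {id, name, salary};
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(7, "Ada", 1234.5);
        Object[] fields = deserialize(bytes);
        System.out.println(bytes.length + " bytes -> "
                + fields[0] + ", " + fields[1] + ", " + fields[2]);
    }
}
```

Note that the bytes carry no description of their own layout: the reader has to know the field order and types in advance. Schema-based frameworks address this by describing the layout in a schema.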
9. Avro is a serialization framework developed within Apache's Hadoop
project.
Uses JSON-based schemas
Uses RPC calls to send data
Schemas are sent during data exchange
Converts unstructured data into a structured form using schemas
Apache Avro was first released in 2009
10. Avro is a data serialization and RPC library that helps improve the
following aspects of data handling in the Hadoop ecosystem:
1. Interchange
2. Interoperability
3. Versioning
11. Rich data structures.
A compact, fast data format.
No code generation is needed to access the data.
Schema evolution: “Data models evolve over time”.
The output, in addition to being serialized, can be divided into
different parts.
Avro APIs exist for these languages: Java, C, C++, C#, Python and Ruby.
16. To use Avro, follow this workflow:
Step 1 − Create schemas.
Step 2 − Read the schemas into your program. This can be done in two ways:
By generating a class corresponding to the schema
By using the parsers library
Step 3 − Serialize the data using the serialization API provided for
Avro, which is found in the package org.apache.avro.specific.
Step 4 − Deserialize the data using the deserialization API provided for
Avro, which is found in the package org.apache.avro.specific.
17. Create an Avro schema and save it as emp.avsc.
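The schema from the original slide is not reproduced here. As a hedged sketch, an emp.avsc file might look like the following. The namespace and the fields name and id are assumptions chosen for illustration; only the overall shape (a JSON object of type record with a list of typed fields) follows the Avro schema format.

```json
{
  "type": "record",
  "namespace": "example",
  "name": "emp",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "id", "type": "int"}
  ]
}
```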
19. org.apache.avro.specific
SpecificDatumWriter: It implements the DatumWriter interface, which converts
Java objects into an in-memory serialized format.
DataFileWriter (in org.apache.avro.file): This class writes a sequence of
serialized records of data conforming to a schema, along with the schema, to a file.
SpecificDatumReader: It implements the DatumReader interface, which reads the
data of a schema and determines the in-memory data representation.
DataFileReader (in org.apache.avro.file): This class provides access to files
written with DataFileWriter.
org.apache.avro
Schema.Parser: This class is a parser for JSON-format schemas.
GenericRecord (in org.apache.avro.generic): This interface represents a generic
instance of a record schema, with fields accessible by name.
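The classes above can be combined into a small write-and-read round trip. The sketch below takes the parser route (Schema.Parser plus GenericRecord), so it uses GenericDatumWriter and GenericDatumReader from org.apache.avro.generic rather than the Specific variants, which are meant for generated classes. It assumes the Avro Java library is on the classpath. The class name AvroRoundTrip, the inline schema and the field values are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

// Hypothetical example class; requires the Avro library on the classpath.
class AvroRoundTrip {

    // Assumed schema for illustration: a record "emp" with two fields.
    static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"namespace\":\"example\",\"name\":\"emp\","
          + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"id\",\"type\":\"int\"}]}";

    static String roundTrip(File file) throws IOException {
        // Step 2: read the schema into the program with the parser.
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Build a record whose fields are addressed by name.
        GenericRecord emp = new GenericData.Record(schema);
        emp.put("name", "Ada");
        emp.put("id", 7);

        // Step 3: serialize. DataFileWriter stores the schema alongside the records.
        try (DataFileWriter<GenericRecord> writer =
                new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            writer.append(emp);
        }

        // Step 4: deserialize. The reader picks up the schema embedded in the file.
        try (DataFileReader<GenericRecord> reader =
                new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>())) {
            GenericRecord read = reader.next();
            return read.get("name").toString() + ":" + read.get("id");
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("emp", ".avro");
        file.deleteOnExit();
        System.out.println(roundTrip(file));
    }
}
```

Because the schema travels with the file, the reading side does not need a generated class or a separate copy of the schema to decode the records.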