Downloaded from: justpaste.it/7y824
Apache AVRO - Data Serialization Framework
It makes sense to hook in at the Serializer and Deserializer level and let producer and consumer developers use the convenient interface provided by Kafka. While new Kafka versions allow ExtendedSerializers and ExtendedDeserializers to access headers, we chose to embed the schema identifier in the key and value of Kafka records instead of adding record headers. For more info, visit: big data hadoop course
Apache Avro
Apache Avro is a data serialization (and remote procedure call) framework. It uses a JSON document, called a schema, to describe data structures. Most Apache Avro use goes through either GenericRecord or subclasses of SpecificRecord. The latter are Java classes generated from Apache Avro schemas, while the former can be used without prior knowledge of the data structures being worked with.
If two schemas satisfy a set of compatibility rules, data written with one schema (called the writer schema) can be read as if it had been written with the other (called the reader schema). Schemas have a canonical form that has all the details irrelevant to serialization, such as documentation, stripped off to help check equivalence.
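For example, consider a hypothetical Person schema (not from the original post). A reader schema that adds a field with a default value stays compatible with the older writer schema, because the default is used whenever the field is missing from the serialized data.

Writer schema (version 1):

{
  "type": "record",
  "name": "Person",
  "namespace": "com.example",
  "fields": [
    {"name": "name", "type": "string"}
  ]
}

Reader schema (version 2, still compatible):

{
  "type": "record",
  "name": "Person",
  "namespace": "com.example",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}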
VersionedSchema and SchemaProvider in Apache Avro
As mentioned earlier, we need a one-to-one mapping between schemas and their identifiers. Sometimes it is easier to reference schemas by name. When a compatible schema is created, it can be registered as the next version of that schema. Thus we can also refer to schemas with a (name, version) pair.
Let's call the schema, its identifier, name, and version together a VersionedSchema. This object could hold additional metadata required by the application.
public class VersionedSchema {
  private final int id;
  private final String name;
  private final int version;
  private final Schema schema;

  public VersionedSchema(int id, String name, int version, Schema schema) {
    this.id = id;
    this.name = name;
    this.version = version;
    this.schema = schema;
  }

  public String getName() {
    return name;
  }

  public int getVersion() {
    return version;
  }

  public Schema getSchema() {
    return schema;
  }

  public int getId() {
    return id;
  }
}
How this interface is implemented will be discussed in a future blog post called "Implementing a Schema Store."
public interface SchemaProvider extends AutoCloseable {
  public VersionedSchema get(int id);
  public VersionedSchema get(String schemaName, int schemaVersion);
  public VersionedSchema getMetadata(Schema schema);
}
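As a quick illustration, here is a minimal in-memory sketch of this interface. The class and its register method are hypothetical, not from the original post; a production implementation would be backed by a real schema store.

import java.util.HashMap;
import java.util.Map;
import org.apache.avro.Schema;

public class InMemorySchemaProvider implements SchemaProvider {
  private final Map<Integer, VersionedSchema> byId = new HashMap<>();
  private final Map<String, VersionedSchema> byNameAndVersion = new HashMap<>();

  // Hypothetical helper to populate the provider.
  public void register(VersionedSchema schema) {
    byId.put(schema.getId(), schema);
    byNameAndVersion.put(schema.getName() + "#" + schema.getVersion(), schema);
  }

  @Override
  public VersionedSchema get(int id) {
    return byId.get(id);
  }

  @Override
  public VersionedSchema get(String schemaName, int schemaVersion) {
    return byNameAndVersion.get(schemaName + "#" + schemaVersion);
  }

  @Override
  public VersionedSchema getMetadata(Schema schema) {
    // Linear scan comparing schemas; a real store would index by canonical form.
    for (VersionedSchema candidate : byId.values()) {
      if (candidate.getSchema().equals(schema)) {
        return candidate;
      }
    }
    return null;
  }

  @Override
  public void close() {
    // Nothing to release for the in-memory variant.
  }
}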
Serialization of Generic Data in Apache Avro
First we need to find out which schema to use when serializing a record. Every GenericRecord has a getSchema() method, yet finding out the schema identifier could be time-consuming. It is usually more efficient to define the schema at initialization time. This can be done either directly by id, or by name and version. Furthermore, when producing to multiple topics, we may want to set different schemas for different topics and figure out the schema from the topic name supplied as a parameter to the serialize(T, String) method. This logic is omitted in our examples for the sake of brevity and simplicity.
private VersionedSchema getSchema(T data, String topic) {
  return schemaProvider.getMetadata(data.getSchema());
}
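For completeness, one hypothetical way to implement the omitted topic-based selection is to cache a VersionedSchema per topic at configure time; the map and the config convention below are assumptions, purely for illustration.

// Populated in configure() from, e.g., a per-topic
// "schema.topic.<topic>.name" / "schema.topic.<topic>.version" config pair.
private final Map<String, VersionedSchema> schemasByTopic = new HashMap<>();

private VersionedSchema getSchema(T data, String topic) {
  VersionedSchema schema = schemasByTopic.get(topic);
  // Fall back to the schema embedded in the record itself.
  return schema != null ? schema : schemaProvider.getMetadata(data.getSchema());
}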
With the schema in hand, we need to store its identifier in our message. Serializing the identifier as part of the message gives us a compact solution, since all the magic happens in the Serializer/Deserializer. It also enables very easy integration with other frameworks and libraries that already support Kafka and let the user use their own serializer (such as Spark).
Using this approach, we first write the schema identifier on the first four bytes.
private void writeSchemaId(ByteArrayOutputStream stream, int id) throws IOException {
  try (DataOutputStream os = new DataOutputStream(stream)) {
    os.writeInt(id);
  }
}
Then we can create a DatumWriter and serialize the object.
private void writeSerializedAvro(ByteArrayOutputStream stream, T data, Schema schema) throws IOException {
  BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(stream, null);
  DatumWriter<T> datumWriter = new GenericDatumWriter<>(schema);
  datumWriter.write(data, encoder);
  encoder.flush();
}
Putting it all together, we have implemented a generic data serializer.
public class KafkaAvroSerializer<T extends GenericContainer> implements Serializer<T> {

  private SchemaProvider schemaProvider;

  @Override
  public void configure(Map<String, ?> configs, boolean isKey) {
    schemaProvider = SchemaUtils.getSchemaProvider(configs);
  }

  @Override
  public byte[] serialize(String topic, T data) {
    try (ByteArrayOutputStream stream = new ByteArrayOutputStream()) {
      VersionedSchema schema = getSchema(data, topic);
      writeSchemaId(stream, schema.getId());
      writeSerializedAvro(stream, data, schema.getSchema());
      return stream.toByteArray();
    } catch (IOException e) {
      throw new RuntimeException("Cannot serialize data", e);
    }
  }

  private void writeSchemaId(ByteArrayOutputStream stream, int id) throws IOException {...}

  private void writeSerializedAvro(ByteArrayOutputStream stream, T data, Schema schema) throws IOException {...}

  private VersionedSchema getSchema(T data, String topic) {...}

  @Override
  public void close() {
    try {
      schemaProvider.close();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
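A minimal usage sketch follows, assuming the serializer class above is on the producer's classpath; the topic name, inline schema, and property wiring are illustrative assumptions, not from the original post.

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", KafkaAvroSerializer.class.getName());
    // Plus whatever configs SchemaUtils.getSchemaProvider() expects.

    // A tiny inline schema, for illustration only.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\",\"fields\":"
        + "[{\"name\":\"name\",\"type\":\"string\"}]}");

    try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
      GenericRecord record = new GenericData.Record(schema);
      record.put("name", "Jane Doe");
      producer.send(new ProducerRecord<>("people", "key-1", record));
    }
  }
}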
Deserialization of Generic Data in Apache Avro
Deserialization can work with a single schema (the one the data was written with), but you can also define a different reader schema. The reader schema has to be compatible with the schema the data was serialized with, but it does not need to be identical. This is why we introduced schema names. We can now specify that we want to read data with a particular version of a schema. At initialization time we read the desired schema versions per schema name and store the metadata in readerSchemasByName for quick access. Now we can read every record written with a compatible version of a schema as if it was written with the specified version.
@Override
public void configure(Map<String, ?> configs, boolean isKey) {
  this.schemaProvider = SchemaUtils.getSchemaProvider(configs);
  this.readerSchemasByName = SchemaUtils.getVersionedSchemas(configs, schemaProvider);
}
When a record needs to be deserialized, we first read the identifier of the writer schema. This lets us look up the reader schema by name. With both schemas available, we can create a GenericDatumReader and read the record.
@Override
public GenericData.Record deserialize(String topic, byte[] data) {
  try (ByteArrayInputStream stream = new ByteArrayInputStream(data)) {
    int schemaId = readSchemaId(stream);
    VersionedSchema writerSchema = schemaProvider.get(schemaId);
    VersionedSchema readerSchema = readerSchemasByName.get(writerSchema.getName());
    GenericData.Record avroRecord = readAvroRecord(stream, writerSchema.getSchema(), readerSchema.getSchema());
    return avroRecord;
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}
private int readSchemaId(InputStream stream) throws IOException {
  try (DataInputStream is = new DataInputStream(stream)) {
    return is.readInt();
  }
}
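And a matching consumer-side sketch; the deserializer class name, topic, and property wiring are assumptions carried over from the examples above.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.avro.generic.GenericData;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "avro-example");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", KafkaAvroDeserializer.class.getName());
    // Plus the desired reader schema versions per schema name, as described above.

    try (KafkaConsumer<String, GenericData.Record> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("people"));
      for (ConsumerRecord<String, GenericData.Record> record :
          consumer.poll(Duration.ofSeconds(1))) {
        System.out.println(record.value());
      }
    }
  }
}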
About Specific Records in Apache Avro
More often than not, there is one class we want to use for our records. This class is then usually generated from an Apache Avro schema. Apache Avro provides tools for generating Java code from schemas; one such tool is the Apache Avro Maven plugin. Generated classes carry, at runtime, the schema they were generated from. This makes serialization and deserialization simpler and more effective. For serialization, we can use the class to find out the schema identifier to use.
@Override
public void configure(Map<String, ?> configs, boolean isKey) {
  String className = configs.get(isKey ? KEY_RECORD_CLASSNAME : VALUE_RECORD_CLASSNAME).toString();
  try (SchemaProvider schemaProvider = SchemaUtils.getSchemaProvider(configs)) {
    Class<?> recordClass = Class.forName(className);
    Schema writerSchema = new SpecificData(recordClass.getClassLoader()).getSchema(recordClass);
    this.writerSchemaId = schemaProvider.getMetadata(writerSchema).getId();
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
}
This way we do not need logic to determine the schema from the topic and the data. We use the schema available in the record class to write records.
@Override
public T deserialize(String topic, byte[] data) {
  try (ByteArrayInputStream stream = new ByteArrayInputStream(data)) {
    int schemaId = readSchemaId(stream);
    VersionedSchema writerSchema = schemaProvider.get(schemaId);
    return readAvroRecord(stream, writerSchema.getSchema(), readerSchema);
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}
private T readAvroRecord(InputStream stream, Schema writerSchema, Schema readerSchema) throws IOException {
  DatumReader<T> datumReader = new SpecificDatumReader<>(writerSchema, readerSchema);
  BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(stream, null);
  return datumReader.read(null, decoder);
}
Likewise, for deserialization, the reader schema can be extracted from the class itself. The deserialization logic becomes simpler, because the reader schema is fixed at initialization time and does not need to be looked up by schema name.
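A minimal sketch of that initialization, mirroring the serializer's configure method above; the readerSchema and schemaProvider field names are assumptions.

@Override
public void configure(Map<String, ?> configs, boolean isKey) {
  String className = configs.get(isKey ? KEY_RECORD_CLASSNAME : VALUE_RECORD_CLASSNAME).toString();
  try {
    Class<?> recordClass = Class.forName(className);
    // The generated SpecificRecord class carries its own schema; use it as the reader schema.
    this.readerSchema = new SpecificData(recordClass.getClassLoader()).getSchema(recordClass);
    this.schemaProvider = SchemaUtils.getSchemaProvider(configs);
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
}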
Conclusion
I hope this gave you a clear picture of Apache Avro serialization and deserialization. You can learn more through big data online training.
