ACADGILD
INTRODUCTION
This blog is for readers who want to build their own custom data types in Hadoop, which is possible
only through the Writable and WritableComparable interfaces.
After reading this blog you will get a clear understanding of:
•What are Writables?
•The importance of Writables in Hadoop
•Why were Writables introduced in Hadoop?
•What if Writables were not present in Hadoop?
•How can Writable and WritableComparable be implemented in Hadoop?
With this knowledge you can get going with Writables and WritableComparables in Hadoop.
Writables and their Importance in Hadoop
Writable is an interface in Hadoop that acts as a wrapper around almost all of Java's primitive data
types. That is how Java's int becomes IntWritable and Java's String becomes Text in Hadoop.
Writables are used for creating serialized data types in Hadoop. So, let us start by understanding what
a data type, an interface and serialization are.
Data Type
A data type is a set of data with values having predefined characteristics. There are several primitive
data types in Java, for example int, short, byte, long and char. Each of these primitive data types is
bound to a corresponding wrapper class; for example, int is wrapped by Integer, short by Short, byte by
Byte and long by Long. These wrapper classes are predefined in Java.
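As a quick sketch of the primitive/wrapper relationship described above, Java's autoboxing converts between a primitive and its wrapper class automatically:

```java
// A minimal sketch of Java primitives and their wrapper classes.
public class WrapperDemo {
    public static void main(String[] args) {
        int primitive = 100;          // primitive data type
        Integer boxed = primitive;    // autoboxing into the Integer wrapper class
        int unboxed = boxed;          // unboxing back to the primitive
        System.out.println(boxed + " " + unboxed); // prints "100 100"
    }
}
```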
Interface in Java
An interface in Java is a completely abstract class. The methods within an interface are abstract methods
that have no body, and the fields within an interface are public, static and final, which means
the fields cannot be modified.
The structure of an interface resembles that of a class, but we cannot create an object of an interface;
the only way to use an interface is to implement it in another class using the 'implements' keyword.
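To illustrate the 'implements' keyword described above, here is a minimal sketch; the Shape interface and Square class are hypothetical examples, not part of Hadoop:

```java
// A hypothetical interface with one abstract method (no body).
interface Shape {
    double area();
}

// The only way to use the interface: implement it in another class.
class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

public class InterfaceDemo {
    public static void main(String[] args) {
        Shape s = new Square(3.0); // an object of the implementing class, typed as the interface
        System.out.println(s.area()); // prints "9.0"
    }
}
```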
Serialization
https://acadgild.com/blog/s=Writable+and+WritableComparable+in+Hadoop
Serialization is the process of converting data into a stream of bytes that can travel across networks
and reside on different systems. Serialization is not the only concern of the Writable interface; it also
has to support the compare and sort operations in Hadoop.
Why are Writables Introduced in Hadoop?
Now the question is whether Writables are really necessary for Hadoop. The Hadoop framework definitely
needs a Writable type of interface in order to perform the following tasks:
•Implement serialization
•Transfer data between clusters and networks
•Store the serialized data on the local disk of the system
Implementing Writable is similar to implementing an interface in Java: it can be done by simply
writing the keyword 'implements' and overriding the default Writable methods.
Writable is an important interface in Hadoop which, while serializing the data, reduces the data size
enormously, so that the data can be exchanged easily within the network. It has separate methods,
readFields to read data from the network and write to write data to the local disk. Every key and value
type inside Hadoop should implement the Writable interface.
We have seen how Writables reduce the data size overhead and make data transfer easier in the
network.
What if Writable were not there in Hadoop?
Let us now understand what happens if Writable were not present in Hadoop.
Serialization is important in Hadoop because it enables easy transfer of data. If Writable were not
present in Hadoop, the framework would use Java's own serialization, which increases the data overhead
in the network.
smallInt serialized value using Java serializer
aced0005737200116a6176612e6c616e672e496e74656765
7212e2a0a4f781873802000149000576616c7565787200106a6176612e
6c616e672e4e756d62657286ac951d0b94e08b020000787000000064
smallInt serialized value using IntWritable
00000064
This shows the clear difference between serialization in Java and in Hadoop, and also the difference
between ObjectInputStream and the Writable interface. If the size of serialized data in Hadoop were like
that of Java, it would definitely become an overhead in the network.
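The size difference above can be reproduced with the standard library alone. IntWritable itself needs the Hadoop jars, but its write method just calls writeInt, so this sketch mimics it with java.io's DataOutputStream; SerializationSizeDemo and its helper method names are illustrative, not Hadoop APIs:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerializationSizeDemo {
    // Size of a value serialized with Java's ObjectOutputStream
    // (class metadata plus the value itself).
    static int javaSerializedSize(int value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(Integer.valueOf(value));
        }
        return buf.size();
    }

    // What IntWritable.write(out) boils down to: a single writeInt call,
    // producing exactly 4 bytes.
    static int writableStyleSize(int value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buf)) {
            dos.writeInt(value);
        }
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Java serializer: " + javaSerializedSize(100) + " bytes");
        System.out.println("Writable-style:  " + writableStyleSize(100) + " bytes");
    }
}
```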
Also, the core part of the Hadoop framework, the shuffle and sort phase, won't be executed without
using Writable.
How can Writables be Implemented in Hadoop?
Writable keys in Hadoop have the default properties of Comparable. For example:
When we write a key as IntWritable in the Mapper class and send it to the Reducer class, there is an
intermediate phase between the Mapper and Reducer classes, shuffle and sort, where each key has to
be compared with many other keys. If the keys are not comparable, the shuffle and sort phase won't
be executed, or may be executed with a high amount of overhead.
If a key is taken as IntWritable, it is comparable by default because a RawComparator acts on that type,
comparing the given key with the other keys in the network. This cannot take place in the absence of
Writable.
Can we make custom Writables? The answer is definitely 'yes'. We can make our own custom Writable
types.
Let us first see how to make a custom type in plain Java.
The steps to make a custom type in Java are as follows:
public class add {
    int a;
    int b;

    public add(int a, int b) {
        this.a = a;
        this.b = b;
    }
}
Similarly, we can make a custom type in Hadoop using Writables.
To implement Writable, we need to override the following two methods in Hadoop:
public interface Writable {
void readFields(DataInput in);
void write(DataOutput out);
}
Here, readFields reads the data from the network and write writes the data to the local disk. Both are
necessary for transferring data through clusters. The DataInput and DataOutput interfaces (part of
java.io) contain methods to serialize the most basic types of data.
Suppose we want to make a composite key in Hadoop by combining two ints into one Writable; then we
can follow the steps below:
public class add implements Writable {
    public int a;
    public int b;

    public add() {}                    // a no-argument constructor is required by Hadoop for deserialization

    public add(int a, int b) {
        this.a = a;
        this.b = b;
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(a);
        out.writeInt(b);
    }

    public void readFields(DataInput in) throws IOException {
        a = in.readInt();
        b = in.readInt();
    }

    public String toString() {
        return Integer.toString(a) + ", " + Integer.toString(b);
    }
}
Thus, we can create our custom Writables in a way similar to custom types in Java, but with two
additional methods, write and readFields. The custom Writable can travel through networks and reside
on other systems.
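The write/readFields round trip can be sketched without the Hadoop jars, since DataInput and DataOutput come from java.io. The AddDemo class below is a standalone stand-in that declares the same two methods without implementing Hadoop's Writable interface:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class AddDemo {
    public int a;
    public int b;

    // Same signatures as Hadoop's Writable methods.
    public void write(DataOutput out) throws IOException {
        out.writeInt(a);
        out.writeInt(b);
    }

    public void readFields(DataInput in) throws IOException {
        a = in.readInt();
        b = in.readInt();
    }

    public static void main(String[] args) throws IOException {
        AddDemo original = new AddDemo();
        original.a = 10;
        original.b = 20;

        // Serialize: what happens when the key is sent over the network.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        // Deserialize: what happens on the receiving side.
        AddDemo copy = new AddDemo();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(copy.a + ", " + copy.b); // prints "10, 20"
    }
}
```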
However, instances of this custom type cannot be compared with each other by default, so we again need
to make them comparable.
Let us now discuss what WritableComparable is and how it solves the above problem.
As explained above, if a key is taken as IntWritable, it is comparable by default because a
RawComparator acts on that type, comparing the key with the other keys in the network; without Writable
this comparison won't be executed.
By default, IntWritable, LongWritable and Text have a RawComparator which can execute this comparison
phase for them. Will a RawComparator help a custom Writable? The answer is no. So, we need
WritableComparable.
WritableComparable can be defined as a sub-interface of Writable which also has the features of
Comparable. If we have already created our custom Writable type, why do we need WritableComparable?
We need to make our custom type comparable if we want to compare it with others. If we want to make
our custom type a key, then we should definitely make the key type WritableComparable rather than
simply Writable. This enables the custom type to be compared with other keys and sorted accordingly.
Otherwise, the keys won't be compared with each other
and they are just passed through the network.
What happens if WritableComparable is not present?
If we make our custom type Writable rather than WritableComparable, our data won't be compared with
other data. There is no compulsion for our custom types to be WritableComparable unless they are used
as keys, because values do not need to be compared with each other the way keys are.
If our custom type is a key, then it should be WritableComparable, or else the data won't be sorted.
How can WritableComparable be implemented in Hadoop?
The implementation of WritableComparable is similar to Writable, but with an additional compareTo
method:
public interface WritableComparable extends Writable, Comparable
{
    void readFields(DataInput in);
    void write(DataOutput out);
    int compareTo(WritableComparable o);
}
How do we make our custom type WritableComparable?
We can make a custom type WritableComparable by following the method below:
public class add implements WritableComparable<add> {
    public int a;
    public int b;

    public add() {}                    // a no-argument constructor is required by Hadoop for deserialization

    public add(int a, int b) {
        this.a = a;
        this.b = b;
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(a);
        out.writeInt(b);
    }

    public void readFields(DataInput in) throws IOException {
        a = in.readInt();
        b = in.readInt();
    }

    public int compareTo(add c) {
        // Compare on a first, then on b, so that the ordering is total.
        if (a != c.a) {
            return (a < c.a ? -1 : 1);
        }
        return (b < c.b ? -1 : (b == c.b ? 0 : 1));
    }

    public int hashCode() {
        // Combine both fields so that equal keys land in the same partition.
        return Integer.hashCode(a) ^ Integer.hashCode(b);
    }
}
These write, readFields and compareTo methods make the comparison and transfer of keys in the network
efficient.
With the use of Writable and WritableComparable in Hadoop, we can build our own serialized custom
types with little difficulty. This gives developers the ease to create custom types based on their
requirements.
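To see the kind of ordering that compareTo gives the shuffle-and-sort phase, the same a-then-b comparison logic can be exercised with plain java.util sorting; the SortDemo and Pair names here are hypothetical stand-ins, not Hadoop classes:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortDemo {
    // A standalone pair of ints with the same compareTo logic as the
    // custom WritableComparable above.
    static class Pair implements Comparable<Pair> {
        final int a;
        final int b;
        Pair(int a, int b) { this.a = a; this.b = b; }

        public int compareTo(Pair c) {
            if (a != c.a) {
                return (a < c.a ? -1 : 1);
            }
            return (b < c.b ? -1 : (b == c.b ? 0 : 1));
        }

        public String toString() { return a + ", " + b; }
    }

    public static void main(String[] args) {
        List<Pair> keys = new ArrayList<>();
        keys.add(new Pair(2, 1));
        keys.add(new Pair(1, 9));
        keys.add(new Pair(2, 0));
        Collections.sort(keys);   // orders the keys exactly as compareTo dictates
        System.out.println(keys); // prints "[1, 9, 2, 0, 2, 1]"
    }
}
```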
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 

Serialization is nothing but converting raw data into a stream of bytes which can travel along different networks and reside on different systems. Serialization is not the only concern of the Writable interface; it also has to support the compare and sort operations in Hadoop.

Why are Writables Introduced in Hadoop?

Now the question is whether Writables are really necessary for Hadoop. The Hadoop framework definitely needs a Writable type of interface in order to perform the following tasks:
•Implement serialization
•Transfer data between clusters and networks
•Store the serialized data on the local disk of the system

Implementing Writable is similar to implementing any other interface in Java. It can be done by simply writing the keyword 'implements' and overriding the default Writable methods. Writable is a strong interface in Hadoop which, while serializing the data, reduces the data size enormously, so that data can be exchanged easily within the network. It has separate readFields and write methods to read data from the network and write data to the local disk respectively. Every data type inside Hadoop should satisfy the Writable and Comparable interface properties.

We have seen how Writables reduce the data size overhead and make data transfer in the network easier.

What if Writable were not there in Hadoop?

Let us now understand what happens if Writable is not present in Hadoop. Serialization is important in Hadoop because it enables easy transfer of data. If Writable is not present, Hadoop falls back on the serialization of Java, which increases the data overhead in the network.

smallInt serialized value using Java serializer
aced0005737200116a6176612e6c616e672e496e746567657212e2a0a4f781873802000149000576616c7565787200106a6176612e6c616e672e4e756d62657286ac951d0b94e08b020000787000000064

smallInt serialized value using IntWritable

00000064

This shows the clear difference between serialization in Java and in Hadoop, and also the difference between ObjectInputStream and the Writable interface. If the size of serialized data in Hadoop were like that of Java, it would definitely become an overhead in the network. Also, the core part of the Hadoop framework, the shuffle and sort phase, cannot be executed without Writables.

How can Writables be Implemented in Hadoop?

Writable variables in Hadoop have the default properties of Comparable. For example: when we write a key as IntWritable in the Mapper class and send it to the Reducer class, there is an intermediate phase between the Mapper and the Reducer, shuffle and sort, where each key has to be compared with many other keys. If the keys are not comparable, then the shuffle and sort phase won't be executed, or may be executed with a high amount of overhead.

If a key is taken as IntWritable, it is comparable by default because of the RawComparator acting on that type, which compares the key with the other keys in the network. This cannot take place in the absence of Writable.

Can we make custom Writables? The answer is definitely 'yes'. We can make our own custom Writable types. Let us first see how to make a custom type in plain Java:

public class add {
    int a;
    int b;

    public add(int a, int b) {
        this.a = a;
        this.b = b;
    }
}
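The serialized-size gap shown above can be reproduced with plain java.io, without Hadoop on the classpath: Java's ObjectOutputStream emits a stream header and class metadata for a boxed Integer, while a raw DataOutputStream write, which is essentially what IntWritable does internally, emits exactly four bytes. The class and method names below are illustrative, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class SerializedSizeDemo {
    // Size of an int boxed and written with Java's default object serialization.
    static int javaSerializedSize(int value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(value); // autoboxed to Integer; includes class metadata
            }
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Size of the same int written as raw bytes, the way a Writable would serialize it.
    static int rawIntSize(int value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(value);
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Java serialization: " + javaSerializedSize(100) + " bytes");
        System.out.println("Raw int write:      " + rawIntSize(100) + " bytes"); // 4 bytes
    }
}
```

Running this makes the overhead concrete: the raw write is always four bytes, while the object-serialized form carries tens of bytes of metadata per value.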
Similarly, we can make a custom type in Hadoop using Writables. For implementing a Writable, we need two more methods:

public interface Writable {
    void readFields(DataInput in) throws IOException;
    void write(DataOutput out) throws IOException;
}

Here, readFields reads the data from the network and write writes the data to the local disk. Both are necessary for transferring data through clusters. The DataInput and DataOutput classes (part of java.io) contain methods to serialize the most basic types of data.

Suppose we want to make a composite key in Hadoop by combining two fields into a Writable; then we can follow the steps below:

public class add implements Writable {
    public int a;
    public int b;

    public add() {
    }

    public add(int a, int b) {
        this.a = a;
        this.b = b;
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(a);
        out.writeInt(b);
    }

    public void readFields(DataInput in) throws IOException {
        a = in.readInt();
        b = in.readInt();
    }

    public String toString() {
        return Integer.toString(a) + ", " + Integer.toString(b);
    }
}

Thus we can create our custom Writables in a way similar to custom types in Java, but with two additional methods, write and readFields. The custom Writable can travel through networks and reside on other systems. However, instances of this custom type cannot be compared with each other by default, so we need to make them comparable. Let us now discuss WritableComparable, which is the solution to this problem.

As explained above, a key taken as IntWritable is comparable by default because of the RawComparator acting on it, which compares the key with the other keys in the network; without Writable this comparison won't be executed. By default, IntWritable, LongWritable and Text have a RawComparator which can execute this compare phase for them. Then, will a RawComparator help a custom Writable? The answer is no. So, we need WritableComparable.

WritableComparable can be defined as a sub-interface of Writable which also has the features of Comparable. If we have already created our custom Writable type, why do we need WritableComparable? We need to make our custom type comparable if we want to compare it with others. If we want to make our custom type a key, then we should definitely make the key type WritableComparable rather than simply Writable. This enables the custom type to be compared with other keys and sorted accordingly. Otherwise, the keys won't be compared with each other
and they are just passed through the network.

What happens if WritableComparable is not present?

If we have made our custom type Writable rather than WritableComparable, our data won't be compared with other data. There is no compulsion that our custom type needs to be WritableComparable unless it is a key, because values do not need to be compared with each other the way keys do. If our custom type is a key, then it should be WritableComparable, or else the data won't be sorted.

How can WritableComparable be implemented in Hadoop?

The implementation of WritableComparable is similar to Writable but with an additional compareTo method inside it:

public interface WritableComparable extends Writable, Comparable {
    void readFields(DataInput in) throws IOException;
    void write(DataOutput out) throws IOException;
    int compareTo(WritableComparable o);
}

How can we make our custom type WritableComparable? We can do it by following the method below:

public class add implements WritableComparable {
    public int a;
    public int b;

    public add() {
    }

    public add(int a, int b) {
        this.a = a;
        this.b = b;
    }
    public void write(DataOutput out) throws IOException {
        out.writeInt(a);
        out.writeInt(b);
    }

    public void readFields(DataInput in) throws IOException {
        a = in.readInt();
        b = in.readInt();
    }

    public int compareTo(add c) {
        int presentValue = this.a;
        int compareValue = c.a;
        return (presentValue < compareValue ? -1 : (presentValue == compareValue ? 0 : 1));
    }

    public int hashCode() {
        return Integer.hashCode(a) ^ Integer.hashCode(b);
    }
}

The write and readFields methods handle the serialization, while compareTo makes the comparison of keys during shuffle and sort possible. With the use of Writable and WritableComparable in Hadoop, we can make our serialized custom types with little difficulty. This gives developers the ease to make custom types based on their requirements.
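Putting the pieces together, here is a self-contained sketch of such a composite key using only java.io (Hadoop's Writable and WritableComparable interfaces are assumed not to be on the classpath, so their contract is mimicked rather than implemented): write and readFields round-trip the two fields as eight raw bytes, and compareTo orders on the first field with the second as a tie-breaker, which is what lets shuffle and sort arrange composite keys deterministically. The name IntPair is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class IntPair implements Comparable<IntPair> {
    public int a;
    public int b;

    public IntPair() {}                      // no-arg constructor, needed for deserialization
    public IntPair(int a, int b) { this.a = a; this.b = b; }

    // Writable-style serialization: two raw ints, eight bytes total.
    public void write(DataOutput out) throws IOException {
        out.writeInt(a);
        out.writeInt(b);
    }

    public void readFields(DataInput in) throws IOException {
        a = in.readInt();
        b = in.readInt();
    }

    // Order on the first field, break ties on the second.
    @Override
    public int compareTo(IntPair o) {
        int cmp = Integer.compare(a, o.a);
        return cmp != 0 ? cmp : Integer.compare(b, o.b);
    }

    @Override
    public int hashCode() { return 31 * a + b; } // mixes both fields, used for partitioning

    @Override
    public boolean equals(Object o) {
        return o instanceof IntPair && a == ((IntPair) o).a && b == ((IntPair) o).b;
    }

    @Override
    public String toString() { return a + "," + b; }

    // Serialize to bytes and back, as the shuffle would between mapper and reducer.
    public static IntPair roundTrip(IntPair p) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            p.write(new DataOutputStream(buf));
            IntPair copy = new IntPair();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IntPair[] keys = { new IntPair(2, 1), new IntPair(1, 9), new IntPair(1, 2) };
        Arrays.sort(keys);
        System.out.println(Arrays.toString(keys));        // prints "[1,2, 1,9, 2,1]"
        System.out.println(roundTrip(new IntPair(3, 7))); // prints "3,7"
    }
}
```

A real Hadoop key would declare `implements WritableComparable<IntPair>` instead of `Comparable<IntPair>`, but the method bodies would stay the same.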