This document discusses the features, key advantages, and versions of Hadoop. Hadoop is an open-source framework that allows for the distributed processing of large data sets across clusters of computers. It offers tooling, code generation, modeling, scheduling, and integration features that help developers build and run big data applications efficiently. Some key advantages of Hadoop include its scalability, cost effectiveness, flexibility to handle various data types, fast processing, and resilience to failures. The document outlines the differences between Hadoop versions 1.0 and 2.0, with 2.0 offering enhanced flexibility and support for various data processing engines beyond just MapReduce.
An overview of Big Data and Hadoop: the architecture Hadoop uses and the way it works on data sets. The slides also show the various fields where they are most commonly used and implemented.
Presented By :- Rahul Sharma
B-Tech (Cloud Technology & Information Security)
2nd Year 4th Sem.
Poornima University (I.Nurture), Jaipur
www.facebook.com/rahulsharmarh18
This presentation discusses the following topics:
Hadoop Distributed File System (HDFS)
How does HDFS work?
HDFS Architecture
Features of HDFS
Benefits of using HDFS
Examples: Target Marketing
HDFS data replication
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
Big Data raises challenges about how to process such a vast pool of raw data and how to derive value from it. To address these demands, an ecosystem of tools named Hadoop was conceived.
This Edureka Hadoop vs Spark video will help you understand the differences between Hadoop and Spark, comparing them on various parameters. It takes a broader look at:
1. Introduction to Hadoop
2. Introduction to Apache Spark
3. Spark vs Hadoop -
Performance
Ease of Use
Cost
Data Processing
Fault tolerance
Security
4. Hadoop Use-cases
5. Spark Use-cases
The critical thing to remember about Spark and Hadoop is that they are not mutually exclusive; they work well together, and the combination is strong enough for many big data applications.
Hi all, this is a presentation about big data analysis done using a data mining tool known as Hadoop, which is based on a distributed file system and uses parallel computing.
Apache Hive is a tool built on top of Hadoop for analyzing large, unstructured data sets using a SQL-like syntax, thus making Hadoop accessible to legions of existing BI and corporate analytics researchers.
This presentation provides a comprehensive introduction to Hadoop, a powerful and widely used framework for distributed storage and processing of large-scale data. Hadoop has revolutionized the way organizations manage and analyze data, making it a crucial tool in the field of big data and data analytics.
In this presentation, we explore the key components and features of Hadoop, shedding light on the fundamental building blocks that enable its exceptional data processing capabilities. We cover essential topics, including the Hadoop Distributed File System (HDFS), MapReduce, YARN (Yet Another Resource Negotiator), and Hadoop Ecosystem components like Hive, Pig, and Spark.
Hadoop Foundation for Analytics
History of Hadoop
Features of Hadoop
Key Advantages of Hadoop
Why Hadoop
Versions of Hadoop
Eco Projects
Essentials of the Hadoop ecosystem
RDBMS versus Hadoop
Key Aspects of Hadoop
Components of Hadoop
1. Tooling:
Developers can create, design, and deploy big data services on any platform or development environment of their choice.
2. Code generation:
With a Hadoop big data suite, there is no need to write, debug, analyze, and optimize MapReduce code by hand; the complete code is auto-generated.
3. Modeling:
Every Hadoop distribution provides the infrastructure to integrate Hadoop clusters. Developers no longer have to write complex code to develop MapReduce programs: they can write such code in plain Java, or use higher-level languages such as Pig Latin, HQL, etc.
4. Scheduling:
Big data job execution needs to be monitored and scheduled. Instead of writing scheduling jobs themselves, developers can use the big data suite to define and handle execution tasks in the most efficient way.
5. Integration:
Hadoop aims to integrate data from all types of products and technologies. Along with files and SQL databases, developers want to integrate data from NoSQL databases, social media, B2B products, and more.
Key Advantages
There are many advantages associated with Hadoop. This presentation covers some of the major ones.
Scalable:
Hadoop is highly scalable; it can store and distribute very large data sets across hundreds of inexpensive servers.
Cost effective:
Owing to its scale-out architecture, Hadoop offers a cost-effective solution for both storage and processing.
Flexible:
Hadoop can work with all kinds of data: structured, semi-structured, and unstructured. It can be used for a wide variety of purposes, such as log processing, recommendation systems, data warehousing, data mining, and more.
Fast:
Processing is extremely fast compared to conventional systems, owing to the "move code to data" paradigm: computation is shipped to the nodes where the data already resides instead of moving large data sets over the network.
Resilient to failure:
Hadoop is fault-tolerant. It replicates data diligently, ensuring that in the event of a node failure, the data is still available on other nodes and processing can continue.
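This resilience through replication can be sketched in a few lines of Python. This is an illustrative toy model, not Hadoop's actual placement policy (real HDFS is rack-aware); only the replication factor of 3 reflects the HDFS default, and the round-robin placement and node names are assumptions made for the demo:

```python
# Toy sketch of HDFS-style replication: each block lives on several
# nodes, so losing one node loses no data. Not real Hadoop code.

REPLICATION_FACTOR = 3  # HDFS default replication factor

def place_replicas(block_id, nodes, factor=REPLICATION_FACTOR):
    """Assign a block to `factor` distinct nodes (simple round-robin)."""
    return [nodes[(block_id + i) % len(nodes)] for i in range(factor)]

nodes = ["node1", "node2", "node3", "node4"]
placement = {b: place_replicas(b, nodes) for b in range(6)}

# Simulate a node failure: every block still has surviving replicas.
failed = "node2"
survivors = {b: [n for n in reps if n != failed]
             for b, reps in placement.items()}
assert all(len(reps) >= 2 for reps in survivors.values())
```

With a replication factor of 3, any single node failure leaves at least two live copies of every block, which is why reads and jobs can continue uninterrupted.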
There are two versions of Hadoop available:
1. Hadoop 1.0
2. Hadoop 2.0
Hadoop 1.0
It has two main parts:
1. Data storage framework
2. Data processing framework
1. Data storage framework:
This is a general-purpose file system called the Hadoop Distributed File System (HDFS). HDFS is schema-less: it can store data files in just about any format.
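Schema-less storage works because HDFS simply splits a file into fixed-size blocks of raw bytes, regardless of format. A minimal sketch in Python (the 128 MB default block size of recent Hadoop versions is scaled down to 16 bytes so the demo is readable; this is an illustration, not HDFS code):

```python
# Illustrative sketch: HDFS cuts any file into fixed-size blocks of
# raw bytes, so the format of the data does not matter.

BLOCK_SIZE = 16  # bytes; stand-in for the 128 MB HDFS default

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut raw bytes into blocks; the last block may be smaller."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(b"any format works: text, images, logs...")
print([len(b) for b in blocks])  # [16, 16, 7]
```

Because blocks are just byte ranges, the same mechanism stores text, logs, images, or binary data; interpreting the contents is left to the processing layer.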
2. Data processing framework:
MapReduce is a simple functional programming model. It essentially uses two functions:
1. MAP
2. REDUCE
The "mappers" take sets of key-value pairs and generate intermediate data. The "reducers" then act on this input to produce the output data.
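The map and reduce phases above can be sketched as a word count, the classic MapReduce example, in plain Python. This is a single-process simulation of the model, not Hadoop's distributed Java API; the shuffle step that groups intermediate pairs by key is made explicit:

```python
# Minimal word-count sketch of the MapReduce model in plain Python.
# Real Hadoop runs mappers and reducers on many machines in parallel.
from collections import defaultdict

def mapper(line):
    """Map: emit an intermediate (word, 1) pair for each word."""
    for word in line.split():
        yield (word, 1)

def reducer(word, counts):
    """Reduce: combine all values for one key into an output pair."""
    return (word, sum(counts))

lines = ["hadoop stores data", "hadoop processes data"]

# Shuffle phase: group intermediate pairs by key.
groups = defaultdict(list)
for line in lines:
    for word, count in mapper(line):
        groups[word].append(count)

result = dict(reducer(w, c) for w, c in groups.items())
print(result)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Because each mapper works on its own lines and each reducer on its own key, both phases parallelize naturally across a cluster, which is the whole point of the model.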
Hadoop 2.0
HDFS continues to be the data storage framework. A new and separate resource management framework called Yet Another Resource Negotiator (YARN) has been added. Any application capable of dividing itself into parallel tasks is supported by YARN. YARN coordinates the allocation of the subtasks of submitted applications.
This further enhances the flexibility, scalability, and efficiency of applications. The ApplicationMaster is able to run any application, not just MapReduce. YARN supports not only batch processing but also real-time processing. MapReduce is no longer the only data processing option.
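The kind of resource negotiation YARN performs can be sketched as a toy scheduler: an application splits into parallel subtasks, and each subtask is granted a container on a node that still has free capacity. The node names, memory figures, and first-fit policy here are all hypothetical simplifications, not YARN's real API or scheduling algorithm:

```python
# Toy sketch of YARN-style container allocation (illustrative only).
# Each task is placed on the first node with enough free memory.

nodes = {"node1": 8, "node2": 8}  # free memory in GB per node

def allocate(task_mem, nodes):
    """Grant a container on the first node with enough free memory."""
    for name, free in nodes.items():
        if free >= task_mem:
            nodes[name] = free - task_mem  # reserve the capacity
            return name
    return None  # no capacity left: the task must wait

tasks = [4, 4, 4, 4]  # four parallel subtasks needing 4 GB each
placement = [allocate(mem, nodes) for mem in tasks]
print(placement)  # ['node1', 'node1', 'node2', 'node2']
```

Real YARN adds queues, priorities, locality preferences, and pluggable schedulers (Capacity and Fair), but the core idea is the same: a central negotiator hands out containers against the cluster's remaining resources, for any kind of application, not just MapReduce.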