This document discusses Hadoop, including its key features, advantages, and versions. The main features of Hadoop are tooling, code generation, modeling, scheduling, and integration capabilities. It has advantages such as being scalable, cost effective, flexible, fast, and resilient to failure. There are two main versions of Hadoop: Hadoop 1.0 uses MapReduce for data processing and HDFS for storage, while Hadoop 2.0 introduces YARN as a separate resource manager and allows various data processing frameworks beyond just MapReduce.
1. Tooling:
Developers can create, design, and deploy big data services on any platform or development environment of their choice.
2. Code generation:
With a Hadoop big data suite, there is no need to write, debug, analyze, and optimize MapReduce code by hand; the complete code is auto-generated.
3. Modeling:
Every Hadoop distribution provides the infrastructure to integrate Hadoop clusters. Without modeling support, developers have to write complex code to develop MapReduce programs. With a suite, they can write such logic in plain Java, or use higher-level languages such as Pig Latin, HQL, etc.
4. Scheduling:
Big data job execution needs to be monitored and scheduled. Instead of writing scheduling jobs themselves, developers can use the big data suite to define and manage execution tasks in the most efficient way.
5. Integration:
Hadoop needs to integrate data from all types of products and technologies. Along with files and SQL databases, developers want to integrate data from NoSQL databases, social media, B2B products, etc.
Key Advantages
There are many advantages associated with Hadoop. This presentation covers some of the major ones.
Scalable:
Hadoop is highly scalable: it can store and distribute very large data sets across hundreds of inexpensive servers.
Cost effective:
Owing to its scale-out architecture, Hadoop offers a cost-effective solution for both storage and processing.
Flexible:
Hadoop can work with all kinds of data: structured, semi-structured, and unstructured. It can be used for a wide variety of purposes, such as log processing, recommendation systems, data warehousing, data mining, and more.
Fast:
Processing is extremely fast compared to conventional systems, owing to the "move code to data" paradigm.
Resilient to failure:
Hadoop is fault tolerant. It diligently replicates data across nodes, ensuring that in the event of a node failure, the data remains available from another copy.
There are two versions of Hadoop available:
1. Hadoop 1.0
2. Hadoop 2.0
Hadoop 1.0
It has two main parts:
1. Data storage framework
2. Data processing framework

1. Data storage framework:
It is a general-purpose file system called the Hadoop Distributed File System (HDFS). HDFS is schema-less; it can store data files in just about any format.
2. Data processing framework:
MapReduce is a simple functional programming model. It essentially uses two functions:
1. MAP
2. REDUCE
The "mappers" take a set of key-value pairs and generate intermediate data. The "reducers" then act on this input to produce the output data.
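The map/reduce flow above can be sketched in plain Python. This is a minimal single-machine illustration of the programming model, not Hadoop's actual Java API; the word-count example and function names are purely illustrative:

```python
from collections import defaultdict

def mapper(_, line):
    # MAP: emit an intermediate (word, 1) pair for each word in the line.
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # REDUCE: combine all intermediate values for one key into a result.
    return word, sum(counts)

def run_mapreduce(records):
    # Shuffle phase: group intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    return dict(reducer(k, vs) for k, vs in groups.items())

result = run_mapreduce([(0, "big data big deal"), (1, "big data")])
print(result)  # {'big': 3, 'data': 2, 'deal': 1}
```

In real Hadoop, the mappers and reducers run on different nodes and the framework performs the shuffle across the network; the structure of the computation is the same.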
Hadoop 2.0
HDFS continues to be the data storage framework. A new, separate resource management framework called YARN (Yet Another Resource Negotiator) has been added. Any application capable of dividing itself into parallel tasks is supported by YARN, which coordinates the allocation of subtasks of the submitted applications.
This further enhances the flexibility, scalability, and efficiency of applications. An ApplicationMaster can run any application, not just MapReduce. Hadoop 2.0 supports not only batch processing but also real-time processing; MapReduce is no longer the only data processing option.
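The core idea YARN supports, an application dividing itself into independent parallel subtasks, can be illustrated with a small single-machine Python sketch. This is only an analogy: YARN itself schedules containers across a cluster, while here a thread pool stands in for the parallel workers, and the chunking scheme is an assumption for the example:

```python
from concurrent.futures import ThreadPoolExecutor

def subtask(chunk):
    # One parallel subtask: process an independent slice of the input.
    return sum(chunk)

def run_application(data, workers=4):
    # Divide the work into independent parallel tasks, the way a
    # YARN application splits itself into subtasks for the cluster.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(subtask, chunks)
    # Combine the partial results into the final answer.
    return sum(partials)

print(run_application(list(range(100))))  # 4950
```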