2. Contents
o What is Hadoop Technology?
o Developer of Hadoop
o Hadoop Features
o Two main features of Hadoop
o Goals/Requirements
o Hadoop Framework and Tools
o Pros of Hadoop
o Cons of Hadoop
3. What is Hadoop Technology?
Hadoop is the best-known technology used for Big Data.
It is an open source software framework designed for storage and processing of large-scale data on clusters of commodity hardware.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
It is a flexible and highly available architecture for large-scale computation and data processing on a network of commodity hardware.
It is developed under the Apache Software Foundation; the first release was in 2006, and version 1.0 followed in 2011.
It is written in Java.
4. Developer of Hadoop
Doug Cutting and Michael J. Cafarella developed Hadoop to support distribution for the Nutch search engine project.
The project was funded by Yahoo.
5. Features of Hadoop
Hadoop Common provides access to the file systems supported by Hadoop.
The Hadoop Common package contains the JAR files and scripts needed to start Hadoop.
The package also provides source code, documentation, and a contribution section that includes projects from the Hadoop community.
6. Problems Before Hadoop
1. Processing very large data sets in a relational database is difficult.
2. Processing the data would take too much time and cost too much.
7. We can solve this problem with Distributed Computing.
• But distributed computing has problems of its own:
Hardware failure
There is always a chance of hardware failure.
Combining the data after analysis
The results from all the disks have to be combined, which is messy.
8. Hadoop Came to Solve All These Problems.
It has two main parts:
• Hadoop Distributed File System (HDFS)
• MapReduce
9. Two main features of Hadoop
1. Hadoop Distributed File System (HDFS)
• It ties many small, reasonably priced machines together into a single cost-effective computer cluster.
• Data and application processing are protected against hardware failure.
• If a node goes down, jobs are automatically redirected to other nodes so that the distributed computation does not fail.
• It automatically stores multiple copies of all data.
2. MapReduce
• MapReduce is a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
• It comes with an associated implementation for processing and generating large data sets.
• The MAP function processes a key/value pair to generate a set of intermediate key/value pairs.
• The REDUCE function merges all intermediate values associated with the same intermediate key.
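The MAP and REDUCE roles described above can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not Hadoop's actual API: the function names (map_words, shuffle, reduce_counts, run_job) and the sample input are assumptions made for illustration, and the grouping step stands in for work the framework would do across a cluster.

```python
from collections import defaultdict

def map_words(_key, line):
    """MAP: take one (key, value) pair and emit intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key (done by the framework between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """REDUCE: merge all values that share the same intermediate key."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    intermediate = []
    for offset, line in enumerate(lines):
        # The input key is the line offset; the input value is the line text.
        intermediate.extend(map_words(offset, line))
    return dict(reduce_counts(k, v) for k, v in shuffle(intermediate).items())

# Hypothetical sample input, one string per input record.
counts = run_job(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

On a real cluster the map calls would run in parallel on the nodes that hold the input blocks, and each reducer would receive only the keys assigned to it.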
10. Goals / Requirements
Abstract and facilitate the storage and processing of large and/or rapidly growing data sets
• Structured and unstructured data
• Simple programming models
High scalability and availability
Use commodity (cheap!) hardware with little redundancy
Fault tolerance
Move computation rather than data
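The fault-tolerance and "move computation rather than data" goals can be sketched with a toy model of replicated blocks. This is an illustrative simulation only, under stated assumptions: the node names, block names, and the round-robin placement are hypothetical, and real HDFS uses a more elaborate placement policy. The replication factor of 3 matches the HDFS default.

```python
REPLICATION = 3  # copies kept per block (3 is the HDFS default)

def place_blocks(block_ids, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes), so the chosen nodes are distinct.
    """
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

def pick_node(block, placement, live_nodes):
    """Pick a live node that already holds the block, so the task runs where the data is."""
    for node in placement[block]:
        if node in live_nodes:
            return node
    raise RuntimeError(f"all replicas of {block} are unavailable")

# Hypothetical four-node cluster holding two blocks.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(["blk_a", "blk_b"], nodes)

# Simulate a hardware failure on node1: tasks are redirected to surviving replicas.
live = set(nodes) - {"node1"}
assignment = {block: pick_node(block, placement, live) for block in placement}
print(assignment)
```

Because every block lives on several machines, losing one node does not lose data, and the scheduler can still send the task to a machine that already stores the block instead of copying the block over the network.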
12. Pros of Hadoop
1. Computing power
2. Flexibility
3. Fault Tolerance
4. Low Cost
5. Scalability
13. Cons of Hadoop
Integration with existing systems
Hadoop is not optimized for ease of use. Installing it and integrating it with existing databases can prove difficult, especially since no software support is provided.
Administration and ease of use
Hadoop requires knowledge of MapReduce, while most data
practitioners use SQL. This means significant training may
be required to administer Hadoop clusters.
Security
Hadoop lacks the level of security functionality needed for
safe enterprise deployment, especially if it concerns
sensitive data.
14. Benefits of Hadoop
• Cost-saving, efficient, and reliable data processing
• Provides an economically scalable solution
• Stores and processes large amounts of data
• Data grid operating system
• Deployed on industry-standard servers rather than expensive specialized data storage systems