What is Hadoop and how does it work?

Cnu Federer
Image source : hadoop.apache.org

Cnu Federer ExplainToday.blogspot.com
What is Hadoop?
●
A powerful framework for processing big data
●
Parallel processing and distributed storage
Diagram: Big Data ➔ HADOOP ➔ Analytics, Recommendations, Insights

Where does it come from?
●
Evolved from Google's MapReduce and the
Google File System (GFS)
●
Later became an open-source Apache project

Why is it significant?
●
Data is growing rapidly
●
Need for proper analytics
●
Saves computing power and time
●
Traditional single-machine methods cannot keep up with this much data

Key terms in Hadoop
●
NameNode
– Master machine that stores the HDFS metadata, including
which DataNodes hold which data
●
Resource Manager (Job Tracker in Hadoop 1)
– Manages the available resources (the DataNodes'
memory and processing power)
These two are considered the masters

Key terms (contd.)
●
DataNode
– Stores data and runs MapReduce tasks
– More can be added whenever the cluster needs to grow
●
Secondary NameNode
– Periodically takes checkpoint images of the NameNode's metadata
– Useful when recovering from a NameNode failure
– Reduces the burden on the NameNode

Key terms (contd.)
●
HDFS
– Hadoop Distributed File System
– Each machine has its own local file system, but HDFS is
distributed and available to all machines
●
History Server
– Saves the history of the jobs run on the DataNodes

What is MapReduce?
●
A software framework used to process data
●
Introduced by Google
●
Map and Reduce are its two phases
Diagram: Data ➔ Mapping phase ➔ Key-Value pairs ➔ Reducing phase ➔ Results

How does MapReduce work?
Image source : google
Example: counting the number of times each word occurs
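
The word-count example can be sketched in plain Python to show the two phases. This is only an illustration that runs on one machine, not Hadoop's own API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample sentences are made up for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, which the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: add up all the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input; in Hadoop the lines would come from files in HDFS.
    sample = ["big data needs big tools", "hadoop processes big data"]
    print(word_count(sample))
```

In a real cluster, the map calls would run in parallel on the machines that hold the data, and the reduce calls would run after the pairs have been grouped by key.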

Hadoop – Work flow
Diagram: Name Node, Resource Manager, four Data Nodes, History Server and Secondary Namenode, with the steps between them numbered 1 to 6

How does Hadoop work?
Steps 1 to 3 of the work flow:
➔
Data is stored in HDFS across all the nodes
➔
The NameNode stores the metadata that records
which DataNode holds which data
➔
A task is given to the Hadoop cluster
➔
The Resource Manager checks with the NameNode
to find out which DataNode has which data
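
As a rough illustration of the storage steps, the sketch below splits some data into blocks, spreads the blocks over the nodes, and records where each block went. It is a toy model under stated assumptions, not how HDFS is implemented: the function names, the node names, the tiny block size and the round-robin placement are all invented here, and details such as block replication are left out.

```python
from itertools import cycle


def split_into_blocks(data, block_size):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store(data, datanodes, block_size=8):
    """Place blocks on the nodes in turn and return (storage, metadata).

    storage  : what each DataNode holds (node -> {block id -> block})
    metadata : what the NameNode records (block id -> node)
    """
    storage = {node: {} for node in datanodes}
    metadata = {}
    nodes = cycle(datanodes)  # assumption: simple round-robin placement
    for block_id, block in enumerate(split_into_blocks(data, block_size)):
        node = next(nodes)
        storage[node][block_id] = block
        metadata[block_id] = node
    return storage, metadata


if __name__ == "__main__":
    # Hypothetical two-node cluster and a 20-character "file".
    storage, metadata = store("abcdefghijklmnopqrst", ["dn1", "dn2"])
    print(metadata)  # which node holds which block
```

The metadata dictionary plays the NameNode's part: a scheduler can look up a block id to find the node that already has that block.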

How does Hadoop work? (contd.)
Steps 4 and 5 of the work flow:
➔
Based on the NameNode's input, the Resource Manager (RM)
gives MapReduce tasks to the DataNodes
➔
The DataNodes perform MapReduce and
record the job history in the History Server
➔
After the tasks have completed, the results are
collected and given back to the user
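
The scheduling steps can also be pictured with a small single-process sketch: each node is given the blocks it already holds, every finished task is logged, and the partial results are merged for the user. Everything here is hypothetical (the block contents, the node names, the functions assign_tasks, run_map and run_job); a real Resource Manager is far more involved.

```python
from collections import defaultdict

# Hypothetical NameNode metadata: block id -> DataNode holding that block.
BLOCK_LOCATIONS = {0: "dn1", 1: "dn2", 2: "dn1"}
# Hypothetical block contents as stored on the DataNodes.
BLOCKS = {0: "big data", 1: "big tools", 2: "data"}


def assign_tasks(block_locations):
    """Resource Manager: give each DataNode the blocks it already holds."""
    tasks = defaultdict(list)
    for block_id, node in sorted(block_locations.items()):
        tasks[node].append(block_id)
    return dict(tasks)


def run_map(block_text):
    """Work done on a DataNode: count the words in one block."""
    counts = defaultdict(int)
    for word in block_text.split():
        counts[word] += 1
    return dict(counts)


def run_job(block_locations, blocks):
    """Run every task, log it, and merge the partial counts for the user."""
    history = []  # stands in for the History Server
    totals = defaultdict(int)
    for node, block_ids in assign_tasks(block_locations).items():
        for block_id in block_ids:
            partial = run_map(blocks[block_id])
            history.append((node, block_id, "done"))
            for word, count in partial.items():
                totals[word] += count
    return dict(totals), history


if __name__ == "__main__":
    totals, history = run_job(BLOCK_LOCATIONS, BLOCKS)
    print(totals)
    print(history)
```

Sending the task to the node that already stores the block is the point of asking the NameNode first: the work moves to the data rather than the data to the work.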

Commercial products
●
CDH (Cloudera Distribution Including
Apache Hadoop)
●
IBM InfoSphere BigInsights
●
MapR Apache Hadoop distribution
●
Hortonworks Hadoop distribution
●
... and many more

References
●
http://en.wikipedia.org/wiki/Apache_Hadoop
●
http://hadoop.apache.org/
●
http://www01.ibm.com/software/data/infosphere/hadoop/

Cnu Federer (tweet@cnufederer)
TRY AND LEARN
