Cnu Federer
Image source : hadoop.apache.org
Cnu Federer ExplainToday.blogspot.com
What is Hadoop?
● A powerful framework for processing big data
● Parallel processing and distributed storage
Big Data ➔ HADOOP ➔ Analytics, Recommendations, Insights
Where does it come from?
● Evolved from Google's MapReduce and the Google File System (GFS)
● Later implemented as an open source project (Apache Hadoop)
Why is it significant?
● Data is growing rapidly
● Need for proper analytics
● Saves power and time
● Traditional methods failed to cope with data at this scale
Key terms in Hadoop
● Name Node
– Master machine that stores the file system metadata, including which data nodes hold which data
● Resource Manager (called Job Tracker in Hadoop 1)
– Manages the available resources (the data nodes' memory and processing power)
These two are considered the masters
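The Resource Manager's bookkeeping can be pictured with a small sketch. It assumes a toy model in which each data node only reports free memory and a task goes to the node with the most free memory. The class `ToyResourceManager`, its methods, and the node names are hypothetical and are not part of any Hadoop API; real scheduling is far more involved.

```python
class ToyResourceManager:
    """Hypothetical, simplified model of resource tracking (not a Hadoop API)."""

    def __init__(self, free_memory_mb):
        # data node name -> free memory in MB
        self.free_memory_mb = dict(free_memory_mb)

    def allocate(self, needed_mb):
        """Return the node with the most free memory, or None if nothing fits."""
        node = max(self.free_memory_mb, key=self.free_memory_mb.get)
        if self.free_memory_mb[node] < needed_mb:
            return None
        self.free_memory_mb[node] -= needed_mb
        return node

    def release(self, node, mb):
        """Give memory back once a task has finished."""
        self.free_memory_mb[node] += mb


rm = ToyResourceManager({"datanode1": 4096, "datanode2": 8192})
first = rm.allocate(6000)   # only datanode2 has enough room
second = rm.allocate(3000)  # datanode1 now has the most free memory
third = rm.allocate(5000)   # no node has 5000 MB left, so None
```

Keeping this tally on one master is what lets it decide where work can run without asking every data node each time.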
Key terms (contd.)
● Data Node
– Stores data and runs MapReduce tasks
– We can add as many as we want
● Secondary Name Node
– Takes periodic checkpoints of the Name Node's metadata image
– Useful when recovering from a Name Node failure
– Reduces the burden on the Name Node
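A checkpoint can be pictured as the last saved image plus the changes logged since then. The sketch below assumes a toy format in which the image is a dictionary of path to block ids and the log is a list of `(operation, path, value)` tuples. The function `apply_edits` and all data are made up for illustration and do not reflect Hadoop's on-disk formats.

```python
def apply_edits(fsimage, edit_log):
    """Return a new image with every logged edit applied in order."""
    image = dict(fsimage)  # copy, so the previous checkpoint stays intact
    for op, path, value in edit_log:
        if op == "create":
            image[path] = value
        elif op == "delete":
            image.pop(path, None)
    return image


# Toy metadata: file path -> list of block ids (made-up values).
fsimage = {"/logs/a.txt": ["blk_1"]}
edit_log = [
    ("create", "/logs/b.txt", ["blk_2", "blk_3"]),
    ("delete", "/logs/a.txt", None),
]

checkpoint = apply_edits(fsimage, edit_log)
edit_log = []  # once merged, the log can start empty again
```

Doing this merge on a separate machine is one way to read "reduces the burden": the Name Node does not have to replay a long log by itself.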
Key terms (contd.)
● HDFS
– Hadoop Distributed File System
– Each machine has its own local file system, but HDFS is distributed and available to all machines
● History Server
– Saves the history of jobs run on the data nodes
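To make "distributed" concrete, here is a minimal sketch of splitting one file into fixed-size blocks and storing copies of each block on several data nodes. The round-robin placement, the function `place_blocks`, and the sizes and node names are assumptions made for this example; HDFS itself uses a more careful placement policy.

```python
def place_blocks(file_size, block_size, nodes, replication):
    """Map each block index to the data nodes that hold a copy of it.

    Placement here is plain round-robin (an assumption for this sketch).
    `replication` should not exceed len(nodes), or copies would repeat.
    """
    num_blocks = -(-file_size // block_size)  # ceiling division
    block_map = {}
    for b in range(num_blocks):
        # Start one node further along for each block and wrap around.
        block_map[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return block_map


# A 300 MB file, 128 MB blocks, 3 copies of each block (illustrative values).
block_map = place_blocks(
    file_size=300,
    block_size=128,
    nodes=["dn1", "dn2", "dn3", "dn4"],
    replication=3,
)
```

The resulting `block_map` is the kind of "which data node has which data" information the Name Node keeps as metadata.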
What is MapReduce?
● A software framework used to process data
● Introduced by Google
● Map and Reduce are its two phases
Data ➔ Mapping phase ➔ Key-Value pairs ➔ Reducing phase ➔ Results
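The two phases can be sketched on a single machine. The helper `map_reduce` below is hypothetical and only mirrors the flow from data to key-value pairs to results. The city readings are made-up sample data.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a map phase, group the pairs by key, then run a reduce phase."""
    groups = defaultdict(list)
    # Map phase: every record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # grouping by key
    # Reduce phase: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Made-up records of (city, temperature).
readings = [("Pune", 31), ("Delhi", 40), ("Pune", 35), ("Delhi", 38)]

hottest = map_reduce(
    readings,
    mapper=lambda record: [(record[0], record[1])],
    reducer=lambda city, temps: max(temps),
)
```

In a cluster the map calls run on different machines at the same time; the shape of the computation stays the same.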
How does MapReduce work?
Image source : google
Example: counting the number of times each word occurs
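The word count example can be written out step by step. This is a single-machine sketch with hypothetical function names (`map_phase`, `shuffle`, `reduce_phase`) and made-up input lines; it is not how a Hadoop job is actually submitted.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all the 1s that belong to the same word."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Add up the 1s to get the total for one word."""
    return word, sum(counts)


lines = ["big data needs hadoop", "hadoop stores big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
```

Each line can be mapped independently of the others, which is why the map phase spreads well across many data nodes.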
Hadoop – Work flow
[Diagram: Name Node, Resource Manager, four Data Nodes, History Server and Secondary Namenode, with steps numbered 1 to 6]
How does Hadoop work? (steps 1 to 3 of the work flow)
➔ Data is stored in HDFS across all the nodes
➔ The Name Node stores the metadata about the data nodes
➔ A task is given to the Hadoop cluster
➔ The Resource Manager checks with the Name Node to find which data node has which data
How does Hadoop work? (contd., steps 4 and 5 of the work flow)
➔ Based on the Name Node's inputs, the Resource Manager (RM) gives MapReduce tasks to the data nodes
➔ The data nodes perform the MapReduce tasks and record the job history in the History Server
➔ After the tasks have completed, the results are collected and given back to the user
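The step where tasks are handed out "based on Name Node inputs" can be sketched as follows. The idea assumed here is that a task should run on a data node that already stores the block it needs, with any free node as a fallback. The function `assign_map_tasks` and both dictionaries are hypothetical stand-ins for what the Name Node and Resource Manager track.

```python
def assign_map_tasks(block_locations, free_slots):
    """Assign one map task per block, preferring a node that holds the block.

    block_locations: block id -> data nodes storing a copy (Name Node's view)
    free_slots: data node -> tasks it can still accept (Resource Manager's view)
    """
    slots = dict(free_slots)  # work on a copy
    assignment = {}
    for block, holders in block_locations.items():
        # First choice: a node that already stores the block.
        local = [node for node in holders if slots.get(node, 0) > 0]
        # Fallback: any node with a free slot; the block is then read remotely.
        candidates = local or [node for node, free in slots.items() if free > 0]
        if not candidates:
            raise RuntimeError(f"no free slot for block {block}")
        node = candidates[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment


block_locations = {
    "blk_1": ["dn1", "dn2"],
    "blk_2": ["dn1"],
    "blk_3": ["dn1", "dn3"],
}
free_slots = {"dn1": 2, "dn2": 1, "dn3": 0}
assignment = assign_map_tasks(block_locations, free_slots)
```

Here `blk_3` ends up on `dn2` because both nodes holding it are busy, which shows why the fallback is needed.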
Commercial products
● CDH (Cloudera Distribution Including Apache Hadoop)
● IBM InfoSphere BigInsights
● MapR Apache Hadoop distributions
● Hortonworks Hadoop distributions
● ... and many more
References
● http://en.wikipedia.org/wiki/Apache_Hadoop
● http://hadoop.apache.org/
● http://www-01.ibm.com/software/data/infosphere/hadoop/
Cnu Federer (tweet@cnufederer)
TRY AND LEARN

What is Hadoop and how does it work?