Introduction of Hadoop
Institute of Manufacturing Information and Systems (製造資訊與系統研究所)
Institute of Engineering Management (工程管理碩士在職專班)
National Cheng Kung University (國立成功大學)
Topic: Hadoop (HDFS, MapReduce)
Advisor: Dr. Chia-Yen Lee (李家岩)
Presenter: Shao-Yen Hung (洪紹嚴)
Date: 2015/10/08
2. Productivity Optimization Lab Shao-Yen Hung
Origin of the name “Hadoop”?
• This toy's name is Hadoop.
• This guy is Doug Cutting.
• 2004 => The MapReduce algorithm is published (Google Labs).
• 2006 => Doug Cutting creates the Hadoop framework (Yahoo!).
Architecture
Masters
• Data Store: Name Node, Secondary Name Node
• Data Processing: Job Tracker
Slaves
• Each slave runs a Data Node (Data Store) and a Task Tracker (Data Processing); four slaves are shown.
HDFS (Hadoop Distributed File System)
• In HDFS, the three roles are the Client, the Name Node, and the Data Nodes.
HDFS: Name Node Failure (1/2)
• Name Node failure
Name Node metadata (block: Data Nodes holding a replica)
  Block1: DN1, DN3, DN5
  Block2: DN1, DN2, DN3
  Block3: DN1, DN4, DN5
  …and so on…
Data Nodes: DN1, DN2, DN3, DN4, DN5
Single Point of Failure: if this one node fails, the whole system fails.
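The block map on this slide can be sketched as a plain dictionary held by one object. This is only a toy model, not Hadoop code: the names ToyNameNode, NameNodeDown, and read_block are hypothetical. It illustrates why the Name Node is a single point of failure: the Data Nodes still hold the blocks, but a client cannot find them once the metadata is unreachable.

```python
class NameNodeDown(Exception):
    """Raised when the toy Name Node cannot be reached."""


class ToyNameNode:
    """Holds only metadata: which Data Nodes store each block."""

    def __init__(self, block_map):
        self.block_map = dict(block_map)
        self.alive = True

    def locate(self, block_id):
        if not self.alive:
            raise NameNodeDown("Name Node is down; block locations are unknown")
        return self.block_map[block_id]


def read_block(name_node, block_id):
    """Client side: ask the Name Node where the block lives, then pick a replica."""
    replicas = name_node.locate(block_id)
    return replicas[0]  # simplification: always take the first replica listed


# Block map copied from the slide.
name_node = ToyNameNode({
    "Block1": ["DN1", "DN3", "DN5"],
    "Block2": ["DN1", "DN2", "DN3"],
    "Block3": ["DN1", "DN4", "DN5"],
})

first = read_block(name_node, "Block2")

name_node.alive = False  # simulate the Name Node failing
try:
    read_block(name_node, "Block2")
    lookup_failed = False
except NameNodeDown:
    lookup_failed = True
```

All five Data Nodes are untouched in this sketch, yet the second read fails, because the only copy of the block-to-node mapping lives on the Name Node.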
HDFS: Name Node Failure (2/2)
• Name Node failure
Name Node metadata (block: Data Nodes holding a replica)
  Block1: DN1, DN3, DN5
  Block2: DN1, DN2, DN3
  Block3: DN1, DN4, DN5
  …and so on…
Data Nodes: DN1, DN2, DN3, DN4, DN5
Secondary Name Node
• Connects to the Name Node every hour (default).
• Keeps a backup of the Name Node metadata.
• Its backup can be used to rebuild the Name Node if it fails.
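The hourly backup described above can be sketched as a periodic copy of the metadata. This is a simplified illustration under stated assumptions: ToySecondaryNameNode and its methods are hypothetical names, and the sketch copies a dictionary, whereas the real Secondary Name Node merges the file system image with the edit log. It shows one consequence worth noting: a rebuilt Name Node only knows what existed at the last checkpoint.

```python
import copy

CHECKPOINT_PERIOD_S = 3600  # "every hour" from the slide, in seconds


class ToySecondaryNameNode:
    def __init__(self):
        self.checkpoint = None
        self.last_checkpoint_at = None

    def maybe_checkpoint(self, name_node_metadata, now_s):
        """Copy the metadata if at least one period has passed since the last copy."""
        due = (self.last_checkpoint_at is None
               or now_s - self.last_checkpoint_at >= CHECKPOINT_PERIOD_S)
        if due:
            self.checkpoint = copy.deepcopy(name_node_metadata)
            self.last_checkpoint_at = now_s
        return due

    def rebuild(self):
        """Return metadata for a replacement Name Node, as of the last checkpoint."""
        if self.checkpoint is None:
            raise RuntimeError("no checkpoint taken yet")
        return copy.deepcopy(self.checkpoint)


metadata = {"Block1": ["DN1", "DN3", "DN5"]}
secondary = ToySecondaryNameNode()

took_first = secondary.maybe_checkpoint(metadata, now_s=0)      # first checkpoint
metadata["Block2"] = ["DN1", "DN2", "DN3"]                      # change made afterwards
took_second = secondary.maybe_checkpoint(metadata, now_s=1800)  # only 30 minutes later

restored = secondary.rebuild()
```

Here restored contains Block1 but not Block2, since Block2 was added after the last checkpoint. In this toy model the Secondary Name Node is a backup of metadata, not a live replacement for the Name Node.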
HDFS: Data Nodes Failure (2/2)
• Data Nodes failure
Name Node metadata (block: Data Nodes holding a replica)
  Block1: DN1, DN3, DN5
  Block2: DN1, DN2, DN3
  Block3: DN1, DN4, DN5
  …and so on…
Data Nodes: DN1, DN2, DN3, DN4, DN5
Heartbeat
• Data Nodes send a heartbeat to the Name Node every 3 seconds.
• A Data Node is regarded as "DEAD" if it doesn't send a heartbeat for 10 minutes.
• When a Data Node is dead, the Name Node replicates the blocks it held to other Data Nodes.
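The heartbeat rule and the re-replication step can be sketched as two small functions. This is a minimal sketch, not Hadoop's implementation: the function names and the timestamps are hypothetical, the 10-minute timeout is taken from the slide, and new targets are chosen alphabetically here, whereas real HDFS applies its own placement policy.

```python
DEAD_AFTER_S = 10 * 60  # timeout from the slide: 10 minutes without a heartbeat
REPLICATION = 3         # replicas per block, as in the slide's block map


def dead_nodes(last_heartbeat, now_s):
    """Data Nodes whose last heartbeat is older than the timeout."""
    return {dn for dn, t in last_heartbeat.items() if now_s - t > DEAD_AFTER_S}


def re_replicate(block_map, dead, all_nodes):
    """Drop dead replicas and top each block back up to REPLICATION copies."""
    live = sorted(set(all_nodes) - dead)
    new_map = {}
    for block, replicas in block_map.items():
        kept = [dn for dn in replicas if dn not in dead]
        candidates = [dn for dn in live if dn not in kept]
        needed = max(REPLICATION - len(kept), 0)
        new_map[block] = kept + candidates[:needed]
    return new_map


# Block map copied from the slide.
block_map = {
    "Block1": ["DN1", "DN3", "DN5"],
    "Block2": ["DN1", "DN2", "DN3"],
    "Block3": ["DN1", "DN4", "DN5"],
}
# Hypothetical timestamps in seconds: DN3 last reported at t=0, the rest at t=699.
last_heartbeat = {"DN1": 699, "DN2": 699, "DN3": 0, "DN4": 699, "DN5": 699}

dead = dead_nodes(last_heartbeat, now_s=700)
new_map = re_replicate(block_map, dead, last_heartbeat.keys())
```

With these made-up timestamps only DN3 is past the timeout, so Block1 and Block2 each gain one new replica on a live node while Block3, which never used DN3, is left unchanged.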
References (Learning Map)
• Meet Big Data's Little Yellow Elephant Helper: Hadoop (認識大數據的黃色小象幫手 –– Hadoop)
• HDFS Explained as Comics
• Understanding Hadoop Clusters and the Network
• How to run Hadoop on Linux? (Practice)*