Introduction of Hadoop
Institute of Manufacturing Information and Systems (製造資訊與系統研究所)
Institute of Engineering Management (工程管理碩士在職專班)
National Cheng Kung University (國立成功大學)
Topic: Hadoop (HDFS, MapReduce)
Advisor: Dr. 李家岩
Presenter: 洪紹嚴 (Shao-Yen Hung)
Date: 2015/10/08
Productivity Optimization Lab Shao-Yen Hung
Origin of the name “Hadoop”?
• Hadoop is the name of this toy (pictured).
• This is Doug Cutting (pictured).
• 2004: The MapReduce algorithm is published (Google Labs).
• 2006: Doug Cutting creates the Hadoop framework (Yahoo!).
Architecture
• Masters
  • Data store: Name Node, Secondary Name Node
  • Data processing: Job Tracker
• Slaves
  • Each slave machine runs a Data Node (data store) and a Task Tracker (data processing). The diagram shows four such machines.
HDFS (Hadoop Distributed File System)
• In HDFS, the three roles are the Client, the Name Node, and the Data Nodes.
HDFS: Write Data
• The original file (140 MB in this example) is split into blocks. A block is usually 64 or 128 MB, so with 64 MB blocks the file becomes three blocks: 64 MB, 64 MB, and 12 MB.
• By default, each block has 3 replicas, stored on different Data Nodes (DN1 to DN5 in the diagram).
• The Name Node keeps the metadata, that is, which Data Nodes hold each block:
  Block1: DN1, DN3, DN5
  Block2: DN1, DN2, DN3
  Block3: DN1, DN4, DN5
  …and so on….
• The Data Nodes store the blocks themselves.
[Figure: write path for Block 1 between the Client, the Name Node, and the Data Nodes, steps (1) to (4).]
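The block arithmetic above can be checked with a short Python sketch. The function name `split_into_blocks` and the 64 MB default are illustrative choices and not part of any Hadoop API; the replication factor of 3 is the value quoted on the slide.

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the sizes (in MB) of the blocks a file is split into."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("sizes must be non-negative, block size positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds what is left of the file.
        sizes.append(remainder)
    return sizes


REPLICATION = 3  # replicas per block, as stated on the slide

blocks = split_into_blocks(140)
raw_storage_mb = sum(blocks) * REPLICATION

print(blocks)          # [64, 64, 12]
print(raw_storage_mb)  # 420
```

With a 128 MB block size the same file would need only two blocks (128 MB and 12 MB), which means fewer metadata entries on the Name Node.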
HDFS: Replica Strategy
• In-rack latency < cross-rack latency
• In-rack bandwidth > cross-rack bandwidth
Placement rules:
(1) Put the 1st replica in a random location.
(2) Put the next 2 replicas on two nodes in one different rack.
Cluster layout:
• Rack 1: DN1, DN2, DN3
• Rack 2: DN4, DN5, DN6
• Rack 3: DN7, DN8, DN9
Resulting Name Node metadata:
• Block1: DN1 (Rack 1), DN4 and DN5 (Rack 2)
• Block2: DN4 (Rack 2), DN7 and DN8 (Rack 3)
• Block3: DN9 (Rack 3), DN1 and DN2 (Rack 1)
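The two placement rules can be sketched in Python as follows. This follows the simplified description on the slide, not the full HDFS placement policy. The names `RACKS`, `rack_of`, and `place_replicas` are hypothetical, and the rack layout is the one shown in the diagram.

```python
import random

# Rack layout from the slide (assumed: three Data Nodes per rack).
RACKS = {
    "Rack 1": ["DN1", "DN2", "DN3"],
    "Rack 2": ["DN4", "DN5", "DN6"],
    "Rack 3": ["DN7", "DN8", "DN9"],
}


def rack_of(node, racks):
    """Return the name of the rack that contains `node`."""
    for rack, nodes in racks.items():
        if node in nodes:
            return rack
    raise KeyError(node)


def place_replicas(racks, rng):
    """Pick three Data Nodes for one block using the two slide rules."""
    # Rule (1): the first replica goes to a random node.
    first_rack = rng.choice(sorted(racks))
    first_node = rng.choice(racks[first_rack])
    # Rule (2): the next two replicas go to two distinct nodes
    # in one other rack.
    other_rack = rng.choice(sorted(r for r in racks if r != first_rack))
    second_node, third_node = rng.sample(racks[other_rack], 2)
    return [first_node, second_node, third_node]


placement = place_replicas(RACKS, random.Random(0))
print(placement, [rack_of(dn, RACKS) for dn in placement])
```

Keeping two replicas in the same remote rack means only one copy has to cross the slower rack boundary, while the block still survives the loss of a whole rack.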
HDFS: Read Data
(1) The Client sends the filename to the Name Node.
(2) The Name Node returns the blocks of the file and the Data Nodes that hold them:
  Block1: DN1, DN3, DN5
  Block2: DN1, DN2, DN3
  Block3: DN1, DN4, DN5
  …and so on….
(3) The Client asks the Data Nodes (DN1 to DN5) for each block directly, for example “Please give me Block 2”, and receives Block 2, Block 3, and so on.
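The three read steps can be imitated with plain dictionaries. This is a toy model under stated assumptions: `BLOCK_LOCATIONS` stands in for the Name Node metadata, `build_data_nodes` and `read_file` are hypothetical helpers, and the block contents and the filename `file.txt` are made up for illustration.

```python
# Name Node metadata: filename -> ordered list of (block, replica locations).
BLOCK_LOCATIONS = {
    "file.txt": [
        ("Block1", ["DN1", "DN3", "DN5"]),
        ("Block2", ["DN1", "DN2", "DN3"]),
        ("Block3", ["DN1", "DN4", "DN5"]),
    ]
}

# Made-up block contents, only for this example.
BLOCK_DATA = {"Block1": b"AAAA", "Block2": b"BBBB", "Block3": b"CC"}


def build_data_nodes(block_locations, block_data):
    """Store a copy of each block on every Data Node listed for it."""
    nodes = {}
    for blocks in block_locations.values():
        for block_id, locations in blocks:
            for dn in locations:
                nodes.setdefault(dn, {})[block_id] = block_data[block_id]
    return nodes


def read_file(filename, block_locations, data_nodes):
    """Look up block locations, then fetch each block from a Data Node."""
    data = b""
    for block_id, locations in block_locations[filename]:
        for dn in locations:
            stored = data_nodes.get(dn, {})
            if block_id in stored:
                data += stored[block_id]
                break  # one reachable replica is enough
        else:
            raise IOError(f"no reachable replica of {block_id}")
    return data


data_nodes = build_data_nodes(BLOCK_LOCATIONS, BLOCK_DATA)
print(read_file("file.txt", BLOCK_LOCATIONS, data_nodes))  # b'AAAABBBBCC'
```

Because every block has several replicas, the read still succeeds in this model if one Data Node (for example DN1) is removed from `data_nodes`.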
HDFS: Name Node Failure (1/2)
• The Name Node holds the block metadata for all Data Nodes (DN1 to DN5):
  Block1: DN1, DN3, DN5
  Block2: DN1, DN2, DN3
  Block3: DN1, DN4, DN5
  …and so on….
• The Name Node is a single point of failure: if it fails, everything fails.
HDFS: Name Node Failure (2/2)
The Secondary Name Node:
• Connects to the Name Node every hour (default).
• Keeps a backup of the Name Node metadata.
• Can be used to rebuild the Name Node if it fails.
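The periodic backup idea can be illustrated with a toy model. The classes `NameNode` and `SecondaryNameNode`, the `tick` and `rebuild` methods, and the use of plain seconds are all assumptions made for this sketch; they do not mirror the real Hadoop implementation, which is more involved.

```python
import copy


class NameNode:
    """Toy Name Node: just a dictionary of block -> Data Nodes."""

    def __init__(self, metadata=None):
        self.metadata = metadata if metadata is not None else {}


class SecondaryNameNode:
    """Toy Secondary Name Node that snapshots metadata on a fixed interval."""

    def __init__(self, interval_s=3600):  # one hour, the default on the slide
        self.interval_s = interval_s
        self.last_checkpoint_s = None
        self.snapshot = None

    def tick(self, now_s, name_node):
        """Take a snapshot if one is due; return True when a snapshot was taken."""
        due = (self.last_checkpoint_s is None
               or now_s - self.last_checkpoint_s >= self.interval_s)
        if due:
            self.snapshot = copy.deepcopy(name_node.metadata)
            self.last_checkpoint_s = now_s
        return due

    def rebuild(self):
        """Create a new Name Node from the most recent snapshot."""
        if self.snapshot is None:
            raise RuntimeError("no snapshot available")
        return NameNode(copy.deepcopy(self.snapshot))


primary = NameNode({"Block1": ["DN1", "DN3", "DN5"]})
secondary = SecondaryNameNode()
secondary.tick(0, primary)                        # first snapshot
primary.metadata["Block2"] = ["DN1", "DN2", "DN3"]
secondary.tick(1800, primary)                     # too early, nothing copied
print(sorted(secondary.rebuild().metadata))       # ['Block1']
```

In this model, anything written after the last snapshot is missing from the rebuilt Name Node, which is why the backup interval matters.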
HDFS: Data Node Failure
• What happens if a Data Node (one of DN1 to DN5) fails?
• Data Nodes send a heartbeat to the Name Node every 3 seconds.
• A Data Node is regarded as “DEAD” if it does not send a heartbeat for 10 minutes.
• When a Data Node is dead, the Name Node replicates its blocks to other Data Nodes.
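The heartbeat rule and the re-replication step can be sketched as below. The 3-second and 10-minute values come from the slide. The function names `dead_nodes` and `re_replicate`, the timestamps, and the "first free live node" choice are illustrative assumptions, not how HDFS actually selects targets.

```python
HEARTBEAT_INTERVAL_S = 3   # from the slide
DEAD_AFTER_S = 10 * 60     # from the slide

ALL_NODES = ["DN1", "DN2", "DN3", "DN4", "DN5"]
BLOCK_LOCATIONS = {
    "Block1": ["DN1", "DN3", "DN5"],
    "Block2": ["DN1", "DN2", "DN3"],
    "Block3": ["DN1", "DN4", "DN5"],
}


def dead_nodes(last_heartbeat_s, now_s, dead_after_s=DEAD_AFTER_S):
    """Return the nodes whose last heartbeat is at least `dead_after_s` old."""
    return sorted(dn for dn, seen in last_heartbeat_s.items()
                  if now_s - seen >= dead_after_s)


def re_replicate(block_locations, dead, all_nodes, replication=3):
    """Replace replicas that lived on dead nodes with copies on live nodes."""
    live = [dn for dn in all_nodes if dn not in dead]
    result = {}
    for block, locations in block_locations.items():
        kept = [dn for dn in locations if dn not in dead]
        candidates = [dn for dn in live if dn not in kept]
        missing = max(0, replication - len(kept))
        # Assumption: simply take the first free live nodes.
        result[block] = kept + candidates[:missing]
    return result


# Hypothetical timestamps (seconds): DN3 has been silent since t = 0.
last_seen = {"DN1": 598, "DN2": 598, "DN3": 0, "DN4": 598, "DN5": 598}
dead = dead_nodes(last_seen, now_s=600)
print(dead)  # ['DN3']
print(re_replicate(BLOCK_LOCATIONS, set(dead), ALL_NODES))
```

Blocks that had no replica on the dead node (Block3 here) keep their original locations.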
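MapReduce Algorithm
Example: How many times does “POLab” occur in the file?
(1) The Client submits the job to the Job Tracker.
(2) The Job Tracker asks the Name Node where the blocks of the file are stored:
  Blk1: DN1, DN7, DN8
  Blk2: DN2, DN5, DN6
  Blk3: DN4, DN12, DN13
(3) Map: the Task Trackers on Data Nodes that hold Blk 1, Blk 2, and Blk 3 each count the occurrences in their own block: POLab = 3, POLab = 0, POLab = 11.
(4) Reduce: the partial counts are added together: POLab = 14.
• MapReduce is a divide-and-conquer algorithm.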
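The map and reduce steps of the “POLab” example can be reproduced on a single machine with a few lines of Python. This is only a sketch of the idea, not Hadoop code. The block contents are invented so that the counts match the slide, and `map_task` and `reduce_task` are hypothetical names.

```python
from functools import reduce
from operator import add


def map_task(block_text, word):
    """Map step: count occurrences of `word` in one block (whitespace tokens)."""
    return sum(1 for token in block_text.split() if token == word)


def reduce_task(partial_counts):
    """Reduce step: add up the partial counts from all map tasks."""
    return reduce(add, partial_counts, 0)


# Invented block contents chosen to give the counts shown on the slide.
BLOCKS = {
    "Blk 1": "POLab " * 3 + "is a lab",
    "Blk 2": "this block has no match",
    "Blk 3": "POLab " * 11,
}

partial = {name: map_task(text, "POLab") for name, text in BLOCKS.items()}
total = reduce_task(partial.values())

print(partial)  # {'Blk 1': 3, 'Blk 2': 0, 'Blk 3': 11}
print(total)    # 14
```

Each `map_task` call only needs its own block, so on a cluster the three calls could run in parallel on the machines that already store the data.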
Hadoop Ecosystem
http://www.inside.com.tw/2015/03/12/big-data-4-hadoop
References (Learning Map)
• 認識大數據的黃色小象幫手 –– Hadoop (Meet Big Data's Little Yellow Elephant Helper: Hadoop)
• HDFS Explained as Comics
• Understanding Hadoop Clusters and the Network
• How to run Hadoop on Linux? (Practice)*
