Introduction of Hadoop
Institute of Manufacturing Information and Systems (製造資訊與系統研究所)
Institute of Engineering Management (工程管理碩士在職專班)
National Cheng Kung University (國立成功大學)
Topic: Hadoop (HDFS, MapReduce)
Advisor: Dr. 李家岩
Presenter: 洪紹嚴 (Shao-Yen Hung)
Date: 2015/10/08
Productivity Optimization Lab Shao-Yen Hung
Origin of the name “Hadoop”?
• The toy in the picture is named Hadoop; the framework takes its name from it.
• The man in the picture is Doug Cutting.
• 2004: the MapReduce algorithm is published (Google Labs).
• 2006: Doug Cutting creates the Hadoop framework (Yahoo!).
Architecture
• Masters: the Name Node and the Secondary Name Node (data store), and the Job Tracker (data processing).
• Slaves: four machines in the diagram, each running a Data Node (data store) and a Task Tracker (data processing).
HDFS (Hadoop Distributed File System)
• HDFS has three roles: the Client, the Name Node, and the Data Nodes.
HDFS—Write data
• The Client splits the original file (140 MB) into blocks. The block size is usually 64 or 128 MB; with 64 MB blocks the file becomes three blocks of 64 MB, 64 MB, and 12 MB.
• The Name Node keeps the metadata, that is, which Data Nodes hold each block:
  Block1: DN1, DN3, DN5
  Block2: DN1, DN2, DN3
  Block3: DN1, DN4, DN5
  …and so on….
• The Data Nodes (DN1 to DN5) store the blocks themselves.
• Each block has 3 replicas (the HDFS default replication factor). For example, Block 1 is stored on DN1, DN3, and DN5.
• [Figure: steps (1) to (4) of the write path between the Client, the Name Node (metadata), and the Data Nodes (blocks).]
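The block arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration of the splitting rule, assuming a 64 MB block size; the function name `split_into_blocks` is made up for this example and is not part of any Hadoop API.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size=64 * MB):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes

# The 140 MB file from the slide, cut into 64 MB blocks.
blocks_mb = [size // MB for size in split_into_blocks(140 * MB)]
print(blocks_mb)  # [64, 64, 12]
```

With a 128 MB block size the same file would need only two blocks (128 MB and 12 MB), which means fewer metadata entries on the Name Node.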
HDFS—Replica Strategy(1/4)
• Name Node metadata:
  Block1: DN1, DN4, DN5
  Block2: DN4, DN7, DN8
  Block3: DN9, DN1, DN2
• Racks: Rack 1 holds DN1 to DN3, Rack 2 holds DN4 to DN6, Rack 3 holds DN7 to DN9.
• In-rack latency < cross-rack latency
• In-rack bandwidth > cross-rack bandwidth
(1) Put the 1st replica in a random location.
(2) Put the next 2 replicas on two different nodes of one other rack.
HDFS—Replica Strategy(2/4)
• Same cluster and rules as in (1/4). Block 1 (DN1, DN4, DN5): the 1st replica goes to DN1 in Rack 1, the next two go to DN4 and DN5 in Rack 2.
HDFS—Replica Strategy(3/4)
• Same cluster and rules as in (1/4). Block 2 (DN4, DN7, DN8): the 1st replica goes to DN4 in Rack 2, the next two go to DN7 and DN8 in Rack 3.
HDFS—Replica Strategy(4/4)
• Same cluster and rules as in (1/4). Block 3 (DN9, DN1, DN2): the 1st replica goes to DN9 in Rack 3, the next two go to DN1 and DN2 in Rack 1.
• Final layout: Block 1 on DN1, DN4, DN5; Block 2 on DN4, DN7, DN8; Block 3 on DN9, DN1, DN2. Every block has replicas in two different racks.
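The two placement rules can be sketched as a small Python function. This is a simplified illustration, not HDFS's actual placement code: the rack layout (`RACKS`) is read off the diagram, and the names `place_replicas` and `rack_of` are hypothetical.

```python
import random

# Hypothetical cluster layout, inferred from the slide's diagram.
RACKS = {
    "Rack 1": ["DN1", "DN2", "DN3"],
    "Rack 2": ["DN4", "DN5", "DN6"],
    "Rack 3": ["DN7", "DN8", "DN9"],
}

def rack_of(node, racks):
    """Return the name of the rack that contains the given Data Node."""
    return next(rack for rack, nodes in racks.items() if node in nodes)

def place_replicas(racks, rng=random):
    """Pick 3 Data Nodes: one anywhere, then two on one other rack."""
    # Rule (1): the first replica goes to a random node.
    first_rack = rng.choice(sorted(racks))
    first_node = rng.choice(racks[first_rack])
    # Rule (2): the next two replicas share a different rack.
    other_racks = [r for r in sorted(racks)
                   if r != first_rack and len(racks[r]) >= 2]
    if not other_racks:
        raise ValueError("need another rack with at least two Data Nodes")
    second_rack = rng.choice(other_racks)
    return [first_node] + rng.sample(racks[second_rack], 2)

placement = place_replicas(RACKS, random.Random(0))
print(placement, [rack_of(node, RACKS) for node in placement])
```

Keeping two replicas in the same rack uses the cheaper in-rack bandwidth for one of the copies, while the replica in the other rack survives the loss of a whole rack.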
HDFS—Read data
(1) The Client sends the filename to the Name Node.
(2) The Name Node answers with the block locations:
  Block1: DN1, DN3, DN5
  Block2: DN1, DN2, DN3
  Block3: DN1, DN4, DN5
  …and so on….
(3) The Client asks the Data Nodes (DN1 to DN5) directly for each block ("Please give me Block 2") and receives Block 2, Block 3, and so on.
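The read path can be mimicked with plain dictionaries. The sketch below is a toy model under stated assumptions: `BLOCK_LOCATIONS` stands in for the Name Node, `DATA_NODES` for the Data Nodes, the block contents are invented, and falling back to another replica when a node is unreachable is an assumption that follows from having 3 replicas rather than something the slide states.

```python
# Hypothetical in-memory stand-ins; a real client talks to the cluster over the network.
BLOCK_LOCATIONS = {
    "File": [
        ("Block1", ["DN1", "DN3", "DN5"]),
        ("Block2", ["DN1", "DN2", "DN3"]),
        ("Block3", ["DN1", "DN4", "DN5"]),
    ]
}
BLOCK_DATA = {"Block1": b"aaa", "Block2": b"bbb", "Block3": b"ccc"}  # invented contents

# Each Data Node stores the blocks the Name Node lists for it.
DATA_NODES = {}
for block_id, holders in BLOCK_LOCATIONS["File"]:
    for dn in holders:
        DATA_NODES.setdefault(dn, {})[block_id] = BLOCK_DATA[block_id]

def read_file(filename, block_locations, data_nodes, unavailable=()):
    """Look up block locations, then fetch every block from a reachable Data Node."""
    parts = []
    for block_id, holders in block_locations[filename]:  # steps (1) and (2)
        for dn in holders:                                # step (3)
            if dn not in unavailable and block_id in data_nodes.get(dn, {}):
                parts.append(data_nodes[dn][block_id])
                break
        else:
            raise IOError(f"no reachable replica of {block_id}")
    return b"".join(parts)

print(read_file("File", BLOCK_LOCATIONS, DATA_NODES))  # b'aaabbbccc'
```

Note that the file data never passes through the Name Node; it only hands out locations.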
HDFS—Name Node Failure(1/2)
• Name Node failure
• Only the Name Node knows which Data Nodes (DN1 to DN5) hold which block:
  Block1: DN1, DN3, DN5
  Block2: DN1, DN2, DN3
  Block3: DN1, DN4, DN5
  …and so on….
• Single point of failure: if the Name Node fails, the whole system fails.
HDFS—Name Node Failure(2/2)
• Name Node failure
• A Secondary Name Node runs next to the Name Node:
  • It connects to the Name Node every hour (the default).
  • It keeps a backup (checkpoint) of the Name Node metadata.
  • The backup can be used to rebuild the Name Node if it fails. It is not a hot standby: the rebuild is a manual step, and changes made after the last checkpoint may be lost.
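A toy model of the hourly backup makes its limits visible. The class below is purely illustrative: `SecondaryNameNodeSketch`, `tick`, and `rebuild` are hypothetical names, the metadata is a plain dictionary, and the real Secondary Name Node works differently in detail.

```python
import copy

CHECKPOINT_PERIOD_S = 3600  # "every hour" on the slide

class SecondaryNameNodeSketch:
    """Hypothetical helper that keeps a periodic copy of the metadata."""

    def __init__(self, period_s=CHECKPOINT_PERIOD_S):
        self.period_s = period_s
        self.last_checkpoint_at = None
        self.backup = {}

    def tick(self, now_s, name_node_metadata):
        """Copy the metadata if a full period has passed; return True if copied."""
        due = (self.last_checkpoint_at is None
               or now_s - self.last_checkpoint_at >= self.period_s)
        if due:
            self.backup = copy.deepcopy(name_node_metadata)
            self.last_checkpoint_at = now_s
        return due

    def rebuild(self):
        """Return metadata for a replacement Name Node from the last backup."""
        return copy.deepcopy(self.backup)

metadata = {"Block1": ["DN1", "DN3", "DN5"]}
secondary = SecondaryNameNodeSketch()
secondary.tick(0, metadata)                 # first contact: backup taken
metadata["Block2"] = ["DN1", "DN2", "DN3"]  # change made after the backup
secondary.tick(1800, metadata)              # only 30 minutes later: no new backup
print(secondary.rebuild())                  # {'Block1': ['DN1', 'DN3', 'DN5']}
```

In this model, Block2 is missing from the rebuilt metadata because it was added between two backups.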
HDFS—Data Nodes Failure(1/2)
• Data Nodes failure
• The Name Node lists the Data Nodes (DN1 to DN5) that hold each block:
  Block1: DN1, DN3, DN5
  Block2: DN1, DN2, DN3
  Block3: DN1, DN4, DN5
  …and so on….
• What happens when one of these Data Nodes fails?
HDFS—Data Nodes Failure(2/2)
• Data Nodes failure
• Heartbeat: each Data Node (DN1 to DN5) sends a heartbeat to the Name Node every 3 seconds.
• A Data Node is regarded as “DEAD” if it has not sent a heartbeat for about 10 minutes (10 minutes 30 seconds with the default settings).
• When a Data Node is dead, the Name Node has the blocks it held copied from the surviving replicas to other Data Nodes.
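The heartbeat rule and the re-replication step can be sketched as two small functions. This is a simplified model, not the Name Node's real logic: `dead_nodes` and `plan_re_replication` are hypothetical names, the timestamps are invented, and new targets are simply picked in list order.

```python
HEARTBEAT_INTERVAL_S = 3  # from the slide
DEAD_AFTER_S = 10 * 60    # from the slide

def dead_nodes(last_heartbeat, now_s, dead_after_s=DEAD_AFTER_S):
    """Return the Data Nodes whose last heartbeat is too old."""
    return sorted(dn for dn, seen in last_heartbeat.items()
                  if now_s - seen >= dead_after_s)

def plan_re_replication(block_map, dead, all_nodes, replicas=3):
    """For each block that lost replicas, pick a live source and new targets."""
    live = [dn for dn in all_nodes if dn not in dead]
    plan = {}
    for block, holders in block_map.items():
        survivors = [dn for dn in holders if dn not in dead]
        missing = replicas - len(survivors)
        if missing <= 0 or not survivors:
            continue  # nothing to copy, or no surviving replica to copy from
        candidates = [dn for dn in live if dn not in survivors]
        plan[block] = {"source": survivors[0], "targets": candidates[:missing]}
    return plan

BLOCK_MAP = {
    "Block1": ["DN1", "DN3", "DN5"],
    "Block2": ["DN1", "DN2", "DN3"],
    "Block3": ["DN1", "DN4", "DN5"],
}
ALL_NODES = ["DN1", "DN2", "DN3", "DN4", "DN5"]
# Invented timestamps (seconds): DN3 was last heard from 610 s ago.
last_seen = {"DN1": 998, "DN2": 999, "DN3": 390, "DN4": 997, "DN5": 998}

dead = dead_nodes(last_seen, now_s=1000)
print(dead)  # ['DN3']
print(plan_re_replication(BLOCK_MAP, dead, ALL_NODES))
```

Block 3 does not appear in the plan because none of its replicas lived on DN3.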
MapReduce Algorithm
• (e.g.) The Client asks: how many times does “POLab” occur in the file?
• (1), (2) The Client sends the job to the Job Tracker, which relies on the block locations kept by the Name Node:
  Blk1: DN1, DN7, DN8
  Blk2: DN2, DN5, DN6
  Blk3: DN4, DN12, DN13
• (3) Map: the Task Trackers on the Data Nodes (DN1 to DN4 in the diagram) count the occurrences in the blocks they hold: Blk 1: POLab = 3, Blk 2: POLab = 0, Blk 3: POLab = 11.
• (4) Reduce: the partial counts are added up: POLab = 3 + 0 + 11 = 14.
• MapReduce is a divide-and-conquer algorithm.
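The “POLab” example can be reproduced in plain Python to show the two phases. This is a single-process sketch of the idea, not Hadoop's MapReduce API: `map_count` and `reduce_counts` are hypothetical names, and the block contents are invented so that the counts match the slide.

```python
from functools import reduce

def map_count(block_text, word):
    """Map step: count the word inside one block, independently of the others."""
    return block_text.split().count(word)

def reduce_counts(partial_counts):
    """Reduce step: combine the per-block counts into one total."""
    return reduce(lambda a, b: a + b, partial_counts, 0)

# Invented block contents chosen to match the counts on the slide.
blocks = {
    "Blk 1": "POLab " * 3 + "other words",
    "Blk 2": "nothing relevant here",
    "Blk 3": "POLab " * 11,
}

partial = {name: map_count(text, "POLab") for name, text in blocks.items()}
total = reduce_counts(partial.values())
print(partial)  # {'Blk 1': 3, 'Blk 2': 0, 'Blk 3': 11}
print(total)    # 14
```

Because each map call only touches its own block, the calls can run on different machines at the same time, each next to the data it needs.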
Hadoop Ecosystem
http://www.inside.com.tw/2015/03/12/big-data-4-hadoop
References (learning map)
• 認識大數據的黃色小象幫手 –– Hadoop (Meet Hadoop, the little yellow elephant that helps with big data)
• HDFS Explained as Comics
• Understanding Hadoop Clusters and the Network
• How to run Hadoop on Linux? (Practice)*
