HDFS Erasure Coding Explained with Diagrams

Kai Sasaki
Software Engineer, Treasure Data Inc.
Who am I
Kai Sasaki (佐々木 海)
Software Engineer at Treasure Data Inc.

http://www.treasuredata.com
Hadoop, Spark, DL4J
Agenda
• Erasure Coding
• Under the Namespace
• Writing Side
• Reading Side
Erasure Coding
Replication
[Figure: a block written to HDFS is stored as multiple replicated blocks]
• Capacity overhead: x3
• Redundancy: 2
Erasure Coding
[Figure: blocks stored in HDFS under RS-6-3, grouped into a BlockGroup]
• RS-6-3
• 6 out of 9
• Redundancy: 3
• Capacity overhead: x1.5
• BlockGroup
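The overhead and redundancy figures on the two slides above follow from simple arithmetic. The sketch below reproduces them in Python, assuming that RS-6-3 means 6 data blocks plus 3 parity blocks and that "redundancy" counts how many copies or blocks can be lost without losing data. The function names are illustrative and are not HDFS APIs.

```python
from fractions import Fraction


def replication_stats(replicas: int):
    """Return (capacity overhead, tolerated losses) for n-way replication."""
    # Every byte is stored `replicas` times; data survives while one copy remains.
    return Fraction(replicas), replicas - 1


def reed_solomon_stats(data_units: int, parity_units: int):
    """Return (capacity overhead, tolerated losses) for an RS(data, parity) layout."""
    # data + parity blocks are stored for every `data_units` blocks of user data,
    # and any `data_units` of them are enough to rebuild the rest.
    total = data_units + parity_units
    return Fraction(total, data_units), parity_units


rep_overhead, rep_tolerance = replication_stats(3)
ec_overhead, ec_tolerance = reed_solomon_stats(6, 3)

print(f"3x replication: overhead x{float(rep_overhead)}, tolerates {rep_tolerance} losses")
print(f"RS-6-3:         overhead x{float(ec_overhead)}, tolerates {ec_tolerance} losses")
```

Under these assumptions, RS-6-3 tolerates one more loss than 3x replication while storing half as many bytes.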
Under the Namespace
INode and BlockInfo
[Figure: an INode points to a list of BlockInfo entries (BlockInfo, BlockInfo, …); each BlockInfo covers a BlockGroup of several blocks]
BlockInfo
[Figure: bit layout of the BlockId for a BlockGroup and its blocks]
• long BlockId: bits 0 to 64
• index: 4bit, GroupId: 60bit
• Blocks in the BlockGroup: index 0, index 1, index 2, …
Saving memory
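The ID layout above can be sketched with plain bit operations. The slide gives only the field widths (4-bit index, 60-bit GroupId), so this sketch assumes the index occupies the low-order 4 bits of the 64-bit ID. The helper names and the sample group ID are hypothetical. The point it illustrates is that the ID of every block in a group can be derived from the group ID, so one entry per BlockGroup may be enough instead of one per block.

```python
INDEX_BITS = 4                       # per the slide: 4-bit index
INDEX_MASK = (1 << INDEX_BITS) - 1   # 0b1111


def block_id(group_id: int, index: int) -> int:
    """Derive the ID of the index-th block from its BlockGroup ID."""
    if group_id & INDEX_MASK:
        raise ValueError("a group ID must keep its index bits at zero")
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index must fit in 4 bits")
    return group_id | index


def group_of(block_id_value: int) -> int:
    """Clear the index bits to recover the BlockGroup ID."""
    return block_id_value & ~INDEX_MASK


def index_of(block_id_value: int) -> int:
    """Keep only the index bits to recover the position inside the group."""
    return block_id_value & INDEX_MASK


group = 0x1230  # hypothetical group ID, low 4 bits are zero
ids = [block_id(group, i) for i in range(9)]  # 9 blocks for RS-6-3

for value in ids[:3]:
    print(hex(value), "-> group", hex(group_of(value)), "index", index_of(value))
```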
Writing Side
[Figure: data is written from offset 0 in 64KB units into a BlockGroup made up of data blocks and parity blocks; one row across the blocks is a stripe]
• 64KB
• Data Block, Parity Block
• Stripe
Saving disk space
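The striped layout in the figure can be sketched as an offset calculation. This is a minimal sketch, assuming 64KB units as on the slide, 6 data blocks as in RS-6-3, round-robin placement across the data blocks, and a file that fits in a single BlockGroup. The function name `locate` is illustrative, and the parity computation for each stripe is not shown.

```python
CELL_SIZE = 64 * 1024   # 64KB unit, as on the slide
DATA_BLOCKS = 6         # RS-6-3: 6 data blocks (parity blocks not modeled here)


def locate(offset: int):
    """Map a logical file offset to (stripe, data block index, offset inside that block)."""
    cell = offset // CELL_SIZE                 # which 64KB unit of the file
    stripe = cell // DATA_BLOCKS               # which row across the data blocks
    block_index = cell % DATA_BLOCKS           # which data block holds this unit
    offset_in_block = stripe * CELL_SIZE + offset % CELL_SIZE
    return stripe, block_index, offset_in_block


for offset in (0, CELL_SIZE, 6 * CELL_SIZE, 6 * CELL_SIZE + 5):
    print(offset, "->", locate(offset))
```

The first six 64KB units land in data blocks 0 to 5 of stripe 0, and the seventh wraps around to data block 0 in stripe 1.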
Reading Side
[Figure: reading the byte range 200KB to 500KB from a BlockGroup]
• 200KB to 500KB
Saving reading time
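The 200KB to 500KB read in the figure can be broken down per data block. This is a minimal sketch under the same assumptions as the striped layout (64KB units, 6 data blocks, round-robin placement, a single BlockGroup). The function name `blocks_for_range` is illustrative and is not part of DFSStripedInputStream. Because the range is spread over several blocks, the per-block reads can in principle be issued in parallel.

```python
CELL_SIZE = 64 * 1024   # 64KB unit, as on the slide
DATA_BLOCKS = 6         # RS-6-3: 6 data blocks
KB = 1024


def blocks_for_range(start: int, end: int):
    """Return {data block index: bytes to read} for the logical byte range [start, end)."""
    plan = {}
    pos = start
    while pos < end:
        cell = pos // CELL_SIZE
        cell_end = (cell + 1) * CELL_SIZE
        chunk = min(end, cell_end) - pos       # bytes of this unit inside the range
        index = cell % DATA_BLOCKS
        plan[index] = plan.get(index, 0) + chunk
        pos += chunk
    return plan


plan = blocks_for_range(200 * KB, 500 * KB)
for index in sorted(plan):
    print(f"data block {index}: {plan[index] // KB}KB")
```

With these assumptions the 300KB range touches five of the six data blocks, and no single block serves more than 64KB of it.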
Summary
• Namespace -> Saving memory
  BlockInfoStriped, BlockIdManager
• Writing Side -> Saving disk space
  INodeFile
• Reading Side -> Saving reading time
  DFSStripedInputStream
Thank you