HDFS Erasure Coding Explained with Diagrams
Kai Sasaki
Software engineer - Treasure Data
Illustration of HDFS Erasure Coding
1.
HDFS Erasure Coding Explained with Diagrams
Kai Sasaki, Treasure Data Inc.
2.
Who am I
佐々木 海 (Kai Sasaki)
Software Engineer at Treasure Data Inc.
http://www.treasuredata.com
Hadoop, Spark, DL4J
3.
Agenda
• Erasure Coding
• Under the Namespace
• Writing Side
• Reading Side
4.
Erasure Coding
5.
Replication [figure: one block, HDFS]
6-7.
Replication [figure: four blocks, HDFS]
8.
Replication [figure: four blocks, HDFS] Capacity overhead: x3
9.
Replication [figure: four blocks, HDFS] Redundancy: 2, capacity overhead: x3
10.
Erasure Coding [figure: blocks stored in HDFS] RS-6-3
11-14.
Erasure Coding [figure: blocks stored in HDFS] 6 out of 9: any 6 of the 9 stored blocks are enough to recover the data
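The "6 out of 9" idea can be illustrated with a much simpler code than Reed-Solomon. The sketch below is not RS-6-3 and is not how HDFS encodes data: it assumes a single XOR parity block over three data blocks, so any 3 of the 4 stored blocks are enough to rebuild the missing one. All function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks):
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stored, lost_index):
    """Rebuild the block at lost_index from all the other blocks."""
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)

data = [b"AAAA", b"BBBB", b"CCCC"]
stored = encode(data)          # 3 data blocks + 1 parity block
rebuilt = recover(stored, 1)   # pretend block 1 was lost
print(rebuilt == data[1])
```

Reed-Solomon generalizes this to several parity blocks, which is what lets RS-6-3 survive the loss of any three blocks instead of just one.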
15.
Erasure Coding [figure: blocks stored in HDFS] Redundancy: 3
16.
Erasure Coding [figure: blocks stored in HDFS] Capacity overhead: x1.5, redundancy: 3
17.
Erasure Coding [figure: blocks stored in HDFS] BlockGroup
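The overhead and redundancy figures on the Replication and Erasure Coding slides follow from simple arithmetic. A minimal sketch, assuming 3-way replication and that RS-6-3 means 6 data units plus 3 parity units (the function names are hypothetical):

```python
def replication(copies):
    """Return (capacity overhead, tolerated losses) for n-way replication."""
    return float(copies), copies - 1

def reed_solomon(data_units, parity_units):
    """Return (capacity overhead, tolerated losses) for an RS(data, parity) layout."""
    return (data_units + parity_units) / data_units, parity_units

rep = replication(3)       # three full copies of every block
ec = reed_solomon(6, 3)    # RS-6-3: 9 blocks stored for every 6 blocks of data

print("replication: overhead x%.1f, redundancy %d" % rep)
print("RS-6-3:      overhead x%.1f, redundancy %d" % ec)
```

Under these assumptions erasure coding stores half as many bytes as replication while tolerating one more lost block.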
18.
Under the Namespace
19.
INode and BlockInfo [figure: an INode and a BlockInfo]
20.
INode and BlockInfo [figure: an INode and a list of BlockInfo entries]
21.
INode and BlockInfo [figure: an INode, a list of BlockInfo entries, and blocks]
22.
INode and BlockInfo [figure: an INode, a list of BlockInfo entries, and blocks labeled BlockGroup]
23.
[figure: a BlockInfo and the blocks of a BlockGroup]
24.
[figure: a BlockInfo and the blocks of a BlockGroup] BlockId is a long (bits 0 to 64)
25.
[figure: a BlockInfo and the blocks of a BlockGroup] The BlockId is split into an index (4 bit) and a GroupId (60 bit)
26.
[figure: a BlockInfo and the blocks of a BlockGroup] The blocks in the group are labeled index 0, index 1, index 2. Saving memory
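The split of the 64-bit BlockId into a 60-bit GroupId and a 4-bit index can be sketched with plain bit operations. The slides give only the field widths; placing the index in the low 4 bits is an assumption here, and the sketch ignores the sign bit of a Java long. The function names are hypothetical and are not the HDFS API.

```python
INDEX_BITS = 4
GROUP_BITS = 60
INDEX_MASK = (1 << INDEX_BITS) - 1   # 0b1111, selects the index field

def make_block_id(group_id, index):
    """Pack a 60-bit group ID and a 4-bit index into one 64-bit value.

    Assumption: the index lives in the low 4 bits.
    """
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index must fit in 4 bits")
    if not 0 <= group_id < (1 << GROUP_BITS):
        raise ValueError("group_id must fit in 60 bits")
    return (group_id << INDEX_BITS) | index

def split_block_id(block_id):
    """Return (group_id, index) for a packed block ID."""
    return block_id >> INDEX_BITS, block_id & INDEX_MASK

bid = make_block_id(0x2A, 2)
print(hex(bid), split_block_id(bid))
```

Because the ID of every block in a group can be derived from the group ID and an index, one record per BlockGroup can stand in for all of its blocks, which is presumably the "Saving memory" point on the slide.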
27.
Writing Side
28.
[figure: Data; label: 0]
29.
[figure: Data with a 64KB unit marked; label: 0]
30.
[figure: Data with a 64KB unit marked, and a BlockGroup; label: 0]
31-37.
[figure: Data and BlockGroup, animation frames; label: 0]
38.
[figure: Data and BlockGroup; label: 0] Data Block
39.
[figure: Data and BlockGroup; label: 0] Data Block, Parity Block
40.
[figure: Data and BlockGroup; label: 0] Data Block, Parity Block, Stripe
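The writing-side figures suggest that data is cut into 64KB units and spread across the data blocks of a BlockGroup, one stripe at a time. A minimal sketch of that layout, assuming round-robin placement over the 6 data blocks of RS-6-3 and leaving out parity computation entirely (the names are hypothetical, not the HDFS writer):

```python
CELL_SIZE = 64 * 1024   # 64KB unit, taken from the slide
DATA_BLOCKS = 6         # assumption: the 6 data blocks of RS-6-3

def stripe_cells(data, cell_size=CELL_SIZE, data_blocks=DATA_BLOCKS):
    """Distribute data round-robin, one cell at a time, over the data blocks."""
    blocks = [bytearray() for _ in range(data_blocks)]
    for cell_no, start in enumerate(range(0, len(data), cell_size)):
        # Cell n goes to block n mod 6; six consecutive cells form one stripe.
        blocks[cell_no % data_blocks] += data[start:start + cell_size]
    return [bytes(b) for b in blocks]

data = bytes(7 * CELL_SIZE + 100)   # seven full cells plus a 100-byte tail
blocks = stripe_cells(data)
sizes = [len(b) for b in blocks]
print(sizes)
```

In this example block 0 receives two full cells, block 1 receives one full cell plus the 100-byte tail, and the remaining blocks receive one cell each.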
41-46.
[figure: Data and BlockGroup, animation frames; label: 0]
47-48.
[figure: Data and BlockGroup; labels: 0, 0]
49.
[figure: Data and BlockGroup; labels: 0, 0] Saving disk space usage
50.
Reading Side
51.
[figure: BlockGroup; labels: 0, 0]
52-58.
[figure: BlockGroup, animation frames; labels: 0, 0, 200KB, 500KB]
59.
[figure: BlockGroup; labels: 0, 0, 200KB, 500KB] Saving reading time
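One way to read the 200KB and 500KB labels is as the start and end of a requested byte range; that interpretation is an assumption, as are the 64KB cell size and the 6 data blocks carried over from the earlier slides. Under those assumptions, the sketch below works out which blocks a striped reader would need to touch (hypothetical names, not the DFSStripedInputStream logic):

```python
CELL_SIZE = 64 * 1024   # assumed cell size, from the writing-side slides
DATA_BLOCKS = 6         # assumed data blocks per BlockGroup (RS-6-3)

def cells_for_range(start, end, cell_size=CELL_SIZE, data_blocks=DATA_BLOCKS):
    """Map the byte range [start, end) to {block index: [stripe numbers]}."""
    plan = {}
    if end <= start:
        return plan
    first_cell = start // cell_size
    last_cell = (end - 1) // cell_size
    for cell_no in range(first_cell, last_cell + 1):
        # Cell n sits in block n mod 6, in stripe n // 6.
        plan.setdefault(cell_no % data_blocks, []).append(cell_no // data_blocks)
    return plan

plan = cells_for_range(200 * 1024, 500 * 1024)
print(plan)
```

Here the range covers cells 3 to 7, which live in five different blocks, so the pieces can be fetched in parallel. That is presumably the "Saving reading time" point.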
60.
Summary
• Namespace -> Saving memory: BlockInfoStriped, BlockIdManager
• Writing Side -> Saving disk space usage: INodeFile
• Reading Side -> Saving reading time: DFSStripedInputStream
61.
Thank you