HDFS Erasure Coding
Zhe Zhang
zhezhang@cloudera.com
Replication is Expensive
§ HDFS inherits 3-way replication from Google File System
- Simple, scalable and robust
§ 200% storage overhead
§ Secondary replicas rarely accessed
[Figure: the NameNode maps one block to three replicas stored on DataNode0, DataNode1 and DataNode2]
Erasure Coding Saves Storage
§ Simplified Example: storing 2 bits (1 0)
- Replication: 1 0 1 0 (2 extra bits)
- XOR Coding: 1 0 plus parity 1 ⊕ 0 = 1 (1 extra bit)
§ Same data durability
- can lose any 1 bit
§ Half the storage overhead
§ Slower recovery
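The 2-bit example can be reproduced in a few lines. This is a minimal Python sketch; the helper names `xor_parity` and `recover` are illustrative and do not come from HDFS.

```python
def xor_parity(bits):
    """Return the XOR of a sequence of 0/1 values (a single parity bit)."""
    parity = 0
    for b in bits:
        parity ^= b
    return parity


def recover(coded, lost_index):
    """Rebuild the bit at lost_index by XOR-ing every surviving bit."""
    survivors = [b for i, b in enumerate(coded) if i != lost_index]
    return xor_parity(survivors)


data = [1, 0]
replicated = data + data                 # [1, 0, 1, 0]: 2 extra bits
xor_coded = data + [xor_parity(data)]    # [1, 0, 1]: 1 extra bit

for lost in range(len(xor_coded)):
    print(f"lost position {lost}: recovered {recover(xor_coded, lost)}")
```

Rebuilding one bit requires reading every surviving bit, whereas a replica can be copied from a single survivor; that is the "slower recovery" trade-off.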
Erasure Coding Saves Storage
§ Facebook
- f4 stores 65PB of BLOBs in EC
§ Windows Azure Storage (WAS)
- A PB of new data every 1~2 days
- All “sealed” data stored in EC
§ Google File System
- A large portion of data stored in EC
Roadmap
§ Background of EC
- Redundancy Theory
- EC in Distributed Storage Systems
§ HDFS-EC architecture
- Choosing Block Layout
- NameNode — Generalizing the Block Concept
- Client — Parallel I/O
- DataNode — Background Reconstruction
§ Hardware-accelerated Codec Framework
Durability and Efficiency
Data Durability = How many simultaneous failures can be tolerated?
Storage Efficiency = What fraction of storage holds useful data?
3-way Replication:
- Data Durability = 2
- Storage Efficiency = 1/3 (33%)
[Figure: the NameNode maps one block to three replicas on DataNode0, DataNode1 and DataNode2; one copy is useful data, the other two are redundant data]
XOR:
- Data Durability = 1
- Storage Efficiency = 2/3 (67%)
[Figure: X and Y are useful data; X ⊕ Y is redundant data]
X Y X ⊕ Y
0 0 0
0 1 1
1 0 1
1 1 0
- A lost cell is rebuilt by XOR-ing the survivors: e.g., if Y is lost while X = 0 and X ⊕ Y = 1 survive, Y = 0 ⊕ 1 = 1
Reed-Solomon (RS), e.g., 4 data cells + 2 parity cells:
- Data Durability = 2
- Storage Efficiency = 4/6 (67%)
- Very flexible! The numbers of data and parity cells can be tuned to trade durability against efficiency
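To see why RS durability equals the number of parity cells, here is a toy systematic Reed-Solomon code in Python. It is a sketch under simplifying assumptions: arithmetic is done in the prime field GF(257) rather than the GF(2^8) that production coders typically use (so a parity symbol can equal 256 and would not fit in a byte), and decoding is plain Lagrange interpolation. The function names are hypothetical.

```python
P = 257  # toy prime field GF(257); real coders usually work over GF(2^8)


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def rs_encode(data, parity_count):
    """Systematic encoding: data symbols sit at x = 0..k-1, parity at x = k..k+m-1."""
    k = len(data)
    points = list(enumerate(data))
    return data + [_lagrange_eval(points, x) for x in range(k, k + parity_count)]


def rs_decode(shares, k):
    """shares: {position: symbol} for surviving cells; any k of them rebuild the data."""
    if len(shares) < k:
        raise ValueError("too many cells lost to recover")
    points = list(shares.items())[:k]
    return [_lagrange_eval(points, x) for x in range(k)]


coded = rs_encode([7, 200, 13, 99], parity_count=2)   # 4 data + 2 parity symbols
survivors = {p: coded[p] for p in (1, 3, 4, 5)}       # positions 0 and 2 lost
print(rs_decode(survivors, k=4))
```

Any four of the six symbols determine the degree-3 polynomial, so any two losses are recoverable (durability 2) while four of six cells hold data (efficiency 4/6).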
Scheme                  Data Durability  Storage Efficiency
Single Replica          0                100%
3-way Replication       2                33%
XOR with 6 data cells   1                86%
RS (6,3)                3                67%
RS (10,4)               4                71%
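The table values follow from two formulas: n-way replication tolerates n - 1 failures with efficiency 1/n, and an MDS code such as RS with k data and m parity cells (XOR is the m = 1 case) tolerates m failures with efficiency k/(k + m). A short Python check; the helper names are hypothetical.

```python
from fractions import Fraction


def replication(copies):
    """n-way replication: any copies - 1 replicas may fail; 1 of n copies is useful."""
    return copies - 1, Fraction(1, copies)


def mds_code(data_cells, parity_cells):
    """MDS erasure code such as RS(k, m); XOR is the m = 1 case.
    Any m cells may fail; k of the k + m cells hold useful data."""
    return parity_cells, Fraction(data_cells, data_cells + parity_cells)


schemes = {
    "Single Replica": replication(1),
    "3-way Replication": replication(3),
    "XOR with 6 data cells": mds_code(6, 1),
    "RS (6,3)": mds_code(6, 3),
    "RS (10,4)": mds_code(10, 4),
}

for name, (durability, efficiency) in schemes.items():
    print(f"{name:<22} {durability}  {float(efficiency):.0%}")
```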
EC in Distributed Storage
Block Layout:
[Figure: a file is split into 128 MB blocks: 0~128M → block0 on DataNode 0, 128~256M → block1 on DataNode 1, …, 640~768M → block5 on DataNode 5; parity blocks go to DataNode 6, …]
Contiguous Layout:
§ Data Locality: good
§ Small Files: poor
Striped Layout:
[Figure: the same file is split into 1 MB cells written round-robin: 0~1M → block0 on DataNode 0, 1~2M → block1 on DataNode 1, …, 5~6M → block5 on DataNode 5, then 6~7M wraps back to block0; parity blocks go to DataNode 6, …]
§ Data Locality: poor
§ Small Files: good
§ Parallel I/O: good
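The two layouts differ only in how a file offset maps to a block. A Python sketch, assuming 128 MB blocks, 1 MB cells and 6 data blocks per group as in the figures, and additionally assuming that each internal block of a striped group holds up to one block size of data; the function names are hypothetical.

```python
MB = 1 << 20
BLOCK_SIZE = 128 * MB   # block size from the figures
CELL_SIZE = 1 * MB      # striping cell size from the figures
DATA_BLOCKS = 6         # data blocks per group with (6,3) coding


def contiguous_location(offset):
    """Return (group, block index in group, offset in block) for the contiguous layout."""
    block, in_block = divmod(offset, BLOCK_SIZE)
    group, index = divmod(block, DATA_BLOCKS)
    return group, index, in_block


def striped_location(offset):
    """Return (group, block index in group, offset in block) for the striped layout.
    Assumes each internal block stores up to BLOCK_SIZE bytes of its group."""
    group, in_group = divmod(offset, DATA_BLOCKS * BLOCK_SIZE)
    cell, in_cell = divmod(in_group, CELL_SIZE)
    stripe, index = divmod(cell, DATA_BLOCKS)  # cells go round-robin over the blocks
    return group, index, stripe * CELL_SIZE + in_cell


def blocks_touched(locate, offset, length):
    """Distinct (group, block index) pairs hit by a read of [offset, offset + length)."""
    points = list(range(offset, offset + length, CELL_SIZE)) + [offset + length - 1]
    return sorted({locate(p)[:2] for p in points})


print(blocks_touched(contiguous_location, 0, 6 * MB))  # a single block
print(blocks_touched(striped_location, 0, 6 * MB))     # six blocks on six DataNodes
```

A 6 MB read stays inside one block under the contiguous layout (good locality) but fans out to six blocks under striping (parallel I/O).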
EC in Distributed Storage
Spectrum:
[Figure: storage systems placed on two axes, Replication vs. Erasure Coding and Striping vs. Contiguous layout: Ceph, Quantcast File System (each shown in two cells), HDFS, Facebook f4, Windows Azure]
Roadmap
§ Background of EC
- Redundancy Theory
- EC in Distributed Storage Systems
§ HDFS-EC architecture
- Choosing Block Layout
- NameNode — Generalizing the Block Concept
- Client — Parallel I/O
- DataNode — Background Reconstruction
§ Hardware-accelerated Codec Framework
Choosing Block Layout
§ Assuming (6,3) coding:
- Small files: < 1 block
- Medium: 1~6 blocks
- Large: > 6 blocks (1 group)
[Figure: Cluster A profile, file count and space usage by file size (small, medium, large): top 2% of files occupy ~65% of space]
[Figure: Cluster B profile, same breakdown: top 2% of files occupy ~40% of space]
[Figure: Cluster C profile, same breakdown: dominated by small files]
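The size classes can be applied to any list of file sizes to build such a profile. A Python sketch, assuming 128 MB blocks (the size used in the layout figures); the file sizes in the example are made up for illustration and are not the cluster data.

```python
from collections import Counter

BLOCK_SIZE = 128 * 2**20   # assumed block size
DATA_BLOCKS = 6            # (6,3) coding: a full group has 6 data blocks


def size_class(num_bytes):
    """Classify a file as small (< 1 block), medium (1~6 blocks) or large (> 6 blocks)."""
    blocks = num_bytes / BLOCK_SIZE
    if blocks < 1:
        return "small"
    if blocks <= DATA_BLOCKS:
        return "medium"
    return "large"


def profile(file_sizes):
    """Return {class: (share of file count, share of space usage)}."""
    counts, space = Counter(), Counter()
    for size in file_sizes:
        cls = size_class(size)
        counts[cls] += 1
        space[cls] += size
    total_files, total_bytes = len(file_sizes), sum(file_sizes)
    return {cls: (counts[cls] / total_files, space[cls] / total_bytes)
            for cls in ("small", "medium", "large")}


# Hypothetical cluster: many small files, a few medium ones, one very large file.
sizes = [4 * 2**20] * 97 + [300 * 2**20] * 2 + [10 * 2**30]
print(profile(sizes))
```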
Choosing Block Layout
[Figure: development phases placed on the grid of Replication vs. Erasure Coding and Striping vs. Contiguous layout: Current HDFS; Phase 1.1; Phase 1.2; Phase 2 (future work); Phase 3 (future work)]
Generalizing Block: NameNode
§ Mapping Logical and Storage Blocks
§ Too Many Storage Blocks?
§ Hierarchical Naming Protocol:
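One way to keep the NameNode from tracking every storage block is to derive internal block IDs arithmetically from a block-group ID. The Python sketch below assumes the low 4 bits of a group ID are reserved for the index of the internal block, so a group holds at most 16 internal blocks; it is an illustrative scheme with hypothetical names, not a description of the exact HDFS protocol.

```python
INDEX_BITS = 4                      # assumption: up to 16 internal blocks per group
INDEX_MASK = (1 << INDEX_BITS) - 1


class BlockGroupIdAllocator:
    """Hypothetical allocator that hands out group IDs with the low INDEX_BITS bits clear."""

    def __init__(self, start=0):
        # round start up to the next group boundary
        self._next = (start + INDEX_MASK) & ~INDEX_MASK

    def next_group_id(self):
        group_id = self._next
        self._next += 1 << INDEX_BITS   # skip over the IDs reserved for internal blocks
        return group_id


def internal_block_id(group_id, index):
    """Storage-block ID of the index-th internal block (data or parity) of a group."""
    if group_id & INDEX_MASK or not 0 <= index <= INDEX_MASK:
        raise ValueError("misaligned group ID or index out of range")
    return group_id | index


def split_block_id(block_id):
    """Recover (group ID, internal index) from a storage-block ID without a lookup table."""
    return block_id & ~INDEX_MASK, block_id & INDEX_MASK


allocator = BlockGroupIdAllocator(start=1000)
group = allocator.next_group_id()
members = [internal_block_id(group, i) for i in range(9)]   # 6 data + 3 parity
print(group, members)
```

In this scheme the NameNode needs one entry per block group rather than one per storage block, which is one answer to "Too Many Storage Blocks?".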
Client Parallel Writing
[Figure: data is queued for several streamers, each writing to its own DataNode, with a Coordinator shared among the streamers]
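The write path can be illustrated with a toy model: the client cuts the byte stream into cells, appends them round-robin to one queue per data streamer, and computes a parity cell for each stripe. This Python sketch is a simplification under stated assumptions: it uses 4-byte cells, 3 data streamers and a single XOR parity cell in place of 1 MB cells and Reed-Solomon (6,3) parity, and it models each streamer's queue as a plain deque. It is not the HDFS client code.

```python
from collections import deque

CELL_SIZE = 4     # toy cell size in bytes (assumption)
DATA_UNITS = 3    # toy schema: 3 data streamers + 1 XOR parity streamer (assumption)


def xor_cells(cells):
    """XOR equally sized cells together to form one parity cell."""
    parity = bytearray(CELL_SIZE)
    for cell in cells:
        for i, b in enumerate(cell):
            parity[i] ^= b
    return bytes(parity)


def stripe_into_queues(data):
    """Split data into DATA_UNITS data queues followed by one parity queue."""
    queues = [deque() for _ in range(DATA_UNITS + 1)]
    stripe_bytes = CELL_SIZE * DATA_UNITS
    for start in range(0, len(data), stripe_bytes):
        stripe = data[start:start + stripe_bytes]
        padded = []
        for i in range(DATA_UNITS):
            cell = stripe[i * CELL_SIZE:(i + 1) * CELL_SIZE]
            if cell:
                queues[i].append(cell)                   # only real bytes are queued
            padded.append(cell.ljust(CELL_SIZE, b"\0"))  # missing bytes count as zero
        queues[DATA_UNITS].append(xor_cells(padded))
    return queues


for i, q in enumerate(stripe_into_queues(b"abcdefghijklmnop")):
    print(i, list(q))
```

In the real client each queue would be drained by its own streamer writing to a different DataNode, so the writes proceed in parallel.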
Client Parallel Reading
[Figure: the client reads from several DataNodes in parallel, with parity blocks also shown]
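On the read side, a byte range of a striped file maps to pieces of several internal blocks that can be fetched concurrently. A Python sketch, assuming toy 4-byte cells and 3 data blocks, with in-memory byte strings standing in for DataNodes and a thread pool standing in for parallel connections; all names are hypothetical, and degraded reads (decoding a missing cell from parity) are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

CELL_SIZE = 4    # toy cell size (assumption)
DATA_UNITS = 3   # toy number of data blocks (assumption)


def stripe_blocks(data):
    """Lay data out cell by cell, round-robin over DATA_UNITS internal blocks."""
    blocks = [bytearray() for _ in range(DATA_UNITS)]
    for cell_no, start in enumerate(range(0, len(data), CELL_SIZE)):
        blocks[cell_no % DATA_UNITS] += data[start:start + CELL_SIZE]
    return [bytes(b) for b in blocks]


def plan_reads(offset, length):
    """Split [offset, offset + length) into (block index, offset in block, size) pieces."""
    pieces, pos, end = [], offset, offset + length
    while pos < end:
        cell, in_cell = divmod(pos, CELL_SIZE)
        stripe, index = divmod(cell, DATA_UNITS)
        size = min(CELL_SIZE - in_cell, end - pos)
        pieces.append((index, stripe * CELL_SIZE + in_cell, size))
        pos += size
    return pieces


def parallel_read(blocks, offset, length):
    """Fetch all pieces concurrently and reassemble them in file order."""
    def fetch(piece):
        index, start, size = piece
        return blocks[index][start:start + size]

    with ThreadPoolExecutor(max_workers=DATA_UNITS) as pool:
        return b"".join(pool.map(fetch, plan_reads(offset, length)))


blocks = stripe_blocks(b"abcdefghijklmnop")
print(parallel_read(blocks, 2, 12))
```

`ThreadPoolExecutor.map` returns results in input order, so the pieces can be joined directly even though they are fetched concurrently.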
Reconstruction on DataNode
§ Important to avoid delay on the critical path
- Especially if original data is lost
§ Integrated with Replication Monitor
- Under-protected EC blocks scheduled together with under-replicated blocks
- New priority algorithms
§ New ErasureCodingWorker component on DataNode
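Scheduling under-protected EC blocks together with under-replicated blocks can be sketched with a single priority queue keyed on how many more failures each block can survive. This Python sketch is hypothetical: the talk mentions new priority algorithms without detailing them, and this is not HDFS's actual policy.

```python
import heapq


def schedule(blocks):
    """Order blocks for reconstruction, most urgent first.

    blocks: iterable of (block_id, live_units, required_units, total_units); e.g. a
    3-way replicated block needs 1 of 3 replicas, an RS(6,3) group needs 6 of 9 blocks.
    Urgency is the remaining redundancy live - required; lower means more urgent.
    """
    heap = []
    for seq, (block_id, live, required, total) in enumerate(blocks):
        if live >= total:
            continue                      # fully protected: nothing to reconstruct
        # seq breaks ties so equally urgent blocks keep their arrival order
        heapq.heappush(heap, (live - required, seq, block_id))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


queue = schedule([
    ("replicated-a", 2, 1, 3),   # lost 1 replica, can survive 1 more loss
    ("ec-group-b", 6, 6, 9),     # lost 3 blocks, no redundancy left
    ("ec-group-c", 9, 6, 9),     # healthy
    ("ec-group-d", 7, 6, 9),     # lost 2 blocks, can survive 1 more loss
])
print(queue)
```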
Roadmap
§ Background of EC
- Redundancy Theory
- EC in Distributed Storage Systems
§ HDFS-EC architecture
- Choosing Block Layout
- NameNode — Generalizing the Block Concept
- Client — Parallel I/O
- DataNode — Background Reconstruction
§ Hardware-accelerated Codec Framework
Acceleration with Intel ISA-L
§ 1 legacy coder
- From Facebook’s HDFS-RAID project
§ 2 new coders
- Pure Java — code improvement over HDFS-RAID
- Native coder with Intel’s Intelligent Storage Acceleration Library (ISA-L)
Microbenchmark: Codec Calculation
Microbenchmark: HDFS I/O
Conclusion
§ Erasure coding cuts raw storage use by ~50% (e.g., RS (6,3) stores 1.5x the data size vs. 3x for 3-way replication)!
§ HDFS-EC Phase I implements erasure coding in the striped block layout
§ Upstream effort (HDFS-7285):
- Design finalized Nov. 2014
- Development started Jan. 2015
- 218 commits, ~25k LoC change
- Broad collaboration: Cloudera, Intel, Hortonworks, Huawei, Yahoo (Japan)
§ Phase II will support contiguous block layout for better locality
Acknowledgements
§ Cloudera
- Andrew Wang, Aaron T. Myers, Colin McCabe, Todd Lipcon, Silvius Rus
§ Intel
- Kai Zheng, Uma Maheswara Rao G, Vinayakumar B, Yi Liu, Weihua Jiang
§ Hortonworks
- Jing Zhao, Tsz Wo Nicholas Sze
§ Huawei
- Walter Su, Rakesh R, Xinwei Qin
§ Yahoo (Japan)
- Gao Rui, Kai Sasaki, Takuya Fukudome, Hui Zheng
Just merged to trunk!
Questions?
Erasure Coding: A type of Error Correction Coding
EC in Distributed Storage
Block Layout:
[Figure: contiguous layout with (6,3) coding: 0~128M → block0 on DataNode0, 128~256M → block1 on DataNode1, …, 640~768M → block5 on DataNode5 hold data; DataNode6 … DataNode8 hold parity]
Contiguous
§ Data Locality: good
§ Small Files: poor
EC in Distributed Storage
Block Layout:
[Figure: striped layout with (6,3) coding: 1 MB cells (0~1M, 1~2M, …, 5~6M, …, 127~128M) spread across block0 on DataNode0 through DataNode5 hold data; DataNode6 … DataNode8 hold parity]
Striping
§ Data Locality: poor
§ Small Files: good
§ Parallel I/O: good
Client Parallel Writing
[Figure: DFSStripedOutputStream feeds dataQueue 0 … dataQueue 4 to DataStreamer 0 … DataStreamer 4, which write internal blocks blk_1009 … blk_1013 of one blockGroup; a Coordinator allocates each new blockGroup]
Client Parallel Reading
[Figure: Stripes 0-2 laid out across data blocks and parity blocks on DataNodes; cells are labeled "requested", "recovery read", or "all zero"]

How Ceph performs on ARM Microserver Cluster
Aaron Joue
 
hierarchical memory technology.pptx
hierarchical memory technology.pptxhierarchical memory technology.pptx
hierarchical memory technology.pptx
2105986
 
computer-memory
computer-memorycomputer-memory
computer-memory
Bablu Shofi
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
Ashish Thapliyal
 
Understanding RAID Levels (RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5)
Understanding RAID Levels (RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5)Understanding RAID Levels (RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5)
Understanding RAID Levels (RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5)
Raid Data Recovery
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ Speedment
Speedment, Inc.
 
disk structure and multiple RAID levels .ppt
disk structure and multiple  RAID levels .pptdisk structure and multiple  RAID levels .ppt
disk structure and multiple RAID levels .ppt
RAJASEKHARV10
 
Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster
inwin stack
 
Chapter 3
Chapter 3Chapter 3
Configuring Aerospike - Part 2
Configuring Aerospike - Part 2 Configuring Aerospike - Part 2
Configuring Aerospike - Part 2
Aerospike, Inc.
 
SQL Server In-Memory OLTP introduction (Hekaton)
SQL Server In-Memory OLTP introduction (Hekaton)SQL Server In-Memory OLTP introduction (Hekaton)
SQL Server In-Memory OLTP introduction (Hekaton)
Shy Engelberg
 
Caching Strategies for an Erlang Based Web Stack
Caching Strategies for an Erlang Based Web StackCaching Strategies for an Erlang Based Web Stack
Caching Strategies for an Erlang Based Web Stack
enriquepazperez
 
Art of the Possible_Tim Faulkes.pdf
Art of the Possible_Tim Faulkes.pdfArt of the Possible_Tim Faulkes.pdf
Art of the Possible_Tim Faulkes.pdf
Aerospike, Inc.
 

Similar to Native erasure coding support inside hdfs presentation (20)

Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
 
15 B-Trees
15 B-Trees15 B-Trees
15 B-Trees
 
Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions
 
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
 
Exchange Server 2013 Database and Store Changes
Exchange Server 2013 Database and Store ChangesExchange Server 2013 Database and Store Changes
Exchange Server 2013 Database and Store Changes
 
Storage talk
Storage talkStorage talk
Storage talk
 
How Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterHow Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver Cluster
 
hierarchical memory technology.pptx
hierarchical memory technology.pptxhierarchical memory technology.pptx
hierarchical memory technology.pptx
 
computer-memory
computer-memorycomputer-memory
computer-memory
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
 
Understanding RAID Levels (RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5)
Understanding RAID Levels (RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5)Understanding RAID Levels (RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5)
Understanding RAID Levels (RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5)
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ Speedment
 
disk structure and multiple RAID levels .ppt
disk structure and multiple  RAID levels .pptdisk structure and multiple  RAID levels .ppt
disk structure and multiple RAID levels .ppt
 
Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster
 
Chapter 3
Chapter 3Chapter 3
Chapter 3
 
Configuring Aerospike - Part 2
Configuring Aerospike - Part 2 Configuring Aerospike - Part 2
Configuring Aerospike - Part 2
 
SQL Server In-Memory OLTP introduction (Hekaton)
SQL Server In-Memory OLTP introduction (Hekaton)SQL Server In-Memory OLTP introduction (Hekaton)
SQL Server In-Memory OLTP introduction (Hekaton)
 
Caching Strategies for an Erlang Based Web Stack
Caching Strategies for an Erlang Based Web StackCaching Strategies for an Erlang Based Web Stack
Caching Strategies for an Erlang Based Web Stack
 
Art of the Possible_Tim Faulkes.pdf
Art of the Possible_Tim Faulkes.pdfArt of the Possible_Tim Faulkes.pdf
Art of the Possible_Tim Faulkes.pdf
 

Recently uploaded

ASONAM2023_presection_slide_track-recommendation.pdf
ASONAM2023_presection_slide_track-recommendation.pdfASONAM2023_presection_slide_track-recommendation.pdf
ASONAM2023_presection_slide_track-recommendation.pdf
ToshihiroIto4
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
gharris9
 
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussionPro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
Gregory Harris - Cycle 2 - Civics Presentation
Gregory Harris - Cycle 2 - Civics PresentationGregory Harris - Cycle 2 - Civics Presentation
Gregory Harris - Cycle 2 - Civics Presentation
gharris9
 
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Dutch Power
 
Tom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issueTom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issue
amekonnen
 
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
gpww3sf4
 
Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...
Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...
Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...
SkillCertProExams
 
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussionPro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
Carrer goals.pptx and their importance in real life
Carrer goals.pptx  and their importance in real lifeCarrer goals.pptx  and their importance in real life
Carrer goals.pptx and their importance in real life
artemacademy2
 
Competition and Regulation in Professions and Occupations – ROBSON – June 202...
Competition and Regulation in Professions and Occupations – ROBSON – June 202...Competition and Regulation in Professions and Occupations – ROBSON – June 202...
Competition and Regulation in Professions and Occupations – ROBSON – June 202...
OECD Directorate for Financial and Enterprise Affairs
 
Burning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdfBurning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdf
kkirkland2
 
Collapsing Narratives: Exploring Non-Linearity • a micro report by Rosie Wells
Collapsing Narratives: Exploring Non-Linearity • a micro report by Rosie WellsCollapsing Narratives: Exploring Non-Linearity • a micro report by Rosie Wells
Collapsing Narratives: Exploring Non-Linearity • a micro report by Rosie Wells
Rosie Wells
 
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussionArtificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
Suzanne Lagerweij
 
Mẫu PPT kế hoạch làm việc sáng tạo cho nửa cuối năm PowerPoint
Mẫu PPT kế hoạch làm việc sáng tạo cho nửa cuối năm PowerPointMẫu PPT kế hoạch làm việc sáng tạo cho nửa cuối năm PowerPoint
Mẫu PPT kế hoạch làm việc sáng tạo cho nửa cuối năm PowerPoint
1990 Media
 
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
OECD Directorate for Financial and Enterprise Affairs
 
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Dutch Power
 
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
OECD Directorate for Financial and Enterprise Affairs
 
Updated diagnosis. Cause and treatment of hypothyroidism
Updated diagnosis. Cause and treatment of hypothyroidismUpdated diagnosis. Cause and treatment of hypothyroidism
Updated diagnosis. Cause and treatment of hypothyroidism
Faculty of Medicine And Health Sciences
 

Recently uploaded (20)

ASONAM2023_presection_slide_track-recommendation.pdf
ASONAM2023_presection_slide_track-recommendation.pdfASONAM2023_presection_slide_track-recommendation.pdf
ASONAM2023_presection_slide_track-recommendation.pdf
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
 
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussionPro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
 
Gregory Harris - Cycle 2 - Civics Presentation
Gregory Harris - Cycle 2 - Civics PresentationGregory Harris - Cycle 2 - Civics Presentation
Gregory Harris - Cycle 2 - Civics Presentation
 
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
 
Tom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issueTom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issue
 
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
 
Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...
Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...
Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...
 
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussionPro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
 
Carrer goals.pptx and their importance in real life
Carrer goals.pptx  and their importance in real lifeCarrer goals.pptx  and their importance in real life
Carrer goals.pptx and their importance in real life
 
Competition and Regulation in Professions and Occupations – ROBSON – June 202...
Competition and Regulation in Professions and Occupations – ROBSON – June 202...Competition and Regulation in Professions and Occupations – ROBSON – June 202...
Competition and Regulation in Professions and Occupations – ROBSON – June 202...
 
Burning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdfBurning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdf
 
Collapsing Narratives: Exploring Non-Linearity • a micro report by Rosie Wells
Collapsing Narratives: Exploring Non-Linearity • a micro report by Rosie WellsCollapsing Narratives: Exploring Non-Linearity • a micro report by Rosie Wells
Collapsing Narratives: Exploring Non-Linearity • a micro report by Rosie Wells
 
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussionArtificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
 
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
 
Mẫu PPT kế hoạch làm việc sáng tạo cho nửa cuối năm PowerPoint
Mẫu PPT kế hoạch làm việc sáng tạo cho nửa cuối năm PowerPointMẫu PPT kế hoạch làm việc sáng tạo cho nửa cuối năm PowerPoint
Mẫu PPT kế hoạch làm việc sáng tạo cho nửa cuối năm PowerPoint
 
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
 
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
 
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
 
Updated diagnosis. Cause and treatment of hypothyroidism
Updated diagnosis. Cause and treatment of hypothyroidismUpdated diagnosis. Cause and treatment of hypothyroidism
Updated diagnosis. Cause and treatment of hypothyroidism
 

Native erasure coding support inside HDFS presentation

  • 1. HDFS Erasure Coding Zhe Zhang zhezhang@cloudera.com
  • 5. Replication is Expensive § HDFS inherits 3-way replication from Google File System - Simple, scalable and robust § 200% storage overhead § Secondary replicas rarely accessed [Figure: a block tracked by the NameNode, with one replica on each of DataNode0, DataNode1, and DataNode2]
  • 14. Erasure Coding Saves Storage § Simplified Example: storing 2 bits (1 0). Replication: 1 0 | 1 0, 2 extra bits. XOR Coding: 1 0 | 1 ⊕ 0 = 1, 1 extra bit § Same data durability - can lose any 1 bit § Half the storage overhead § Slower recovery
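The sketch below makes the 2-bit example concrete in Python. The function names are illustrative and do not come from HDFS. It stores the bits 1 0 both ways, erases each stored bit in turn, and counts how many surviving bits each scheme must read to repair the loss. The results match the slide:

- Both schemes survive any single loss.
- XOR adds 1 extra bit instead of 2.
- An XOR repair reads every survivor, while a replica repair reads one twin. This is one reason recovery is slower.

```python
def replicate(bits):
    """2-way replication: store every bit twice (originals followed by copies)."""
    return bits + bits

def xor_encode(bits):
    """Single-parity XOR code: append the XOR of all data bits."""
    parity = 0
    for b in bits:
        parity ^= b
    return bits + [parity]

def repair_replica(stored, lost, k):
    """Rebuild a lost position from its twin copy; reads exactly one bit."""
    twin = lost + k if lost < k else lost - k
    return stored[twin], 1

def repair_xor(stored, lost):
    """Rebuild a lost position by XOR-ing every surviving bit; reads all survivors."""
    value, reads = 0, 0
    for i, b in enumerate(stored):
        if i != lost:
            value ^= b
            reads += 1
    return value, reads

data = [1, 0]
rep, xor = replicate(data), xor_encode(data)
print("extra bits:", len(rep) - len(data), "vs", len(xor) - len(data))  # 2 vs 1
for lost in range(len(rep)):
    assert repair_replica(rep, lost, len(data))[0] == rep[lost]
for lost in range(len(xor)):
    value, reads = repair_xor(xor, lost)
    assert value == xor[lost]
    print(f"XOR: lost position {lost}, recovered {value} after {reads} reads")
```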
  • 18. Erasure Coding Saves Storage § Facebook - f4 stores 65PB of BLOBs in EC § Windows Azure Storage (WAS) - A PB of new data every 1~2 days - All “sealed” data stored in EC § Google File System - Large portion of data stored in EC
  • 22. Roadmap § Background of EC - Redundancy Theory - EC in Distributed Storage Systems § HDFS-EC architecture - Choosing Block Layout - NameNode — Generalizing the Block Concept - Client — Parallel I/O - DataNode — Background Reconstruction § Hardware-accelerated Codec Framework
  • 29. Durability and Efficiency. Data Durability = How many simultaneous failures can be tolerated? Storage Efficiency = What fraction of storage holds useful data? 3-way Replication: Data Durability = 2, Storage Efficiency = 1/3 (33%) [Figure: one block tracked by the NameNode with a replica on each of DataNode0, DataNode1, and DataNode2; one replica is useful data, the other two are redundant data]
  • 35. Durability and Efficiency: XOR. Truth table (X, Y → X ⊕ Y): 0, 0 → 0; 0, 1 → 1; 1, 0 → 1; 1, 1 → 0. A lost cell is rebuilt by XOR-ing the survivors, e.g. Y = 0 ⊕ 1 = 1. Data Durability = 1, Storage Efficiency = 2/3 (67%) [Figure: two cells of useful data plus one cell of redundant data]
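The same rule scales from single bits to whole cells. You XOR the data cells byte by byte to get one parity cell. You can then rebuild any one missing cell by XOR-ing all the survivors.

This is a minimal Python sketch under a few assumptions:

- Cells are equal-length byte strings held in memory.
- Padding of a short last cell is left out.
- The names are illustrative, not HDFS APIs.

```python
from functools import reduce

def xor_cells(cells):
    """Byte-wise XOR of equal-length cells."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*cells))

def encode(data_cells):
    """Return the data cells followed by a single XOR parity cell."""
    return list(data_cells) + [xor_cells(data_cells)]

def recover(stripe, lost):
    """Rebuild the cell at index `lost` from every other cell in the stripe."""
    survivors = [c for i, c in enumerate(stripe) if i != lost]
    return xor_cells(survivors)

stripe = encode([b"HDFS", b"-EC!", b"data"])
# Any single missing cell (data or parity) can be rebuilt from the rest.
assert all(recover(stripe, i) == stripe[i] for i in range(len(stripe)))
```

With k data cells and one parity cell, storage efficiency is k / (k + 1). That gives 2/3 for two cells and 6/7 (86%) for six. Losing two cells from the same stripe cannot be repaired, so durability stays at 1.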
  • 39. Durability and Efficiency: Reed-Solomon (RS) with 4 data and 2 parity cells. Data Durability = 2, Storage Efficiency = 4/6 (67%). Very flexible!
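Reed-Solomon generalizes single parity to m parity cells that can tolerate any m losses. This Python sketch shows the principle with a systematic, evaluation-based RS code:

1. The k data symbols are treated as the values of a polynomial of degree < k at x = 0..k-1.
2. The parity symbols are that polynomial evaluated at x = k..k+m-1.
3. Any k survivors determine the polynomial again through Lagrange interpolation.

For readability the arithmetic is done over the prime field GF(257). Production codecs, including the ones HDFS uses, work over GF(2^8) with matrix-based encoding. Treat this as an illustration of the math, not of the HDFS codec. The function names are hypothetical.

```python
from itertools import combinations

P = 257  # prime modulus: GF(257) keeps the arithmetic readable (real codecs use GF(2^8))

def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division in GF(P) is multiplication by the inverse den^(P-2) (Fermat).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def rs_encode(data, m):
    """Systematic RS(k, m): data symbol i is the value at x = i, parity j the value at x = k + j."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [_interpolate(points, k + j) for j in range(m)]

def rs_decode(cells, k):
    """Recover the k data symbols from any k surviving cells (None marks an erased cell)."""
    survivors = [(x, y) for x, y in enumerate(cells) if y is not None]
    if len(survivors) < k:
        raise ValueError("more erasures than parity cells")
    return [_interpolate(survivors[:k], x) for x in range(k)]

# RS with 4 data + 2 parity symbols: every pattern of 2 erasures is recoverable.
data = [3, 14, 15, 92]
cells = rs_encode(data, m=2)
for lost in combinations(range(len(cells)), 2):
    damaged = [None if i in lost else c for i, c in enumerate(cells)]
    assert rs_decode(damaged, k=4) == data
```

A real cell holds many bytes. The same encode and decode step is applied independently at every byte position.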
  • 56. Durability and Efficiency (Data Durability, Storage Efficiency): Single Replica: 0, 100%; 3-way Replication: 2, 33%; XOR with 6 data cells: 1, 86%; RS (6,3): 3, 67%; RS (10,4): 4, 71%
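Every row of this table follows from two formulas. Suppose a scheme has k data units and m redundant units, where the redundant units are either the replicas beyond the first or the parity cells. Then:

- Its durability is m simultaneous failures.
- Its storage efficiency is k / (k + m).

A small Python check reproduces the rows. The helper name is illustrative.

```python
from fractions import Fraction

def durability_and_efficiency(data_units, redundant_units):
    """Tolerated simultaneous failures and fraction of raw storage holding useful data."""
    return redundant_units, Fraction(data_units, data_units + redundant_units)

schemes = {
    "Single Replica": (1, 0),
    "3-way Replication": (1, 2),      # 1 original + 2 extra copies
    "XOR with 6 data cells": (6, 1),  # 1 parity cell
    "RS (6,3)": (6, 3),
    "RS (10,4)": (10, 4),
}
for name, (k, m) in schemes.items():
    durability, efficiency = durability_and_efficiency(k, m)
    print(f"{name:<22} durability={durability}  efficiency={float(efficiency):.0%}")
```

The formula for m holds for replication and for MDS codes such as XOR and Reed-Solomon. In these schemes any k surviving units are enough to rebuild the data.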
  • 64. EC in Distributed Storage. Block Layout, Contiguous Layout: the file is split into 128 MB blocks (0~128M, 128~256M, …, 640~768M); block0 goes to DataNode 0, block1 to DataNode 1, …, block5 to DataNode 5, and parity to DataNode 6 onward. Data Locality ✓ Small Files ✗
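In the contiguous layout, a file offset maps to a data block by plain division. Each block, and so each DataNode in the group, holds one large consecutive range of the file.

This Python sketch assumes 128 MB blocks and 6 data blocks per group, as in the figure. Parity placement is not modeled, and the function name is hypothetical.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed block size, matching the figure
DATA_BLOCKS = 6        # data blocks per (6,3) group

def contiguous_locate(offset):
    """Map a file offset to (group, block index within group, offset inside the block)."""
    block = offset // BLOCK_SIZE
    return block // DATA_BLOCKS, block % DATA_BLOCKS, offset % BLOCK_SIZE

# A 10 MB read starting at 130 MB stays inside block1 (DataNode 1): a reader on
# that DataNode can serve it from local disk, but it cannot be spread across DataNodes.
print(contiguous_locate(130 * MB))
```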
  • 68. EC in Distributed Storage
    Block Layout: Striped Layout
    § File split into 1 MB cells written round-robin: 0~1M → block0 on DataNode 0, 1~2M → block1 on DataNode 1, …, 5~6M → block5 on DataNode 5, then 6~7M back on block0
    § Parity on DataNode 6, …
    § Data Locality: poor; Small Files: good; Parallel I/O: good
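Striping replaces the single division with a round-robin over cells. A minimal sketch, assuming 1 MB cells and 6 data blocks, and ignoring what happens once a group's internal blocks fill up; `locate_striped` is a hypothetical helper:

```python
MiB = 1024 * 1024
CELL_SIZE = 1 * MiB  # cell size on the slide
DATA_BLOCKS = 6      # data blocks per group, as in RS (6,3)


def locate_striped(offset, cell_size=CELL_SIZE, k=DATA_BLOCKS):
    """Map a file offset to (stripe, data block index, offset inside that block).

    Cells go round-robin over block0..block(k-1); stripe s occupies cell s of every block.
    """
    cell = offset // cell_size
    stripe, index = divmod(cell, k)
    return stripe, index, stripe * cell_size + offset % cell_size


# Any 6 MB-aligned read touches all six data blocks, so the client can fetch them in parallel.
print(sorted({locate_striped(i * MiB)[1] for i in range(6)}))
```

The same property that enables parallel I/O removes locality: consecutive megabytes of one file land on different DataNodes.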
  • 69. EC in Distributed Storage
    Spectrum [figure]: Replication vs. Erasure Coding, Striping vs. Contiguous; systems shown: Ceph and Quantcast File System (each appearing twice), HDFS, Facebook f4, Windows Azure
  • 70. Roadmap
    § Background of EC
    - Redundancy Theory
    - EC in Distributed Storage Systems
    § HDFS-EC architecture
    - Choosing Block Layout
    - NameNode: Generalizing the Block Concept
    - Client: Parallel I/O
    - DataNode: Background Reconstruction
    § Hardware-accelerated Codec Framework
  • 74. Choosing Block Layout
    § Assuming (6,3) coding:
    - Small files: < 1 block
    - Medium: 1~6 blocks
    - Large: > 6 blocks (1 group)
    Cluster   File count (small / medium / large)   Space usage (small / medium / large)   Note
    A         96.29% / 1.86% / 1.85%                26.06% / 9.33% / 64.61%                Top 2% of files occupy ~65% of space
    B         86.59% / 11.38% / 2.03%               23.89% / 36.03% / 40.08%               Top 2% of files occupy ~40% of space
    C         99.64% / 0.36% / 0.00%                76.05% / 20.75% / 3.20%                Dominated by small files
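Why file size matters for layout can be seen from parity overhead. A simplified model, not taken from the slides: assume each group of k units (128 MB blocks for contiguous, 1 MB cells for striped) gets m parity units, each as large as the group's largest data unit. `size_class` and `parity_overhead` are hypothetical helpers:

```python
MiB = 1024 * 1024
BLOCK = 128 * MiB  # contiguous unit
CELL = 1 * MiB     # striped unit
K, M = 6, 3        # RS (6,3)


def size_class(size, block=BLOCK, k=K):
    """Classify a file using the slide's thresholds."""
    if size < block:
        return "small"
    if size <= k * block:
        return "medium"
    return "large"


def parity_overhead(size, unit, k=K, m=M):
    """Parity bytes per data byte under the simplified model described above."""
    parity = 0
    for start in range(0, size, k * unit):
        group = min(size - start, k * unit)  # data bytes in this group (or stripe)
        parity += m * min(group, unit)       # each parity unit matches the largest data unit
    return parity / size


for size in (6 * MiB, 768 * MiB):
    print(size_class(size), parity_overhead(size, BLOCK), parity_overhead(size, CELL))
```

Under this model a 6 MB file pays 300% parity overhead with the contiguous layout but 50% when striped, while a full 768 MB group pays 50% either way; that gap is why the share of small and medium files in each cluster profile bears on the layout choice.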
  • 79. Generalizing Block (NameNode)
    § Mapping logical and storage blocks
    § Too many storage blocks?
    § Hierarchical Naming Protocol: [diagram]
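One way a hierarchical name keeps the NameNode from tracking every storage block: reserve the low bits of a block group ID for the index of each storage block inside the group, so the group is recoverable by masking. This is a hypothetical sketch; the 4-bit width and the helper names are assumptions, and HDFS's actual ID layout may differ:

```python
INDEX_BITS = 4                      # hypothetical: up to 16 storage blocks per group
INDEX_MASK = (1 << INDEX_BITS) - 1


def storage_block_id(group_id, index):
    """Name the index-th storage block of a block group by filling the reserved low bits."""
    if group_id & INDEX_MASK:
        raise ValueError("group IDs must be allocated with the low bits clear")
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index out of range")
    return group_id | index


def split_block_id(block_id):
    """Recover (group ID, index) from a storage block ID by masking, with no per-block lookup."""
    return block_id & ~INDEX_MASK, block_id & INDEX_MASK


group = 0x1230
ids = [storage_block_id(group, i) for i in range(9)]  # 6 data + 3 parity blocks for RS (6,3)
print([hex(b) for b in ids])
```

With this scheme the NameNode keeps one entry per group, and any storage block reported by a DataNode maps back to its group in constant time.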
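  • 87. Client Parallel Writing
    § One streamer per DataNode, each with its own queue: streamer → DataNode, streamer → DataNode, …, streamer → DataNode
    § A Coordinator shared by the streamers (allocates each new block group)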
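The queue-per-streamer pattern can be sketched with threads. A minimal illustration, assuming single XOR parity in place of the RS (6,3) encoder, tiny 4-byte cells, and in-memory buffers standing in for DataNodes; the Coordinator's block group allocation is omitted and all names are hypothetical:

```python
import queue
import threading
from functools import reduce


def xor_parity(cells):
    """XOR parity over one stripe (a stand-in for the RS encoder); short cells are zero-padded."""
    width = max(len(c) for c in cells)
    padded = [c.ljust(width, b"\0") for c in cells]
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*padded))


def striped_write(data, k=3, cell=4):
    """Split data into cells, queue each cell to its streamer, and return what each 'DataNode' got."""
    queues = [queue.Queue() for _ in range(k + 1)]   # k data streamers + 1 parity streamer
    datanodes = [bytearray() for _ in range(k + 1)]  # in-memory stand-ins for DataNodes

    def streamer(i):
        # Each streamer drains its own queue independently of the others.
        while (packet := queues[i].get()) is not None:
            datanodes[i].extend(packet)

    threads = [threading.Thread(target=streamer, args=(i,)) for i in range(k + 1)]
    for t in threads:
        t.start()
    for off in range(0, len(data), k * cell):
        cells = [data[off + j * cell: off + (j + 1) * cell] for j in range(k)]
        for j, c in enumerate(cells):
            if c:
                queues[j].put(c)
        queues[k].put(xor_parity(cells))
    for q in queues:
        q.put(None)  # end-of-stream marker
    for t in threads:
        t.join()
    return [bytes(d) for d in datanodes]


print(striped_write(b"abcdefghijklmn"))
```

Separate queues let a slow DataNode stall only its own streamer, while the writer keeps encoding stripes.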
  • 91. Client Parallel Reading
    § The client reads from several DataNodes at once: DataNode, DataNode, …, DataNode
    § Parity blocks are read as well when data must be recovered
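A read path with a fallback "recovery read" can be sketched the same way. A minimal illustration, assuming equal-length cells and single XOR parity in place of the RS (6,3) decoder, with a thread pool standing in for network reads; the names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def xor_bytes(chunks):
    """Byte-wise XOR of equal-length chunks (a stand-in for the RS decoder)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def read_stripe(nodes, k):
    """Read one stripe: k data cells in parallel, plus a recovery read of parity if a cell is lost.

    nodes holds k data cells followed by one parity cell; None marks an unreachable DataNode.
    """
    def fetch(i):
        return nodes[i]  # stand-in for a network read from DataNode i

    with ThreadPoolExecutor(max_workers=k) as pool:
        cells = list(pool.map(fetch, range(k)))

    missing = [i for i, cell in enumerate(cells) if cell is None]
    if missing:
        parity = nodes[k]  # recovery read
        if len(missing) > 1 or parity is None:
            raise OSError("single XOR parity cannot recover this stripe")
        survivors = [cell for cell in cells if cell is not None]
        cells[missing[0]] = xor_bytes(survivors + [parity])
    return b"".join(cells)


data_cells = [b"abcd", b"efgh", b"ijkl"]
parity = xor_bytes(data_cells)
print(read_stripe([b"abcd", None, b"ijkl", parity], 3))
```

Parity is touched only on failure, so healthy reads cost exactly k cell fetches.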
  • 92. Reconstruction on DataNode
    § Important to avoid delay on the critical path
    - Especially if original data is lost
    § Integrated with Replication Monitor
    - Under-protected EC blocks scheduled together with under-replicated blocks
    - New priority algorithms
    § New ErasureCodingWorker component on DataNode
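Scheduling EC groups together with replicated blocks needs a common measure of risk. A hypothetical priority rule, not necessarily HDFS's algorithm: rank each block by how many more losses it can survive (live units minus units needed to read it: 1 for a replicated block, 6 of 9 for an RS (6,3) group), most at-risk first:

```python
import heapq
import itertools


def remaining_tolerance(live_units, units_needed):
    """Further losses a block survives: live replicas - 1, or live RS (6,3) units - 6."""
    return live_units - units_needed


class ReconstructionQueue:
    """Single queue for under-replicated and under-protected EC blocks, most at-risk first."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-break among equal priorities

    def add(self, block, live_units, units_needed):
        priority = remaining_tolerance(live_units, units_needed)
        heapq.heappush(self._heap, (priority, next(self._seq), block))

    def pop(self):
        return heapq.heappop(self._heap)[2]


q = ReconstructionQueue()
q.add("replicated blk_A", 2, 1)  # survives 1 more loss
q.add("EC group G1", 7, 6)       # survives 1 more loss
q.add("EC group G2", 6, 6)       # one more loss means data loss
q.add("replicated blk_B", 1, 1)  # one more loss means data loss
print([q.pop() for _ in range(4)])
```

The sequence counter keeps the heap from comparing block names and preserves arrival order within a priority level.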
  • 94. Acceleration with Intel ISA-L
    § 1 legacy coder
    - From Facebook’s HDFS-RAID project
    § 2 new coders
    - Pure Java: code improvement over HDFS-RAID
    - Native coder with Intel’s Intelligent Storage Acceleration Library (ISA-L)
  • 101. Conclusion
    § Erasure coding cuts the raw storage needed for the same data by ~50% (e.g., 1.5x for RS (6,3) vs. 3x for 3-way replication)!
    § HDFS-EC phase I implements erasure coding in striped block layout
    § Upstream effort (HDFS-7285):
    - Design finalized Nov. 2014
    - Development started Jan. 2015
    - 218 commits, ~25k LoC changed
    - Broad collaboration: Cloudera, Intel, Hortonworks, Huawei, Yahoo (Japan)
    § Phase II will support contiguous block layout for better locality
  • 102. Acknowledgements
    § Cloudera: Andrew Wang, Aaron T. Myers, Colin McCabe, Todd Lipcon, Silvius Rus
    § Intel: Kai Zheng, Uma Maheswara Rao G, Vinayakumar B, Yi Liu, Weihua Jiang
    § Hortonworks: Jing Zhao, Tsz Wo Nicholas Sze
    § Huawei: Walter Su, Rakesh R, Xinwei Qin
    § Yahoo (Japan): Gao Rui, Kai Sasaki, Takuya Fukudome, Hui Zheng
  • 105. Questions?
    § Just merged to trunk!
    § Erasure Coding: A type of Error Correction Coding
  • 110. EC in Distributed Storage
    Block Layout: Contiguous
    § File 0~128M → block0 on DataNode0, 128~256M → block1 on DataNode1, …, 640~768M → block5 on DataNode5
    § Parity on DataNode6 … DataNode8
    § Data Locality: good; Small Files: poor
  • 114. EC in Distributed Storage
    Block Layout: Striping
    § 1 MB cells (0~1M, 1~2M, …, 5~6M, …, 127~128M) spread across block0 … block5 on DataNode0 … DataNode5
    § Parity on DataNode6 … DataNode8
    § Data Locality: poor; Small Files: good; Parallel I/O: good
  • 115. Client Parallel Writing
    § DFSStripedOutputStream feeds dataQueue 0 … dataQueue 4, each drained by DataStreamer 0 … DataStreamer 4
    § The streamers write the storage blocks blk_1009 … blk_1013 of one blockGroup
    § Coordinator: allocate new blockGroup
  • 116. Client Parallel Reading
    [Figure: Stripe 0, Stripe 1 and Stripe 2 laid across DataNodes holding data blocks and parity blocks; cells are marked "requested", "recovery read", or "all zero"]