DHT2 - O Brother, Where Art Thou?
Shyamsundar Ranganathan
Developer
Session aims to explore...
"The hypothetical treasure at the end of the journey"
Why DHT2
"The plan..."
DHT2 design
"Known adventures along the way!"
Challenges in DHT2
"The strange characters"
Challenges because of DHT2
"Trouble escaping the chain gang!"
Where are we with DHT2
Loosely inspired by the movie: https://en.wikipedia.org/wiki/O_Brother,_Where_Art_Thou%3F
Why DHT2
DHT pitfalls
Directories on all subvolumes
Layout per directory
Rebalance IO path handling and nonoptimal data movement
This impacts scale and correctness!
Correctness can be addressed in DHT:
Broader locking semantics for dentry operations
Possibly single layout adoption
But this increases complexity and could cost performance!
With DHT2 the goal is to fix all of the above, while retaining or improving performance
DHT2 Design: The file system objects
View the file system as a collection of related objects
"Wait a second... isn't that what inodes and data pointers are?"
Yes, but they are not distributed!
Directory objects denote hierarchy, storing <name, inode#> tables
File object maintains inode related metadata
Actual file data is maintained in data object(s)
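The three object types above can be sketched as plain records. This is only an illustration: the class and field names are hypothetical, the short IDs are borrowed from the example slides, and keying the data object by the file's GFID follows the "Use GFID as the data object#" point made later in the deck.

```python
from dataclasses import dataclass, field
from typing import Dict

# All names here are hypothetical; they only mirror the slide's vocabulary.

@dataclass
class DirObject:
    gfid: str
    # <name, inode#> table: entry name -> GFID of the child object
    entries: Dict[str, str] = field(default_factory=dict)

@dataclass
class FileObject:
    gfid: str
    mode: int = 0o644
    size: int = 0  # inode metadata is kept here, apart from the data

@dataclass
class DataObject:
    gfid: str      # assumption: same ID as the owning file object
    data: bytes = b""

root = DirObject(gfid="0001")
file1 = FileObject(gfid="00EF")
root.entries["File1"] = file1.gfid          # hierarchy lives in the dir object
data1 = DataObject(gfid=file1.gfid, data=b"hello")
file1.size = len(data1.data)                # metadata must track the data
```

Resolving a name is then a table lookup in the directory object, followed by fetching the file object and, only when needed, the data object.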
The file system objects (example)
Client View
. ('root')
├── Dir1
│   ├── Dir2
│   └── File2
└── File1
[Figure: root, Dir1 and Dir2 drawn as dir objects, File1 and File2 as file objects (inodes/dinode), and file data as data objects. Slide builds: the different objects, segregated by type; namespace hierarchy representation; data association.]
DHT2 Design: Distribution details
Distribute inodes using GFID in the metadata ring
No hierarchy, a directory object lives only on one subvolume
Use GFID as the data object# in the data ring
Distribution is hence not name dependent, and we just use a single layout per ring
Distribution details (example)
Client View
('root')
├── Dir1
│   ├── Dir2
│   └── File2
└── File1
Name to GFID entries: <File1, 00EF>, <Dir1, BA11>, <File2, BAC5>, <Dir2, 7525>
[Figure: dir objects and file objects placed on the Metadata Ring (few bricks), data objects on the Data Ring (many bricks). Slide build: switch names to GFID, add name to dinodes.]
DHT2 Design: Distribution details (contd.)
Layout is based on bucket to subvolume assignment
Where buckets >> subvolumes
Bucket ID is encoded into the first n bytes of the GFID
Trivial GFID based operations
Colocates file object with parent object
File object# statically inherits parent directory# bucket ID
Optimized readdirp and lookup operations (no hopping unless non-trivially renamed, or a link file)
IOW, optimized (pGFID, basename) based operations
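A minimal sketch of the bucket scheme described above, under stated assumptions: a 1-byte bucket ID (n = 1), 256 buckets, random bucket choice for new directories, and a round-robin initial layout. None of these values or function names come from the talk; they are hypothetical.

```python
import random
import uuid

BUCKET_BYTES = 1                    # the slide's "n"; the value 1 is an assumption
NUM_BUCKETS = 256 ** BUCKET_BYTES   # buckets >> subvolumes

def new_gfid(bucket: int) -> bytes:
    """Return a 16-byte GFID whose leading bytes encode the bucket ID."""
    rand = uuid.uuid4().bytes
    return bucket.to_bytes(BUCKET_BYTES, "big") + rand[BUCKET_BYTES:]

def bucket_of(gfid: bytes) -> int:
    """Recover the bucket ID from the GFID alone; no name or path is needed."""
    return int.from_bytes(gfid[:BUCKET_BYTES], "big")

def new_dir_gfid() -> bytes:
    # Assumption: a new directory picks its bucket at random.
    return new_gfid(random.randrange(NUM_BUCKETS))

def new_file_gfid(parent_gfid: bytes) -> bytes:
    # A file object inherits the bucket ID of its parent directory.
    return new_gfid(bucket_of(parent_gfid))

# One bucket -> subvolume table per ring (hypothetical subvolume names).
subvols = ["meta-0", "meta-1", "meta-2"]
layout = {b: subvols[b % len(subvols)] for b in range(NUM_BUCKETS)}

def subvol_of(gfid: bytes) -> str:
    return layout[bucket_of(gfid)]

parent = new_dir_gfid()
child = new_file_gfid(parent)
```

Because `child` carries the same bucket ID as `parent`, a (pGFID, basename) lookup or a readdirp on the parent finds the file object on the same subvolume as the directory.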
Distribution details (example)
Name to GFID entries: <File1, 00EF>, <Dir1, BA11>, <File2, BAC5>, <Dir2, 7525>
Buckets: 00, 75, BA
[Figure: Metadata Ring (few bricks) and Data Ring (many bricks), each made up of bricks/subvols, shown against the same client view. Slide builds: add bricks/subvolumes; assign buckets to bricks; place directories based on bucket encoded in the GFID; colocate the files under a directory with the same bucket ID.]
DHT2 Design: Rebalance
Reassign buckets to/from newer/removed subvolumes
fix-layout is instantaneous
Files travel with directories (same bucket colocation)
Expand the cluster, but perform no rebalance
aka just add-brick and let min-free-disk+link-to do its job
This is the tough one; use layout versions/histories to pull this off?
Split DHT2 into client-server pieces
Handle IO traffic, locking during rebalance
Better consistency model for transactions
Ability to have different expansion strategies for the 2 rings
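Bucket reassignment on expansion could look like the sketch below. It is not the DHT2 algorithm: the balancing policy (move just enough buckets to give the new subvolume a fair share) and every name in it are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List

def add_subvol(layout: Dict[int, str], new_subvol: str) -> List[int]:
    """Reassign just enough buckets to new_subvol to even out the layout.

    Returns the bucket IDs that changed owner; only objects in those
    buckets would need to move.
    """
    counts = Counter(layout.values())
    counts[new_subvol] = 0
    target = len(layout) // len(counts)   # fair share per subvolume
    moved: List[int] = []
    for bucket in sorted(layout):
        if counts[new_subvol] >= target:
            break
        owner = layout[bucket]
        if counts[owner] > target:        # take only from over-full owners
            layout[bucket] = new_subvol
            counts[owner] -= 1
            counts[new_subvol] += 1
            moved.append(bucket)
    return moved

layout = {b: f"subvol-{b % 2}" for b in range(8)}   # 8 buckets, 2 subvolumes
moved = add_subvol(layout, "subvol-2")
```

The layout update itself is a small table edit, which is in the spirit of "fix-layout is instantaneous"; a directory and its files share a bucket, so they move together.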
Challenges in DHT2
Rename ELOOP checking requires hierarchy
Object backpointers
Time and size information should be in sync between data and metadata objects
Dirty inode, tracked via open fd
Orphan GFID cleanup
Enter transactions/journals!
Directories as files/in a DB
Reduce local FS inode proliferation
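To illustrate why rename loop checking needs hierarchy, here is a sketch that walks parent backpointers. The `parent_of` map stands in for per-object backpointers and is hypothetical, as is the function name; the GFIDs are the short IDs from the example slides.

```python
from typing import Dict, Optional

def would_create_loop(parent_of: Dict[str, Optional[str]],
                      src_dir: str, new_parent: str) -> bool:
    """True if moving src_dir under new_parent makes src_dir its own ancestor."""
    cur: Optional[str] = new_parent
    while cur is not None:
        if cur == src_dir:
            return True
        cur = parent_of.get(cur)   # follow the backpointer one level up
    return False

# Backpointers for the example tree: root (0001) -> Dir1 (BA11) -> Dir2 (7525)
parent_of = {"0001": None, "BA11": "0001", "7525": "BA11"}

loop = would_create_loop(parent_of, "BA11", "7525")   # move Dir1 into Dir2
ok = would_create_loop(parent_of, "7525", "0001")     # move Dir2 under root
```

Since directory objects are spread over subvolumes, each step up the chain may be a remote lookup in practice, which is part of what makes this check a challenge.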
Challenges because of DHT2
IO path cannot depend on hierarchy (Ex: quota)
Quick-read cannot fetch data in lookups
Anon-fd based operations cannot track dirty inodes
Others
Will changelog play well?
EC has to bother with only data?
Tier may need a rethink
Sharding may accrue the cost of missing anon-fd and the data/metadata split of shards
Unknowns!
Where are we with DHT2
Introduced DHT Version 2 at the Barcelona summit, 2015
Followed up with 2 discussions upstream on core concepts [1] [2]
Followed up with a POC and some slides/documents to demonstrate the concepts [3]
In limbo since then,
But not out of the picture yet!
Targeting an experimental release with 4.0
Questions?
"The treasure you seek shall not be the treasure you find."
References
[1] DHT2 Design Discussion
https://goo.gl/tLpqJO
[2] DHT2 Design Discussion, Round 2
https://goo.gl/dCAO36
[3] POC trail…
http://www.gluster.org/pipermail/gluster-devel/2015-August/046369.html
Other threads of interest:
- http://www.gluster.org/pipermail/gluster-devel/2016-March/048874.html
- http://www.gluster.org/pipermail/gluster-devel/2015-November/047098.html
- http://www.gluster.org/pipermail/gluster-devel/2015-September/046630.html

More Related Content

What's hot

State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster PerformanceGluster.org
 
Accomplishing redundancy on Lustre based PFS with DRBD
Accomplishing redundancy on Lustre based PFS with DRBDAccomplishing redundancy on Lustre based PFS with DRBD
Accomplishing redundancy on Lustre based PFS with DRBDTyrone Systems
 
Introduction to DRBD
Introduction to DRBDIntroduction to DRBD
Introduction to DRBDdawnlua
 
Disperse xlator ramon_datalab
Disperse xlator ramon_datalabDisperse xlator ramon_datalab
Disperse xlator ramon_datalabGluster.org
 
Efficient data maintaince in GlusterFS using Databases
Efficient data maintaince in GlusterFS using DatabasesEfficient data maintaince in GlusterFS using Databases
Efficient data maintaince in GlusterFS using DatabasesJoseph Elwin Fernandes
 
Gluster for Geeks: Performance Tuning Tips & Tricks
Gluster for Geeks: Performance Tuning Tips & TricksGluster for Geeks: Performance Tuning Tips & Tricks
Gluster for Geeks: Performance Tuning Tips & TricksGlusterFS
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structureZhichao Liang
 
Sdc challenges-2012
Sdc challenges-2012Sdc challenges-2012
Sdc challenges-2012Gluster.org
 
Optimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsOptimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsJavier González
 
Gluster.community.day.2013
Gluster.community.day.2013Gluster.community.day.2013
Gluster.community.day.2013Udo Seidel
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelonaGluster.org
 
Intuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin StopfordIntuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin StopfordJAXLondon_Conference
 
Gluster overview & future directions vault 2015
Gluster overview & future directions vault 2015Gluster overview & future directions vault 2015
Gluster overview & future directions vault 2015Vijay Bellur
 
Erasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterErasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterRed_Hat_Storage
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed_Hat_Storage
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Divejoshdurgin
 

What's hot (19)

State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
 
Accomplishing redundancy on Lustre based PFS with DRBD
Accomplishing redundancy on Lustre based PFS with DRBDAccomplishing redundancy on Lustre based PFS with DRBD
Accomplishing redundancy on Lustre based PFS with DRBD
 
Introduction to DRBD
Introduction to DRBDIntroduction to DRBD
Introduction to DRBD
 
Disperse xlator ramon_datalab
Disperse xlator ramon_datalabDisperse xlator ramon_datalab
Disperse xlator ramon_datalab
 
Gluster Data Tiering
Gluster Data TieringGluster Data Tiering
Gluster Data Tiering
 
Efficient data maintaince in GlusterFS using Databases
Efficient data maintaince in GlusterFS using DatabasesEfficient data maintaince in GlusterFS using Databases
Efficient data maintaince in GlusterFS using Databases
 
Gluster for Geeks: Performance Tuning Tips & Tricks
Gluster for Geeks: Performance Tuning Tips & TricksGluster for Geeks: Performance Tuning Tips & Tricks
Gluster for Geeks: Performance Tuning Tips & Tricks
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structure
 
Sdc challenges-2012
Sdc challenges-2012Sdc challenges-2012
Sdc challenges-2012
 
Hdfs internals
Hdfs internalsHdfs internals
Hdfs internals
 
Optimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDsOptimizing RocksDB for Open-Channel SSDs
Optimizing RocksDB for Open-Channel SSDs
 
Gluster.community.day.2013
Gluster.community.day.2013Gluster.community.day.2013
Gluster.community.day.2013
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelona
 
Intuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin StopfordIntuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin Stopford
 
Gluster overview & future directions vault 2015
Gluster overview & future directions vault 2015Gluster overview & future directions vault 2015
Gluster overview & future directions vault 2015
 
Dedupe nmamit
Dedupe nmamitDedupe nmamit
Dedupe nmamit
 
Erasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterErasure codes and storage tiers on gluster
Erasure codes and storage tiers on gluster
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Dive
 

Viewers also liked

Règims totalitaris a l'Europa d'entreguerres (1919-1939)
Règims totalitaris a l'Europa d'entreguerres (1919-1939)Règims totalitaris a l'Europa d'entreguerres (1919-1939)
Règims totalitaris a l'Europa d'entreguerres (1919-1939)Eva María Gil
 
Pràctica 02 hmc curs 2015 16
Pràctica 02 hmc curs 2015 16Pràctica 02 hmc curs 2015 16
Pràctica 02 hmc curs 2015 16jordimanero
 
Curs 2014 15 pràctica 04 hmc
Curs 2014 15 pràctica 04 hmcCurs 2014 15 pràctica 04 hmc
Curs 2014 15 pràctica 04 hmcjordimanero
 
Pràctica 02 hmc curs 2014 15
Pràctica 02 hmc curs 2014 15Pràctica 02 hmc curs 2014 15
Pràctica 02 hmc curs 2014 15jordimanero
 
Pràctica 01 hmc 2015-16
Pràctica 01 hmc   2015-16Pràctica 01 hmc   2015-16
Pràctica 01 hmc 2015-16jordimanero
 
Pràctica 01 hmc
Pràctica 01 hmcPràctica 01 hmc
Pràctica 01 hmcjordimanero
 
2016 17 hmc pràctica 1
2016 17 hmc pràctica 12016 17 hmc pràctica 1
2016 17 hmc pràctica 1jordimanero
 
Curs 2014 15 - pràctica 03 hmc - gran guerra rev russa - correcció
Curs 2014 15 - pràctica 03 hmc - gran guerra rev russa - correccióCurs 2014 15 - pràctica 03 hmc - gran guerra rev russa - correcció
Curs 2014 15 - pràctica 03 hmc - gran guerra rev russa - correcciójordimanero
 
2016 17 hmc pràctica 2
2016 17 hmc pràctica 22016 17 hmc pràctica 2
2016 17 hmc pràctica 2jordimanero
 
Curs 2014 15 - pràctica 05 hmc - ii
Curs 2014 15 - pràctica 05 hmc - iiCurs 2014 15 - pràctica 05 hmc - ii
Curs 2014 15 - pràctica 05 hmc - iijordimanero
 
Curs 2016 17 pràctica 04 hmc
Curs 2016 17 pràctica 04 hmcCurs 2016 17 pràctica 04 hmc
Curs 2016 17 pràctica 04 hmcjordimanero
 
Curs 2015 16 - pràctica 05 hmc
Curs 2015 16 - pràctica 05 hmcCurs 2015 16 - pràctica 05 hmc
Curs 2015 16 - pràctica 05 hmcjordimanero
 

Viewers also liked (12)

Règims totalitaris a l'Europa d'entreguerres (1919-1939)
Règims totalitaris a l'Europa d'entreguerres (1919-1939)Règims totalitaris a l'Europa d'entreguerres (1919-1939)
Règims totalitaris a l'Europa d'entreguerres (1919-1939)
 
Pràctica 02 hmc curs 2015 16
Pràctica 02 hmc curs 2015 16Pràctica 02 hmc curs 2015 16
Pràctica 02 hmc curs 2015 16
 
Curs 2014 15 pràctica 04 hmc
Curs 2014 15 pràctica 04 hmcCurs 2014 15 pràctica 04 hmc
Curs 2014 15 pràctica 04 hmc
 
Pràctica 02 hmc curs 2014 15
Pràctica 02 hmc curs 2014 15Pràctica 02 hmc curs 2014 15
Pràctica 02 hmc curs 2014 15
 
Pràctica 01 hmc 2015-16
Pràctica 01 hmc   2015-16Pràctica 01 hmc   2015-16
Pràctica 01 hmc 2015-16
 
Pràctica 01 hmc
Pràctica 01 hmcPràctica 01 hmc
Pràctica 01 hmc
 
2016 17 hmc pràctica 1
2016 17 hmc pràctica 12016 17 hmc pràctica 1
2016 17 hmc pràctica 1
 
Curs 2014 15 - pràctica 03 hmc - gran guerra rev russa - correcció
Curs 2014 15 - pràctica 03 hmc - gran guerra rev russa - correccióCurs 2014 15 - pràctica 03 hmc - gran guerra rev russa - correcció
Curs 2014 15 - pràctica 03 hmc - gran guerra rev russa - correcció
 
2016 17 hmc pràctica 2
2016 17 hmc pràctica 22016 17 hmc pràctica 2
2016 17 hmc pràctica 2
 
Curs 2014 15 - pràctica 05 hmc - ii
Curs 2014 15 - pràctica 05 hmc - iiCurs 2014 15 - pràctica 05 hmc - ii
Curs 2014 15 - pràctica 05 hmc - ii
 
Curs 2016 17 pràctica 04 hmc
Curs 2016 17 pràctica 04 hmcCurs 2016 17 pràctica 04 hmc
Curs 2016 17 pràctica 04 hmc
 
Curs 2015 16 - pràctica 05 hmc
Curs 2015 16 - pràctica 05 hmcCurs 2015 16 - pràctica 05 hmc
Curs 2015 16 - pràctica 05 hmc
 

Similar to DHT2 - O Brother, Where Art Thou with Shyam Ranganathan

IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET Journal
 
Hops - Distributed metadata for Hadoop
Hops - Distributed metadata for HadoopHops - Distributed metadata for Hadoop
Hops - Distributed metadata for HadoopJim Dowling
 
RIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
RIO Distribution: Reconstructing the onion - Shyamsundar RanganathanRIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
RIO Distribution: Reconstructing the onion - Shyamsundar RanganathanGluster.org
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BIDenny Lee
 
Oracle 12c Multitenant architecture
Oracle 12c Multitenant architectureOracle 12c Multitenant architecture
Oracle 12c Multitenant architecturenaderattia
 
Apache Hadoop- Hadoop Basics.pptx
Apache Hadoop- Hadoop Basics.pptxApache Hadoop- Hadoop Basics.pptx
Apache Hadoop- Hadoop Basics.pptxMiraj Godha
 
北航云计算公开课03 google file system
北航云计算公开课03 google file system北航云计算公开课03 google file system
北航云计算公开课03 google file systemCando Zhou
 
Bigdata and Hadoop
 Bigdata and Hadoop Bigdata and Hadoop
Bigdata and HadoopGirish L
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoopjeffturner
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
An introduction to git
An introduction to gitAn introduction to git
An introduction to gitolberger
 
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking AlgorithmPerformance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking AlgorithmIRJET Journal
 
ALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch CouncilALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch CouncilSunita Shrivastava
 

Similar to DHT2 - O Brother, Where Art Thou with Shyam Ranganathan (20)

IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
 
Hops - Distributed metadata for Hadoop
Hops - Distributed metadata for HadoopHops - Distributed metadata for Hadoop
Hops - Distributed metadata for Hadoop
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
RIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
RIO Distribution: Reconstructing the onion - Shyamsundar RanganathanRIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
RIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
 
Hadoop
HadoopHadoop
Hadoop
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BI
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
Oracle 12c Multitenant architecture
Oracle 12c Multitenant architectureOracle 12c Multitenant architecture
Oracle 12c Multitenant architecture
 
Demystifying git
Demystifying git Demystifying git
Demystifying git
 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
 
Apache Hadoop- Hadoop Basics.pptx
Apache Hadoop- Hadoop Basics.pptxApache Hadoop- Hadoop Basics.pptx
Apache Hadoop- Hadoop Basics.pptx
 
北航云计算公开课03 google file system
北航云计算公开课03 google file system北航云计算公开课03 google file system
北航云计算公开课03 google file system
 
Bigdata and Hadoop
 Bigdata and Hadoop Bigdata and Hadoop
Bigdata and Hadoop
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Git session-2012-2013
Git session-2012-2013Git session-2012-2013
Git session-2012-2013
 
HADOOP
HADOOPHADOOP
HADOOP
 
An introduction to git
An introduction to gitAn introduction to git
An introduction to git
 
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking AlgorithmPerformance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
 
ALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch CouncilALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch Council
 

More from Gluster.org

Automating Gluster @ Facebook - Shreyas Siravara
Automating Gluster @ Facebook - Shreyas SiravaraAutomating Gluster @ Facebook - Shreyas Siravara
Automating Gluster @ Facebook - Shreyas SiravaraGluster.org
 
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravaranfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas SiravaraGluster.org
 
Facebook’s upstream approach to GlusterFS - David Hasson
Facebook’s upstream approach to GlusterFS  - David HassonFacebook’s upstream approach to GlusterFS  - David Hasson
Facebook’s upstream approach to GlusterFS - David HassonGluster.org
 
Throttling Traffic at Facebook Scale
Throttling Traffic at Facebook ScaleThrottling Traffic at Facebook Scale
Throttling Traffic at Facebook ScaleGluster.org
 
GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS  GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS Gluster.org
 
Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...Gluster.org
 
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)Gluster.org
 
Data Reduction for Gluster with VDO
Data Reduction for Gluster with VDOData Reduction for Gluster with VDO
Data Reduction for Gluster with VDOGluster.org
 
Releases: What are contributors responsible for
Releases: What are contributors responsible forReleases: What are contributors responsible for
Releases: What are contributors responsible forGluster.org
 
Gluster and Kubernetes
Gluster and KubernetesGluster and Kubernetes
Gluster and KubernetesGluster.org
 
Native Clients, more the merrier with GFProxy!
Native Clients, more the merrier with GFProxy!Native Clients, more the merrier with GFProxy!
Native Clients, more the merrier with GFProxy!Gluster.org
 
Gluster: a SWOT Analysis
Gluster: a SWOT Analysis Gluster: a SWOT Analysis
Gluster: a SWOT Analysis Gluster.org
 
GlusterD-2.0: What's Happening? - Kaushal Madappa
GlusterD-2.0: What's Happening? - Kaushal MadappaGlusterD-2.0: What's Happening? - Kaushal Madappa
GlusterD-2.0: What's Happening? - Kaushal MadappaGluster.org
 
Scalability and Performance of CNS 3.6
Scalability and Performance of CNS 3.6Scalability and Performance of CNS 3.6
Scalability and Performance of CNS 3.6Gluster.org
 
What Makes Us Fail
What Makes Us FailWhat Makes Us Fail
What Makes Us FailGluster.org
 
Gluster as Native Storage for Containers - past, present and future
Gluster as Native Storage for Containers - past, present and futureGluster as Native Storage for Containers - past, present and future
Gluster as Native Storage for Containers - past, present and futureGluster.org
 
Heketi Functionality into Glusterd2
Heketi Functionality into Glusterd2Heketi Functionality into Glusterd2
Heketi Functionality into Glusterd2Gluster.org
 
Hands On Gluster with Jeff Darcy
Hands On Gluster with Jeff DarcyHands On Gluster with Jeff Darcy
Hands On Gluster with Jeff DarcyGluster.org
 
Architecture of the High Availability Solution for Ganesha and Samba with Kal...
Architecture of the High Availability Solution for Ganesha and Samba with Kal...Architecture of the High Availability Solution for Ganesha and Samba with Kal...
Architecture of the High Availability Solution for Ganesha and Samba with Kal...Gluster.org
 
Gluster Containerized Storage for Cloud Applications
Gluster Containerized Storage for Cloud ApplicationsGluster Containerized Storage for Cloud Applications
Gluster Containerized Storage for Cloud ApplicationsGluster.org
 

More from Gluster.org (20)

Automating Gluster @ Facebook - Shreyas Siravara
Automating Gluster @ Facebook - Shreyas SiravaraAutomating Gluster @ Facebook - Shreyas Siravara
Automating Gluster @ Facebook - Shreyas Siravara
 
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravaranfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
 
Facebook’s upstream approach to GlusterFS - David Hasson
Facebook’s upstream approach to GlusterFS  - David HassonFacebook’s upstream approach to GlusterFS  - David Hasson
Facebook’s upstream approach to GlusterFS - David Hasson
 
Throttling Traffic at Facebook Scale
Throttling Traffic at Facebook ScaleThrottling Traffic at Facebook Scale
Throttling Traffic at Facebook Scale
 
GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS  GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS
 
Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...
 
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
 
Data Reduction for Gluster with VDO
Data Reduction for Gluster with VDOData Reduction for Gluster with VDO
Data Reduction for Gluster with VDO
 
Releases: What are contributors responsible for
Releases: What are contributors responsible forReleases: What are contributors responsible for
Releases: What are contributors responsible for
 
Gluster and Kubernetes
Gluster and KubernetesGluster and Kubernetes
Gluster and Kubernetes
 
Native Clients, more the merrier with GFProxy!
Native Clients, more the merrier with GFProxy!Native Clients, more the merrier with GFProxy!
Native Clients, more the merrier with GFProxy!
 
Gluster: a SWOT Analysis
Gluster: a SWOT Analysis Gluster: a SWOT Analysis
Gluster: a SWOT Analysis
 
GlusterD-2.0: What's Happening? - Kaushal Madappa
GlusterD-2.0: What's Happening? - Kaushal MadappaGlusterD-2.0: What's Happening? - Kaushal Madappa
GlusterD-2.0: What's Happening? - Kaushal Madappa
 
Scalability and Performance of CNS 3.6
Scalability and Performance of CNS 3.6Scalability and Performance of CNS 3.6
Scalability and Performance of CNS 3.6
 
What Makes Us Fail
What Makes Us FailWhat Makes Us Fail
What Makes Us Fail
 
Gluster as Native Storage for Containers - past, present and future
Gluster as Native Storage for Containers - past, present and futureGluster as Native Storage for Containers - past, present and future
Gluster as Native Storage for Containers - past, present and future
 
Heketi Functionality into Glusterd2
Heketi Functionality into Glusterd2Heketi Functionality into Glusterd2
Heketi Functionality into Glusterd2
 
Hands On Gluster with Jeff Darcy
Hands On Gluster with Jeff DarcyHands On Gluster with Jeff Darcy
Hands On Gluster with Jeff Darcy
 
Architecture of the High Availability Solution for Ganesha and Samba with Kal...
Architecture of the High Availability Solution for Ganesha and Samba with Kal...Architecture of the High Availability Solution for Ganesha and Samba with Kal...
Architecture of the High Availability Solution for Ganesha and Samba with Kal...
 
Gluster Containerized Storage for Cloud Applications
Gluster Containerized Storage for Cloud ApplicationsGluster Containerized Storage for Cloud Applications
Gluster Containerized Storage for Cloud Applications
 

Recently uploaded

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
DHT2 - O Brother, Where Art Thou with Shyam Ranganathan

  • 1. DHT2 - O Brother, Where Art Thou? Shyamsundar Ranganathan, Developer
  • 2. Session aims to explore...
      - "The hypothetical treasure at the end of the journey": Why DHT2
      - "The plan...": DHT2 design
      - "Known adventures along the way!": Challenges in DHT2
      - "The strange characters": Challenges because of DHT2
      - "Trouble escaping the chain gang!": Where are we with DHT2
      Loosely inspired by the movie: https://en.wikipedia.org/wiki/O_Brother,_Where_Art_Thou%3F
  • 3. Why DHT2
      - DHT pitfalls:
          - Directories on all subvolumes
          - Layout per directory
          - Rebalance IO path handling and non-optimal data movement
      - This impacts scale and correctness!
  • 4. Why DHT2
      - DHT pitfalls:
          - Directories on all subvolumes
          - Layout per directory
          - Rebalance IO path handling and non-optimal data movement
      - This impacts scale and correctness!
      - Correctness can be addressed in DHT:
          - Broader locking semantics for dentry operations
          - Possibly single layout adoption
          - But this increases complexity and could cost performance!
      - With DHT2 the goal is to fix all of the above, retaining or improving performance
  • 5. DHT2 Design: The file system objects
      - View the file system as a collection of related objects
          - "Wait a second... isn't that what inodes and data pointers are?"
          - Yes, but they are not distributed!
      - Directory objects denote hierarchy, storing <name, inode#> tables
      - File object maintains inode related metadata
      - Actual file data is maintained in data object(s)
  • 6. The file system objects (example). Client View:
      . ('root')
      ├── Dir1
      │   ├── Dir2
      │   └── File2
      └── File1
  • 7. The file system objects (example). [Diagram: the client view (root, Dir1, Dir2, File1, File2) drawn as Dir objects and File objects (inodes/dinodes) plus Data objects (file data).] The different objects, segregated by type
  • 8. The file system objects (example). [Same diagram.] Namespace hierarchy representation
  • 9. The file system objects (example). [Same diagram.] Data association
  • 10. DHT2 Design: Distribution details
      - Distribute inodes using GFID in the metadata ring
          - No hierarchy, a directory object lives only on one subvolume
      - Use GFID as the data object# in the data ring
      - Distribution is hence not name dependent, and we just use a single layout per ring
  • 11. Distribution details (example). [Diagram: Dir and File objects in the Metadata Ring (few bricks) and Data objects in the Data Ring (many bricks), all labeled by GFID (0001, 00EF, BA11, BAC5, 7525); directory entries shown as <File1, 00EF>, <Dir1, BA11>, <File2, BAC5>, <Dir2, 7525>.] Switch names to GFID, add name to dinodes
  • 12. Distribution details (example). [Same diagram, shown next to the client view: 'root' containing Dir1 (with Dir2 and File2) and File1.]
  • 13. DHT2 Design: Distribution details (contd.)
      - Layout is based on bucket to subvolume assignment
          - Where buckets >> subvolumes
      - Bucket ID is encoded into the first n bytes of the GFID
          - Trivial GFID based operations
      - Colocates file object with parent object
          - File object# statically inherits parent directory# bucket ID
          - Optimized readdirp and lookup operations (no hopping unless non-trivially renamed, or a link file)
          - IOW, optimized (pGFID, basename) based operations
  • 14. Distribution details (example). [Diagram as in slide 11, with a row of Bricks/Subvols added under the Metadata Ring and the Data Ring.] Add bricks/subvolumes
  • 15. Distribution details (example). [Same diagram, with bucket labels 00, 75, BA, 00, BA shown on the bricks.] Assign buckets to bricks
  • 16. Distribution details (example). [Same diagram.] Place directories based on bucket encoded in the GFID
  • 17. Distribution details (example). [Same diagram.] Colocate the files under a directory with the same bucket ID
  • 18. DHT2 Design: Rebalance
      - Reassign buckets to/from newer/removed subvolumes
          - fix-layout is instantaneous
          - Files travel with directories (same bucket colocation)
      - Expand the cluster, but perform no rebalance
          - aka just add-brick and let min-free-disk + link-to do its job
          - This is the tough one; use layout versions/histories to pull this off?
      - Split DHT2 into client-server pieces
          - Handle IO traffic, locking during rebalance
          - Better consistency model for transactions
      - Ability to have different expansion strategies for the 2 rings
  • 19. Challenges in DHT2
      - Rename
          - ELOOP checking requires hierarchy
          - Object backpointers
      - Time and size information should be in sync between data and metadata objects
          - Dirty inode, tracked via open fd
      - Orphan GFID cleanup
          - Enter transactions/journals!
      - Directories as files/in a DB
          - Reduce local FS inode proliferation
  • 20. Challenges because of DHT2
      - IO path cannot depend on hierarchy (e.g. quota)
      - Quick-read cannot fetch data in lookups
      - Anon-fd based operations cannot track dirty inodes
      - Others:
          - Will changelog play well?
          - EC has to bother with only data?
          - Tier may need a rethink
          - Sharding may accrue the cost of missing anon-fd and the data/metadata split of shards
      - Unknowns!
  • 21. Where are we with DHT2
      - Introduced DHT Version 2 at the Barcelona summit, 2015
      - Followed up with 2 discussions upstream on core concepts [1] [2]
      - Followed up with a POC and some slides/documents to demonstrate the concepts [3]
      - In limbo since then, but not out of the picture yet!
      - Targeting an experimental release with 4.0
  • 22. Questions? "The treasure you seek shall not be the treasure you find."
  • 23. References
      [1] DHT2 Design Discussion: https://goo.gl/tLpqJO
      [2] DHT2 Design Discussion, Round 2: https://goo.gl/dCAO36
      [3] POC trail…: http://www.gluster.org/pipermail/gluster-devel/2015-August/046369.html
      Other threads of interest:
      - http://www.gluster.org/pipermail/gluster-devel/2016-March/048874.html
      - http://www.gluster.org/pipermail/gluster-devel/2015-November/047098.html
      - http://www.gluster.org/pipermail/gluster-devel/2015-September/046630.html
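
The bucket-based placement and rebalance described in slides 13 to 18 can be illustrated with a small sketch. This is not GlusterFS code: every name here (`new_gfid`, `bucket_of`, `Layout`, `add_subvol`) is hypothetical, the bucket ID is assumed to occupy the first byte of a 16-byte GFID (the slides only say "first n bytes"), and the round-robin initial assignment and the donor-selection rule in `add_subvol` are assumptions made purely for illustration.

```python
import os
from collections import Counter

BUCKET_BYTES = 1                    # assumption: n = 1 (slides say "first n bytes")
NUM_BUCKETS = 256 ** BUCKET_BYTES   # buckets >> subvolumes
GFID_BYTES = 16                     # assumption: GFID is a 16-byte identifier


def new_gfid(bucket: int) -> bytes:
    """Return a random GFID whose leading bytes encode the bucket ID."""
    prefix = bucket.to_bytes(BUCKET_BYTES, "big")
    return prefix + os.urandom(GFID_BYTES - BUCKET_BYTES)


def bucket_of(gfid: bytes) -> int:
    """Recover the bucket ID from the GFID alone (no name, no hierarchy)."""
    return int.from_bytes(gfid[:BUCKET_BYTES], "big")


class Layout:
    """A single layout for one ring: bucket ID -> subvolume."""

    def __init__(self, subvols):
        # Hypothetical initial assignment: spread buckets round-robin.
        self.assignment = {
            b: subvols[b % len(subvols)] for b in range(NUM_BUCKETS)
        }

    def subvol_for(self, gfid: bytes) -> str:
        return self.assignment[bucket_of(gfid)]

    def add_subvol(self, name: str):
        """Hand a fair share of buckets to a new subvolume.

        Only the bucket map changes here; objects in the moved buckets
        would still have to be migrated by a separate mechanism.
        """
        owners = set(self.assignment.values()) | {name}
        target = NUM_BUCKETS // len(owners)
        counts = Counter(self.assignment.values())
        moved = []
        for bucket in sorted(self.assignment):
            if len(moved) == target:
                break
            donor = self.assignment[bucket]
            if counts[donor] > target:   # take only from over-full donors
                counts[donor] -= 1
                self.assignment[bucket] = name
                moved.append(bucket)
        return moved


# A directory gets its own bucket; a file inherits its parent's bucket.
metadata_ring = Layout(["meta-0", "meta-1"])
dir1 = new_gfid(bucket=0xBA)
file2 = new_gfid(bucket=bucket_of(dir1))
same_before = metadata_ring.subvol_for(dir1) == metadata_ring.subvol_for(file2)

# Reassigning buckets moves a directory and its files together.
metadata_ring.add_subvol("meta-2")
same_after = metadata_ring.subvol_for(dir1) == metadata_ring.subvol_for(file2)
```

Because placement depends only on the bucket prefix, `same_before` and `same_after` are both true regardless of which buckets are reassigned. A data ring would use a second `Layout` instance keyed by the same GFID, which is one way to read "a single layout per ring".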