2. Scope
- Why do we need the index xlator?
- How does index xlator work?
- On fs layout of indices
- What types of indices exist already
- How do consumers use the indices
- Future improvements that can be made to the index xlator
3. Why do we need the index xlator?
- 10 years back, healing was done with a full filesystem crawl, performing lookups on all the files/directories
- This was extremely inefficient: bricks are usually down only for brief periods, so only a small fraction of the filesystem actually needs heal, yet the whole filesystem had to be crawled
- There was no easy way to identify the files/directories that need heal without crawling the entire GlusterFS filesystem
- So we needed an index of files/directories that need to be healed
4. On fs layout of indices
- Indices are local to a brick and generally track the status of files/directories present only on that brick
- Each brick has a .glusterfs/indices directory in which the different types of indices are kept
- Demo
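- For illustration, the kind of layout such a demo shows on a single brick looks roughly like this (the base-index names and GFIDs are placeholders):

    <brick-path>/.glusterfs/indices/
        xattrop/
            xattrop-<uuid>          <- base index
            <gfid>                  <- hardlink to base: file/dir that needs heal
        dirty/
            dirty-<uuid>            <- base index
            <gfid>                  <- hardlink to base: file/dir with an ongoing change
        entry-changes/
            <parent-dir-gfid>/
                <entry-name>        <- entry in that directory that needs heal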
5. On fs layout of indices - continued
- Each index will have a base index
- Any new entry that needs to be indexed is created as a hardlink to the base index (see the sketch after this list)
- If the underlying filesystem imposes a maximum hardlink limit, then a new
base index will be created upon reaching this limit
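- Below is a minimal C sketch of this hardlink-based scheme, not the actual index xlator code; the helper name index_add(), the paths and the "-1" suffix used for the new base index are all made up for illustration:

    /* Add an entry named by its gfid to an index directory by hardlinking it
     * to the base index.  If the filesystem's hardlink limit is reached
     * (EMLINK), create a fresh base index and retry.  Illustrative only. */
    #include <errno.h>
    #include <fcntl.h>
    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>

    static int
    index_add(const char *index_dir, const char *base_name, const char *gfid)
    {
        char base[PATH_MAX], entry[PATH_MAX];

        snprintf(base, sizeof(base), "%s/%s", index_dir, base_name);
        snprintf(entry, sizeof(entry), "%s/%s", index_dir, gfid);

        if (link(base, entry) == 0 || errno == EEXIST)
            return 0;                 /* newly indexed or already indexed */

        if (errno == EMLINK) {
            /* hardlink limit reached: start a new base index and retry */
            char newbase[PATH_MAX];
            snprintf(newbase, sizeof(newbase), "%s-1", base);
            int fd = open(newbase, O_CREAT | O_RDONLY, 0600);
            if (fd < 0)
                return -1;
            close(fd);
            return link(newbase, entry);
        }
        return -1;
    }

    int main(void)
    {
        /* assumes the index directory and base index already exist */
        return index_add("/bricks/b1/.glusterfs/indices/xattrop",
                         "xattrop-base", "fa21cbf3-5a2a-4d0b-9c3e-000000000001");
    }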
6. What types of indices are present at the moment
- Afr uses xattrop-based indices
  - Dirty - Files/directories that are undergoing changes
  - Xattrop - Files/directories that need heal
  - Entry-changes - Granular file entries that need to be healed in a directory
- EC uses xattrop-based indices
  - Xattrop - Files/directories that need heal and that are undergoing changes
7. How are indices consumed?
- Most of the current consumers are the self-heal daemons of the afr/ec modules
- Every X seconds, the self-heal daemon checks its local brick's indices to see if any heal has to be performed, by doing an opendir/readdir on the indices directory (sketched below)
- Once it completes these heals, the index entry is removed
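- A rough C sketch of this consumption loop, with the heal itself stubbed out; the directory path, the base-index prefix and the 600-second interval below are placeholders, and the real logic lives in the afr/ec self-heal daemons:

    /* Walk an xattrop index directory, "heal" each indexed gfid and then
     * drop its index entry.  Illustrative only. */
    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int heal_gfid(const char *gfid)
    {
        printf("healing %s\n", gfid);   /* placeholder for the real heal */
        return 0;
    }

    static void process_index_dir(const char *index_dir)
    {
        DIR *dir = opendir(index_dir);
        if (!dir)
            return;

        struct dirent *entry;
        while ((entry = readdir(dir)) != NULL) {
            /* skip ".", ".." and the base index (named like "xattrop-<uuid>") */
            if (entry->d_name[0] == '.' ||
                strncmp(entry->d_name, "xattrop-", 8) == 0)
                continue;

            if (heal_gfid(entry->d_name) == 0) {
                /* heal done, so the index entry can be removed */
                char path[4096];
                snprintf(path, sizeof(path), "%s/%s", index_dir, entry->d_name);
                unlink(path);
            }
        }
        closedir(dir);
    }

    int main(void)
    {
        for (;;) {
            process_index_dir("/bricks/b1/.glusterfs/indices/xattrop");
            sleep(600);   /* "every X seconds"; the actual interval is configurable */
        }
    }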
8. How are indices consumed - continued
- During readdirp on the index directory, afr fills in the inode attributes based on whether the indexed entry actually needs heal or not
9. Future improvements
- If the number of entries to heal grows very large (of the order of millions), adding entries to and removing entries from the index can become slow, depending on how the underlying filesystem implements directories
- For example, ZFS does not shrink a directory after files are removed from it, which can make ls on the indices directory slow and, in some cases, slow down I/O as well (https://github.com/gluster/glusterfs/issues/1764)
- One way to solve this is to introduce a hierarchy that spreads the entries across separate subdirectories, similar to what we already have in .glusterfs (see the sketch below)
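- As an illustration of that idea, a small C sketch of how index entries could be fanned out into a two-level hierarchy keyed by the leading characters of the GFID, similar to the way .glusterfs lays out its gfid links; the helper name and paths are hypothetical:

    /* Map a gfid to a two-level subdirectory, e.g.
     * <index_dir>/fa/21/fa21cbf3-..., instead of one flat directory.
     * Assumes the gfid string is at least 4 characters long. */
    #include <stdio.h>

    static void
    index_entry_path(char *out, size_t len, const char *index_dir, const char *gfid)
    {
        snprintf(out, len, "%s/%c%c/%c%c/%s",
                 index_dir, gfid[0], gfid[1], gfid[2], gfid[3], gfid);
    }

    int main(void)
    {
        char path[4096];
        index_entry_path(path, sizeof(path),
                         "/bricks/b1/.glusterfs/indices/xattrop",
                         "fa21cbf3-5a2a-4d0b-9c3e-000000000001");
        puts(path);   /* .../xattrop/fa/21/fa21cbf3-... */
        return 0;
    }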