How to Improve GFS/GFS2 File System Performance and Prevent Processes from Hanging
Author: John Ruemker, Shane Bradley, and Steven Whitehouse
Editor: Allison Pranger
02/04/2009, 10/12/2010

OVERVIEW
Cluster file systems such as the Red Hat Global File System (GFS) and Red Hat Global File System 2 (GFS2) are complex systems that allow multiple computers (nodes) to simultaneously share the same storage device in a cluster. There can be many reasons why performance does not match expectations. In some workloads or environments, the overhead associated with distributed locking on GFS/GFS2 file systems might affect performance or cause certain commands to appear to hang. This document addresses common problems and how to avoid them, how to discover whether a particular file system is affected by a problem, and how to know if you have found a real bug (rather than just a performance issue). It is intended for users in the design stage of a cluster who want to know how to get the best from a GFS/GFS2 file system, as well as for users of GFS/GFS2 file systems who need to track down a performance problem in the system.

NOTE: This document provides recommended values only. Values should be thoroughly tested before implementing in a production environment. Under some workloads, they might have a negative impact on the performance of GFS/GFS2 file systems.

Environment
• Red Hat Enterprise Linux 4 and later

Terminology
This document assumes a basic knowledge and understanding of file systems in general. The following subsections briefly discuss relevant terminology.

Inodes and Resource Groups
In the framework of GFS/GFS2 file systems, inodes correspond to file-system objects such as files, directories, and symlinks. A resource group corresponds to the way GFS and GFS2 keep track of areas within the file system.
Each resource group contains a number of file-system blocks, and there are bitmaps associated with each resource group that determine whether each block of that resource group is free, allocated for data, or allocated for an inode. Since the file system is shared, the resource-group information/bitmaps and inode information must be kept synchronized between nodes so the file system remains consistent (not corrupted) on all nodes of the cluster.

How to Improve GFS/GFS2 File System Performance | Ruemker, Bradley, and Whitehouse
Glocks
A glock (pronounced “gee-lock”) is a cluster-wide GFS lock. GFS/GFS2 file systems use glocks to coordinate locking of file-system resources such as inodes and resource groups. The glock subsystem provides a cache-management function that is implemented using DLM as the underlying communication layer.

Holders
When a process is using a GFS/GFS2 file-system resource, it locks the glock associated with that resource and is said to be holding that glock. Each glock can have a number of holders that each lay claim on that resource. Processes waiting for a glock are considered to be waiting to hold the glock, and they also have holders attached to the glock, but in a waiting state.

THEORY OF OPERATION
Both GFS and GFS2 work like local file systems, except with regard to caching. In GFS/GFS2, caching is controlled by glocks. There are two essential things to know about caching in order to understand GFS/GFS2 performance characteristics.

The first is that the cache is split between nodes: either only a single node may cache a particular part of the file system at one time, or, in the case of a particular part of the file system being read but not modified, multiple nodes may cache the same part of the file system simultaneously. Caching granularity is per inode or per resource group, so that each object is associated with a glock (types 2 and 3, respectively) that controls its caching.

The second thing to note is that there is no other form of communication between GFS/GFS2 nodes in the file system. All cache-control information comes from the glock layer and the underlying lock manager (DLM). When a node makes an exclusive-use access request (for a write or modification operation) to locally cache some part of the file system that is currently in use elsewhere in the cluster, all the other cluster nodes must write any pending changes and empty their caches.
If a write or modification operation has just been performed on another node, this requires both log flushing and writing back of data, which can be tremendously slower than accessing data that is already cached locally.

These caching principles apply to directories as well as to files. Adding or removing a directory entry is the same (from a caching point of view) as writing to a file, and reading the directory or looking up a single entry is the same as reading a file. The speed is slower if the file or directory is larger, although it also depends on how much of the file or directory needs to be read in order to complete the operation.

Reading cached data can be very fast. In GFS2, the code path used to read cached data is almost identical to that used by the ext3 file system: the read path goes directly to the page cache in order to check the page state and copy the data to the application. There will only be a call into the file system to refresh the pages if the pages are non-existent or not up to date. GFS works slightly differently: it wraps the read call in a glock directly; however, reading data that is already cached this way is still fast. You can read the same data at the same speed in parallel across multiple nodes, and the effective transfer rate can be very large.

It is generally possible to achieve acceptable performance for most applications by being careful about how files are accessed. Simply taking an application designed to run on a single node and moving it to a cluster rarely improves performance. For further advice, contact Red Hat Support.
FILE-SYSTEM DESIGN CONSIDERATIONS
Before putting a clustered file system into production, you should spend some time designing the file system to allow for the least amount of contention between nodes in the cluster. Since access to file-system blocks is controlled by glocks that potentially require inter-node communication, you will get the best performance if you design your file system to avoid contention.

File/Directory Contention
If, for example, you have dozens of nodes that all mount the same GFS2 file system and all access the same file, then access will only be fast if all nodes have read-only access (nodes mounted with the noatime mount option). As soon as there is one writer to the shared file, performance will slow down dramatically. If the application knows when it has written a file that will not be used again on the local node, then calling fsync and then fadvise/madvise with the DONT_NEED flag will help to speed up access from other cluster nodes.

The other important item to note is that, for directories, file create/unlink activity has the same effect as a write to a regular file: it requires exclusive access to the directory to perform the operation, and subsequent access from other nodes then requires rereading the directory information into cache, which can be a slow operation for large directories. It is usually better to split up directories with a lot of write activity into several subdirectories indexed by a hash or some similar system in order to reduce the number of times each individual directory has to be reread from disk.

Resource-Group Contention
GFS/GFS2 file systems are logically divided into several areas known as resource groups. The size of the resource groups can be controlled with the mkfs.gfs/mkfs.gfs2 command (-r parameter). The GFS/GFS2 mkfs program attempts to estimate an optimal size for your resource groups, but it might not be precise enough for optimal performance.
If you have too many resource groups, the nodes in your cluster might waste unnecessary time searching through tens of thousands of resource groups trying to find one suitable for block allocation. On the other hand, if you have too few resource groups, each will cover a larger area, so block allocations might suffer from the opposite problem: too much time wasted in glock contention waiting for available resource groups. You might want to experiment with different resource-group sizes to find one that optimizes system performance.

Block-Size Considerations
When the file system is formatted with the mkfs.gfs/mkfs.gfs2 command, you may specify a block size with -b. If no size is specified, the default is 4K. Different block sizes will often provide different performance characteristics for your application. Most hardware is designed to operate efficiently with the default block size of 4K, and the default is recommended for all file systems. However, if there is a requirement for efficient storage of very small files, 1K should be considered the minimum block size (-b 1024). The ideal block size depends on how the file system is used, and you might want to experiment with different block sizes to find one that optimizes system performance.
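As a concrete illustration of the directory-splitting approach described under File/Directory Contention, the sketch below spreads files across hashed subdirectories. The base directory, file name, bucket count, and use of md5sum are illustrative assumptions, not part of GFS/GFS2 itself.

```shell
# Spread files across 16 hashed subdirectories (0-f) instead of one
# flat directory, so no single directory glock sees all of the writes.
base=$(mktemp -d)        # stand-in for a directory on the GFS2 mount
name="message-12345"     # illustrative file name

# Derive a one-hex-digit bucket (16 buckets) from the file name.
bucket=$(printf '%s' "$name" | md5sum | cut -c1)
mkdir -p "$base/$bucket"
touch "$base/$bucket/$name"
```

Readers can recompute the same hash from the file name to locate a file, so no shared index (and no extra locking) is needed.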
MOUNT OPTIONS
Unless atime support is essential, Red Hat recommends setting noatime on every GFS/GFS2 mount point. This will significantly improve performance since it prevents reads from turning into writes. With GFS2 on Red Hat Enterprise Linux 6 and later, there is also the option of relatime, which updates atime when other timestamps are being updated; however, noatime is still recommended.

Do not use the journaled data mode (chattr +j) unless it is required. The default ordered-data mode will prevent any uninitialized data from appearing in files after a crash. To ensure data is actually on disk, use fsync(2) for both the parent directory and newly created files.

ANSWERS TO COMMON QUESTIONS
If your cluster is running slowly or appears to be stopped and you are not sure why, the steps below should help to resolve the issue. Remember that Red Hat Support is always available to help.

First Steps
Begin by collecting the answers to several simple questions that Red Hat Support will ask when you contact them:
• What is the workload on the cluster?
• What applications are running, and are they using large or small files?
• How are the files organized?
• What is the architecture of the cluster?
• How many nodes?
• What is the storage, and how is it attached?
• How large is the file system(s)?
• What are the timing constraints?
• Does the issue always occur at a certain time of day or have some relationship with a particular event (for example, nightly backups)?
• Is the problem a performance problem (slow), a real bug (completely stuck, kernel panic, file-system assertion), or a corruption issue (usually indicated by a file-system withdraw)?
• Is the problem reproducible, or did it happen only once?
• Does the problem occur on a single node or in a cluster situation?
• Does the problem occur on the same node, or does it move around?
Of course, not every situation will require the same set of information, but the answers to these questions will give you a good idea of where to start looking for the root of the problem. If the problem always occurs at a specific time, look for periodic processes that might be running (not all are in crontab, but that is a good place to begin).

Is a Task Stuck or Just Slow?
It is often difficult to tell if a task is completely stuck or if it is just slow, but there are signs that point to poor performance resulting from contention for glocks. One example is increased network traffic. A significant amount of DLM network traffic indicates that there is a lot of locking activity and thus potentially a lot of
cache invalidation. What counts as significant depends on each individual situation, and locking traffic should be assessed as a proportion of the total network bandwidth rather than measured against specific metrics. Also, increased locking activity is only an indication of a problem and not a guarantee that one exists. The actual level of locking activity is highly dependent on the workload.

Information from glock dumps can be used to show whether tasks are still making progress. Take two glock dumps, spaced apart by a few seconds or a few minutes, and then look for glocks with a lot of waiting holders (ignoring granted holders). If the same glocks have exactly the same list of waiting holders in the second glock dump as they did in the first, it is an indication that the cluster has become stuck. If the list has changed at all, then the cluster is just running slowly.

Sometimes taking a glock dump can take a long time due to the amount of data involved, which depends on the number of cached glocks (GFS/GFS2 keep a large number of glocks in memory for performance purposes). The time needed will vary depending on the total memory size of the node in question and the amount of GFS/GFS2 activity that has taken place on that node. GFS2 tracepoints (available in Red Hat Enterprise Linux 6 and later) can also be used to monitor activity in order to see whether any nodes are stuck.

Which Task is Involved in the Slowdown?
The glock dump file includes details of the tasks that have requested each glock (the owner of each holder). This information can be used to find out which task is stuck or involved in contention for a glock.

Which Inode is Contended?
Glock numbers are made up of two parts. In the glock dump, glock numbers are represented as type, number. Type 2 indicates an inode, and type 3 indicates a resource group. There are additional types of glocks, but the majority of slowdown and glock-contention issues will be associated with these two glock types.
The number of the type 2 glocks (inode glocks) indicates the disk location (in file-system blocks) of the inode and also serves as the inode identification number. You can convert the inode number listed in the glock dump from hexadecimal to decimal format and use it to track down the inode associated with that glock. Identifying the contended inode should be possible using find -inum, preferably on an otherwise idle file system, since find will try to read all the inodes in the file system, making any contention problem worse.

Why Does gfs2_quotad Appear Stuck When I'm Not Even Using Quotas?
The gfs2_quotad process performs two jobs. One of those is related to quotas, and the other is updating the statfs information for the file system. If a problem occurs elsewhere in the file system, gfs2_quotad often becomes stuck since the periodic writes to update the statfs information can become queued behind other operations in the system. If gfs2_quotad appears stuck, it is usually a symptom of a different problem elsewhere in the file system.

Is It Worth Trying to Reproduce a Problem While Only a Single Node is Mounted?
In almost every case, you should try to reproduce a problem while only a single node is mounted. If the
problem does reproduce on a single node, it is probably not related to clustering at all. If the problem only appears in the cluster, it indicates either an I/O issue or a contention issue on one or more inodes in the cluster.

How Can I Calculate the Maximum Throughput of GFS/GFS2 on My Hardware?
Maximum throughput depends on the I/O pattern from the application, the I/O scheduler on each node, the distribution of I/O among the nodes, and the characteristics of the hardware itself.

One simple example involves two nodes, each performing streaming writes to its own file on a GFS2 file system. This scenario can be simulated at the block-device level by creating two streams of I/O to different parts of the shared block device using dd. This test will allow you to measure the absolute maximum performance that the hardware can sustain. Actual file-system performance will differ due to the overhead of block allocation and file-system metadata, but this test will provide an upper limit.

In this example, we are assuming that the block device is a single, shared, rotational hard disk that is able to write to only a single sector at once. The two streams of I/O will be sent to the disk by the I/O schedulers on the two different nodes, each without any knowledge of the other. The disk must perform scheduling in order to move the disk head between the two areas of the disk receiving the streams. This process will be slow, and it might even be slower than writing the two streams of I/O sequentially from a single node. If, on the other hand, we assume that the block device in the example is a RAID array with many spindles, then the two streams of I/O may be written at the same time without having to move disk heads between the two areas. This will improve performance. Storage hardware must be specified according to the expected I/O patterns from the nodes so that it is capable of delivering the level of performance that will support file-system requirements.
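The two-stream dd test described above might be sketched as follows. Here a scratch file stands in for the shared block device so the command shape can be tried safely; on real shared storage you would point dev at the LUN (typically adding oflag=direct to bypass the page cache), and the writes destroy whatever they are aimed at.

```shell
# Two concurrent streaming writes to different regions of one device,
# simulating streams arriving from two cluster nodes. A scratch file
# stands in for the shared LUN; sizes are illustrative.
dev=$(mktemp)
dd if=/dev/zero of="$dev" bs=1M count=8 seek=0  conv=notrunc 2>/dev/null &
dd if=/dev/zero of="$dev" bs=1M count=8 seek=16 conv=notrunc 2>/dev/null &
wait    # both streams must finish before any timing measurement ends
```

Timing the whole sequence (for example with time) on a single rotational disk versus a many-spindle array shows the head-seeking effect the text describes.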
What About I/O Barriers?
Beginning with Red Hat Enterprise Linux 6, GFS2 uses I/O barriers by default when flushing the log. Red Hat recommends the use of I/O barriers for most block devices; however, barriers are not required in all cases and can sometimes be detrimental to performance, depending on how the storage device implements them. If the shared block device has no write cache, or if the write cache is not volatile (for example, if it is powered from a UPS or similar device), then you might wish to turn off barrier support.

You can prevent the use of I/O barriers by setting the nobarrier option with the mount command (or in /etc/fstab). If the underlying storage does not support them, I/O barriers will automatically be turned off, indicated by a log message and the appearance of the nobarrier option in /proc/mounts as if it had been set using the command line.

GFS2 only issues a single barrier each time it flushes the log. The total number of barriers issued over time can be minimized by reducing the number of operations that result in a log flush (for example, fsync(2) or glock contention) or (depending on workload) by increasing the journal size. This has potential side effects, so you should attempt to strike a balance between performance and the potential for data loss in the event of a node failure.
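For reference, an /etc/fstab entry combining these options might look like the line below. The device path and mount point are illustrative, and nobarrier should only be added when the write cache is absent or non-volatile.

```
/dev/clustervg/gfs2lv  /mnt/gfs2  gfs2  noatime,nobarrier  0 0
```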
Can I Use Discard Requests for Thin-Provisioning?
On Red Hat Enterprise Linux 6 and later, GFS2 supports the generation of discard requests. These requests allow the file system to tell the underlying block device which blocks are no longer required due to deallocation of a file or directory. Sending a discard request implies an I/O barrier. There is a small performance penalty when these requests are generated by GFS2, and the penalty might be larger depending on how the underlying storage device interprets the requests. In order to increase performance and decrease overhead, GFS2 saves up as many requests as possible and merges them into a single request whenever it is able.

To turn on this feature, set the discard option with the mount command. It will only work when both the volume manager and the underlying storage device support the requests. Some storage devices might consider the requests a suggestion rather than a requirement (for example, if the file system requests a discard of a single block, but the underlying storage is only able to discard larger chunks of storage). Other storage devices might not deallocate any storage at all, but they might use the suggestion to implement a secure delete by zeroing out the selected blocks.

Are There Benefits to Using Solid-State Storage with GFS/GFS2?
In general, solid-state storage has a much lower seek time, which can improve overall system performance, particularly where glock contention is a major factor.

Is Network Traffic a Major Factor in GFS/GFS2 Performance?
Network traffic should not greatly affect overall system performance, provided that there is enough bandwidth for the cluster infrastructure to keep quorum and synchronize any POSIX locks, that multicast is working between all the nodes, and that DLM is able to communicate its lock requests.
However, if a network device is shared between storage and/or application traffic as well as cluster traffic, Red Hat recommends using tc to implement suitable bandwidth limits. The use of jumbograms is not recommended unless the network is carrying storage traffic. Latency is generally regarded as more important than overall throughput in terms of GFS/GFS2 performance. Watching traffic levels to ensure that the links do not saturate is a sensible policy, since saturation can be a warning sign of other issues (such as contention), but traffic statistics otherwise do not provide a great deal of useful information.

What if I Have a Different Problem?
Red Hat Support representatives are available to help if you cannot find the answer to your question here. If you have experienced a kernel Oops, assertion, or similar problem, contact Red Hat Support immediately. If you have experienced a file-system withdraw, it is almost always due to corruption of the file system, and fsck.gfs/fsck.gfs2 can usually fix the problem. Unmount the file system on all nodes, take a backup, and then run fsck on the file system. Keep any output from fsck, as it will be needed in the event that fsck cannot fix the problem. Contact Red Hat Support if fsck fails to resolve the problem.
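The recovery steps above might be wrapped in a small helper that preserves the fsck output, since Red Hat Support will want it. The helper name, device path, and log path are illustrative assumptions; the file system must be unmounted on all nodes and backed up before running it.

```shell
# Run fsck on the (unmounted-everywhere) GFS2 file system and keep a
# log of its output. Helper name and paths are illustrative.
recover_gfs2() {
    dev=$1
    log=$2
    fsck.gfs2 -y "$dev" 2>&1 | tee "$log"   # keep output for Red Hat Support
}

# Example (run as root, file system unmounted on ALL nodes, backup taken):
# recover_gfs2 /dev/clustervg/gfs2lv /root/fsck-gfs2.log
```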
APPLICATIONS

Email (imap/sendmail/etc.)
Locality of access often causes problems when email is run on clustered file systems. To optimize performance, Red Hat recommends arranging for users to have a "home" node to which each user connects by default, assuming that all the cluster nodes are working normally. If a problem occurs in the cluster, users can then be moved to another home node where all of their files are cached. This reduces cross-node invalidation. It is also possible to use the same technique on the delivery side by setting up the MTA to forward to the user's home node and letting that node write to the file system. If that node isn't available, the MTA can write the message directly. Using maildir instead of mbox also helps scalability, since each message, rather than the whole mailbox, has its own lock. When performance issues occur in maildir setups, they are almost always the result of contention on the directory lock.

Backup
Backup can affect performance since the process of backing up a node or set of nodes usually involves reading the entire file system in sequence. If a single node performs the backup, that node will retain all that information in cache until other nodes in the cluster start requesting locks. Running this type of backup program while the cluster is in operation is a sure way to reduce system performance, but there are a number of ways to reduce the detrimental effect.

One way is to drop the caches using echo -n 3 >/proc/sys/vm/drop_caches after the backup has completed. This reduces the time required by other nodes to get their glocks/caches back. However, this method is not ideal because the other nodes will have stopped caching the data that they were caching before the backup process started. Also, there is an effect on the overall cluster performance while the backup is running, which is often not acceptable. Another method is to back up at the block-device level and take a snapshot.
This is currently only supported in cases where there is provision for a snapshot at the hardware (storage array) level. A better solution is to back up the working set of each cluster node from the node itself. This distributes the workload of the backup across the cluster and keeps the working set in cache on the nodes in question. It often requires custom scripting.

Web Servers
GFS/GFS2 file systems are well suited to serving web content, since serving web pages tends to involve a large amount of data that can be cached on all nodes. Issues can arise when data has to be updated, but you can reduce the potential for contention by preparing a new copy of the website and switching over to it rather than trying to update the files in place. Red Hat recommends making the root of the website a bind mount and using mount --move to have the web server(s) use a new set of files.
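The switch-over described above might be scripted as below. The paths are illustrative assumptions, and the commands must be run as root on each web-server node.

```shell
# Publish a freshly prepared copy of the site by bind-mounting it over
# the web root, instead of updating files in place and causing
# cross-node invalidation. Paths are illustrative.
switch_webroot() {
    newroot=$1
    webroot=$2
    mount --bind "$newroot" "$webroot"
}

# Example: switch_webroot /gfs2/site-v2 /var/www/html
```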
SYSTEM CALLS

Read/Write
Read/write performance should be acceptable for most applications, provided you are careful not to cause too many cross-node accesses that require cache sync and/or invalidation.

Streaming writes on GFS2 are currently slower than on GFS. This is a direct consequence of the locking hierarchy and results from GFS2 performing writes on a per-page basis like other (local) file systems. Each page written has a certain amount of overhead. Due to the different lock ordering, GFS does not suffer from the same problem, since it is able to perform the overhead operations once for multiple pages. Speed of multiple-page write calls aside, there are many advantages to the GFS2 file system, including faster performance for cached reads and simpler code for deadlock avoidance during complicated write calls (for example, when the source page being written is from a memory-mapped file on a different file system type). Red Hat is currently working to allow multiple-page writes, which will make GFS2’s streaming write calls faster than the equivalent GFS operation. This streaming-writes issue is the only known regression in speed between GFS and GFS2. Smaller writes (page sized and below) on GFS2 are faster than on GFS.

Memory Mapping
GFS and GFS2 implement memory mapping differently. In GFS (and some earlier GFS2 kernels), a page fault on a writable shared mapping would always result in an exclusive lock being taken for the inode in question. This is a consequence of an optimization that was originally introduced for local file systems, where pages would be made writable on the initial page fault in order to avoid a potential second fault later (if the first access was a read and a subsequent access was a write). In Red Hat Enterprise Linux 6 (and some later versions of Red Hat Enterprise Linux 5) kernels, GFS2 has implemented a system of only providing a read-only mapping for read requests, significantly improving scalability.
A file that is mapped on multiple nodes of a GFS2 cluster in a shared writable manner can be cached on all nodes, provided no writes occur.

NOTE: While in theory you can use the feature of shared writable memory mapping on a single shared file to implement distributed shared memory, any such implementation would be very slow due to cache-bouncing issues. This is not recommended. Sharing a read-only shared file in this way is acceptable, but only on recent kernels with GFS2. If you need to share a file in this way on GFS, then open and map it read-only to avoid locking issues.

Cache Control (fsync/fadvise/madvise)
Both GFS and GFS2 support fsync(2), which functions the same way as in any local file system. When using fsync(2) with numerous small files, Red Hat recommends sorting them by inode number. This will improve performance by reducing the disk seeks required to complete the operation. If it is possible to save up fsync(2) on a set of files and sync them all back together, it will help performance when compared with using either O_SYNC or fsync(2) after each write.

To improve performance with GFS2, you can use the fadvise and/or madvise pair of system calls to request read-ahead or cache flushing when it is known that data will not be used again (GFS does not
support the fadvise/madvise interface). Overall performance can be significantly improved by flushing the page cache for an inode when it will not be used from a particular node again and is likely to be requested by another node.

It is also possible to drop caches globally. Using echo -n 3 >/proc/sys/vm/drop_caches will drop the caches for all file systems on a node, not just GFS/GFS2. This can be useful when you have a problem that might be caused by caching and you want to run a cache-cold test, for example. However, it should not be used in the normal course of operation (see the Backup section above).

File Locking
The locking methods below are only recommendations, as GFS and GFS2 do not support mandatory locks.

flock
The flock system call is implemented by type 6 glocks and works across the cluster in the normal way. It is affected by the localflocks mount option, as described below. Flocks are a relatively fast method of file locking and are preferred to fcntl locks on performance grounds (the difference becomes greater on clusters with larger node counts).

fcntl (POSIX Locks)
POSIX locks have been supported on a single-node basis since the early days of GFS. The ability to use these locks in a clustered environment was added in Red Hat Enterprise Linux 5. Unlike the other GFS/GFS2 locking implementations, POSIX locks do not use DLM and are instead implemented in user space via corosync. By default, POSIX locks are rate-limited to a maximum of 100 locks per second in order to conserve network bandwidth that might otherwise be flooded with POSIX-lock requests. To raise the limit, you can edit the cluster.conf file (setting the limit to 0 removes it altogether).

NOTE: Some applications using POSIX locks might use the F_GETLK fcntl to try to obtain the PID of a blocking process. This will work on GFS/GFS2, but due to clustering, that process might not be on the same node as the process that used F_GETLK.
Sending a signal is not as straightforward in this case, and care should be taken not to send signals to the wrong processes.

It is possible to make use of POSIX locks on a single-node basis by setting the localflocks mount option. This also affects flock(2), but that is not usually a problem, since it is unusual to require both forms of locking in a single application.

NOTE: Localflocks must be set for all NFS-exported GFS2 file systems, and the only supported NFS-over-GFS/GFS2 solutions are those with only a single active NFS server at a time, designed for active/passive failover. NFS is not currently supported in combination with either Samba or local applications.

Due to the user-space implementation of POSIX locks, they are not suitable for high-performance locking requirements.

Leases
Leases are not supported on either GFS or GFS2.
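As a small illustration of flock-based locking, the sketch below serializes a critical section between processes using flock(1), the command-line wrapper around the flock system call. For a cluster-wide lock the lock file would have to live on the GFS/GFS2 mount; the local temporary file here is a stand-in.

```shell
# Serialize a critical section with an exclusive flock. On GFS/GFS2,
# the same pattern locks across nodes when the lock file is on the
# shared mount; a local temp file stands in here.
lockfile=$(mktemp)
exec 9>"$lockfile"            # open file descriptor 9 on the lock file
flock -x 9                    # block until the exclusive lock is granted
result="critical section ran" # work done while holding the lock
flock -u 9                    # release (closing fd 9 would also release)
echo "$result"
```

A second process running the same sequence against the same lock file blocks at flock -x until the first releases the lock.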
DLM
There is no reason why applications should not make use of the DLM directly. Interfaces are available, and details can be found in the DLM documentation.

RECOMMENDED TUNABLE SETTINGS
The following sections describe recommended values for GFS tunable parameters.

glock_purge
In Red Hat Enterprise Linux 4.6/5.1 and later, a GFS tunable parameter, glock_purge, has been added to reduce the total number of locks cached for a particular file system on a cluster node.

NOTE: This setting does not exist in Red Hat Enterprise Linux 6 or later, and it is not a recommended solution to any problem for which there is another solution. In Red Hat Enterprise Linux 6 and later, this parameter is self-tuning, and caching can be controlled from user space via the fsync/fadvise system calls, as described earlier in this document.

This tunable parameter defines the percentage of unused glocks to clear for a file system every five seconds, as shown below, where X is an integer between 0 and 100 indicating the percentage to clear:

# gfs_tool settune /path/to/mount glock_purge X

A setting of 0 disables glock_purge. This value is typically set between 30 and 60 to start and can be tuned further based on testing and performance benchmarks. The setting is not persistent, so it must be reapplied every time the file system is mounted. It is typically placed in /etc/rc.local, or in the start function of /etc/init.d/gfs, on every node so that it is applied at boot time after the file systems are mounted.

demote_secs
Another tunable parameter, demote_secs, can be used in conjunction with glock_purge. This parameter demotes GFS write locks into less restrictive states and subsequently flushes the cached data to disk. A shorter demote interval can be used to avoid accumulating too much cached data, which would otherwise result in burst-mode flushing activity and prolong another node's wait for lock access.
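Because neither glock_purge nor demote_secs is persistent, both must be reapplied after each mount. The fragment below is a hedged sketch of the kind of /etc/rc.local addition described above; the mount points and tunable values are examples only and should be adjusted after benchmarking.

```shell
# Hypothetical /etc/rc.local fragment: reapply non-persistent GFS
# tunables on every boot, after the GFS file systems are mounted.
# Mount points and values below are illustrative, not recommendations.
for mnt in /mnt/gfs1 /mnt/gfs2; do
    gfs_tool settune "$mnt" glock_purge 50    # clear 50% of unused glocks per interval
    gfs_tool settune "$mnt" demote_secs 200   # demote write locks every 200 seconds
done
```

Note that each gfs_tool settune invocation applies to a single file system, which is why the loop is needed when more than one GFS mount point is in use.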
NOTE: This setting does not exist in Red Hat Enterprise Linux 6 or later, and it is not a recommended solution to any problem for which there is another solution. In Red Hat Enterprise Linux 6 and later, this parameter is self-tuning, and caching can be controlled from user space via the fsync/fadvise system calls, as described earlier in this document.

The default value is 300 seconds. To demote locks every 200 seconds on mount point /mnt/gfs1, enter the following command:

$ gfs_tool settune /mnt/gfs1 demote_secs 200

To set the value back to the default of 300 seconds, enter the following command:

$ gfs_tool settune /mnt/gfs1 demote_secs 300

Note that this setting applies only to an individual file system, so multiple commands must be used to apply it
to more than one mount point.

statfs_fast
The statfs_fast tunable parameter can be used in Red Hat Enterprise Linux 4.5 or later to speed up statfs calls on GFS.

NOTE: In Red Hat Enterprise Linux 6 and later, this can be set on the mount command line using the statfs_quantum and statfs_percent mount arguments. This is the preferred method, since the value is then set at mount time and does not require a separate tool to change it.

To enable statfs_fast, enter the following command:

# gfs_tool settune /path/to/mount statfs_fast 1

Red Hat recommends the use of the mount options noquota, noatime, and nodiratime for GFS file systems where possible, as they are known to improve performance in many cases. They can be added in /etc/fstab, as shown below:

/dev/clustervg/lv1 /mnt/appdata gfs defaults,noquota,noatime,nodiratime 0 0

NOTE: An issue has been reported to Red Hat Engineering regarding the use of noquota in Red Hat Enterprise Linux 5.3: Why do I get a mount error reporting 'Invalid argument' on my GFS or GFS2 file system on Red Hat Enterprise Linux 5.3?

Disabling ls Colors
It can also be beneficial to remove the aliases for the ls command that cause it to display colors in its output when using the bash or csh shells. By default, Red Hat Enterprise Linux systems are configured with the following aliases in /etc/profile.d/colorls.sh and colorls.csh:

# alias | grep 'ls'
alias l.='ls -d .* --color=tty'
alias ll='ls -l --color=tty'
alias ls='ls --color=tty'

In situations where a GFS file system is slow to respond, the first response of many users is to run ls to investigate. If the --color option is enabled, ls must run stat() against every entry, which creates additional lock requests and can create contention for those files with other processes. This can exacerbate the problem and cause slower response times for processes accessing that file system.
To prevent ls from using the --color=tty option for all users, the following lines can be added to the end of /etc/profile:

alias ll='ls -l' 2>/dev/null
alias l.='ls -d .*' 2>/dev/null
unalias ls

These lines can also be placed in a user's ~/.bash_profile to disable --color=tty on an individual basis. In general, however, it is best to avoid excessive use of the ls command because of the locking overhead.

Copyright © 2011 Red Hat, Inc. "Red Hat," Red Hat Linux, the Red Hat "Shadowman" logo, and the products listed are trademarks of Red Hat, Inc., registered in the U.S. and other countries. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries. www.redhat.com
