HP-UX 11i LVM Mirroring Features and Multi-threads by Dusan Baljevic


Published in: Technology

Dusan Baljevic
dusan.baljevic@ieee.org

a) LVM recovers mirror consistency using two methods:

    MWC (Mirror Write Cache)
    MCR (Mirror Consistency Record)

MCR and MWC are methods of keeping mirrors in sync and tracking writes to disk. The MCR is kept on the disks, in the Volume Group Reserved Area (VGRA). The MWC is kept in core memory. When enabled, MWC/MCR runs permanently, with the MWC in memory communicating with the MCR on disk. This can affect performance, but it is used because it allows quick recovery from a crash.

b) The purpose of the Mirror Write Cache (MWC) is to provide a list of mirrored areas that are possibly out of sync. When a volume group is activated, LVM copies every area that has an entry in the MWC from one of the good copies to all the other copies. This ensures that the mirrors are consistent, but makes no claims about the quality of the data.

c) On each write request to a mirrored logical volume that uses MWC, LVM checks whether there is already an entry for the data area in the current MWC. If so, it just sends the write to the underlying device driver. If there is no entry, it allocates one and then waits for the updated MWC to be written to disk. So each write to one of these logical volumes can potentially introduce one extra serial disk access. Whether this occurs depends on how random the accesses are: the more random the writes, the higher the probability of missing the MWC.

d) Getting an MWC entry can involve waiting for one to become available. If all the MWC entries are currently in use by I/O in progress, a request might have to wait in a queue until an entry is released. Note that the MWC entry is never freed on disk when a request returns to LVM; it is merely marked as available for use by another outgoing request.
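The hit-or-miss behaviour described in items c) and d) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name count_mwc_misses is hypothetical, the "regions" are plain integers standing in for mirrored data areas, and the first-in-first-out replacement of entries is an assumption made for the example, not the documented LVM policy. Each miss corresponds to one extra serial write of the MWC to disk.

```shell
# Hypothetical model of the MWC: count how many writes miss the cache.
# Usage: count_mwc_misses <number_of_entries> <region> [<region> ...]
# Assumption: entries are reused in FIFO order (not the documented LVM policy).
count_mwc_misses() {
    slots=$1
    shift
    printf '%s\n' "$@" | awk -v slots="$slots" '
        BEGIN { head = 0; tail = 0; count = 0; misses = 0 }
        NF == 0 { next }                      # ignore empty input
        {
            region = $1
            if (region in cached) next        # hit: write goes straight to the driver
            misses++                          # miss: MWC must be written to disk first
            if (count == slots) {             # cache full: reuse the oldest entry
                delete cached[queue[head]]
                delete queue[head]
                head++
                count--
            }
            queue[tail++] = region
            cached[region] = 1
            count++
        }
        END { print misses }'
}

# Mostly sequential writes to a 2-entry cache
count_mwc_misses 2 1 1 2 2 3 3
# Scattered writes to the same cache
count_mwc_misses 2 1 2 3 1 2 3
```

With the same number of writes, the scattered pattern misses on every write while the sequential pattern misses only once per region, which is the effect item c) describes.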
e) Whether or not you use the MWC depends on which aspect of system performance matters more in your environment: run time or recovery time. You can disable the MWC to improve run-time performance, but the entire data space will then be resynchronized after a crash. This may be appropriate when a database does its own transaction logging.
f) You can disable both MCR and MWC only if the application can maintain mirror consistency itself (for example, a database). In that case LVM will not resynchronize the mirrors after a crash. Disabling the MWC gives better I/O performance. If MCR is also disabled, the mirrors will not be synchronized at reboot. It is up to you to decide whether you want these features in use.

With MCR enabled (the default) but MWC disabled, LVM does not keep run-time records of modified extents as the MWC does. In the event of a crash (followed by reboot and re-activation), LVM copies all extents from one non-stale copy of the mirror to all other mirrored copies of that extent. This is similar to the synchronization strategy used by DataPair/UX. The "good" copy of the data is chosen arbitrarily from the non-stale extents, because no record is kept of which disk has the most recent copy. If a mirrored write is in progress during a crash, old data could therefore be copied over new data during mirror recovery at activation time. If this behavior is unacceptable, MWC should be chosen. The behavior is acceptable, for example, where a database rewrites all incomplete transactions after a crash but relies on the file system as its underlying structure: the consistent mirrors allow fsck to cleanly fix the file system, after which the database can update any of its out-of-date data files.

g) If both mirror copies are active, I/O is redirected to the other mirror when one is busy, which improves performance. This should balance the I/O cost of the MWC. The cost of disabling MWC and MCR is a slower recovery after a crash.

h) In HP-UX 11.31, the MWC is larger than in previous releases. This improves logical volume I/O performance by allowing more concurrent writes. The MWC has also been enhanced to support large I/O sizes.
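The three consistency recovery policies discussed above map onto the "-M" and "-c" options of lvchange(1M). The helper below is a minimal sketch, not an official tool: the function name consistency_cmd is hypothetical, and it only prints the command so that it can be reviewed before being run on a real system. The flag combinations for MWC and NOMWC are taken from the lvchange(1M) manual page as I understand it, so verify them on your release.

```shell
# Hypothetical helper: print (do not run) the lvchange command that
# selects a consistency recovery policy for a logical volume.
# Usage: consistency_cmd <MWC|NOMWC|NONE> <lv_path>
consistency_cmd() {
    policy=$1
    lv=$2
    case "$policy" in
        MWC)   flags="-M y" ;;        # Mirror Write Cache on
        NOMWC) flags="-M n -c y" ;;   # MWC off, MCR on: full resync after a crash
        NONE)  flags="-M n -c n" ;;   # both off: application maintains consistency
        *)     echo "unknown policy: $policy" >&2; return 1 ;;
    esac
    echo "lvchange $flags $lv"
}

consistency_cmd NOMWC /dev/vgapp/lvol1
```

Printing the command rather than executing it keeps the sketch safe to try anywhere; the logical volume path shown is only an example.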
i) Logical volumes belonging to shared volume groups (those activated with "vgchange -a s") of LVM versions 1.0 and 2.0 must have consistency recovery set to NOMWC or NONE. Versions 1.0 and 2.0 do not support MWC for logical volumes belonging to shared volume groups. This might have changed with some patches, but I have not checked this yet. With the September 2008 release of HP-UX 11i v3, LVM supports MWC for logical volumes belonging to LVM version 2.1 shared volume groups. This ensures faster recovery following a system crash.

j) Note that the MWC setting cannot be changed on an active logical volume. Here is an example for the primary paging device (swap).

Problem: While attempting to disable "Mirror Write Cache" and "Mirror Consistency" for primary swap (/dev/vg00/lvol2), which was mirrored, the following error message is shown:

    The command used to modify logical volumes, /sbin/lvchange, has failed. The stderr
    output from the command is shown below. The logical volume has not been modified.
    lvchange: Could not change MirrorWriteCache while Logical Volume is opened or
    being synchronized.

Solution: Since primary swap is activated when the system boots, even in single-user mode, the only way to successfully use lvchange on the primary swap logical volume is from LVM maintenance mode. To boot into LVM maintenance mode, reboot the machine and interrupt the boot sequence:

    > hpux -lm    (PA-RISC)

or

    > boot -lm    (IA64)

This boots the machine into LVM maintenance mode. Use lvchange(1M) with the "-M" and "-c" options to modify the mirror write cache and consistency settings:

    # lvchange -M n -c n /dev/vg00/lvol2

k) A quick check of the system's logical volume configuration will show whether this parameter is misconfigured. Assuming we are interested in vg00:

    # lvdisplay /dev/vg00/lvol* | more

Look (or grep) for the lines that describe each logical volume's "Consistency Recovery":

    Consistency Recovery        MWC
    Consistency Recovery        NOMWC
    Consistency Recovery        NONE

If "Consistency Recovery" is set to NONE for anything other than a swap device (or a raw database volume, as stated above), it needs to be changed. If the logical volume is not currently mirrored, this is not an issue and can safely be ignored until the customer wants to mirror it. It does not hurt to change the parameter early, and doing so could prevent trouble later if the problem has been forgotten by the time mirroring is set up.

l) If we need to change the MWC setting for a logical volume that is already mirrored, the process is a little more complex. After determining which mirrored logical volumes need their consistency recovery changed, the steps are: reduce the mirror to only one good copy (non-mirrored), change the consistency recovery parameter, then recreate the mirroring configuration. The simplest way to reduce a mirrored configuration to a non-mirrored one is to use "lvreduce -m 0" to eliminate the mirror copies.
Then use lvchange(1M) to turn on consistency recovery, followed by lvextend(1M) to re-add the mirrors. This reduction minimizes downtime, as it can safely be done while the system is fully operational, but it has two drawbacks:
It allows the user less control over which copy of the mirror will remain, and it may require more reconstruction to recreate any specialized mirroring configuration, such as striped extents. Although the logical volume can remain in use during the operation, it is best to avoid using it until integrity checks can be made on the data. Another way of getting to a non-mirrored state is to split off the mirrored copies using lvsplit(1M).

m) If you import a volume group from a previous release of HP-UX, there will be a full resynchronization, because the format of the MWC changed at HP-UX 11i v3. If the volume group contains mirrored logical volumes using MWC, LVM converts the MWC at import time. It also performs a complete resynchronization of all mirrored logical volumes, which can take substantial time.

n) Now, let's list some typical rules for MWC:

Disable MWC and set MCR to "none" for database logical volumes, because the database logging mechanism already provides consistency recovery.

Disable MWC and MCR on mirrored logical volumes where the data is not needed after a crash, such as a paging device (swap space) or other raw scratch data.

Logical volumes containing database data, or file systems with few or infrequently written large files (greater than 256K), must not use the MWC when run-time performance is more important than crash recovery time.

Use fast disks for the most I/O-intensive applications if they use mirrored logical volumes.

Ensure that all physical volumes for mirrored logical volumes are active, because MWC and other I/O is redirected to another mirror if one is busy, which improves performance.

Spread the data space across as many physical volumes as possible.

The number of volume groups is directly related to the MWC. Since there is only one MWC per volume group, disk space that is used for many small random write requests must be kept in distinct volume groups, if possible, when the MWC is used.
If possible, ensure that physical volumes in volume groups that contain mirrored logical volumes reside on different controllers. For example, in a system with several disk devices on each card and several cards on each bus converter, create volume groups so that all disks off one bus converter are in one group and all the disks on the other are in another group (one way is via physical volume groups). This configuration ensures that all mirrors are created with devices accessed through different I/O paths.
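To check a volume group against rules like these, the lvdisplay inspection from item k) can be scripted. The filter below is a minimal sketch: the function name report_consistency is hypothetical, and it assumes the usual lvdisplay layout in which the "LV Name" and "Mirror copies" lines appear before "Consistency Recovery" for each logical volume. It marks mirrored logical volumes whose policy is NONE for review; whether NONE is acceptable (swap, raw database volumes) remains a human decision.

```shell
# Hypothetical filter: read lvdisplay output on stdin and print
# "<LV name> <mirror copies> <policy>" for each logical volume.
# A mirrored LV with policy NONE gets a trailing "REVIEW" marker.
report_consistency() {
    awk '
        /^LV Name/              { lv = $3 }
        /^Mirror copies/        { copies = $3 }
        /^Consistency Recovery/ {
            flag = ($3 == "NONE" && copies > 0) ? " REVIEW" : ""
            print lv, copies, $3 flag
        }'
}

# Intended use on an HP-UX system (not executed here):
#   lvdisplay /dev/vg00/lvol* | report_consistency
```

Because the function only parses text, it can also be run against saved lvdisplay output collected from several systems.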
Since mirroring is typically used for the root volume group only (these days all other data is on SAN), it is strongly recommended not to allow any third-party applications or software to run in it. I go to such an extreme that I even force customers to use their own areas for temporary files:

1. Set the TMPDIR variable to point to some other, non-boot volume. I always encourage application administrators to use their own areas for temporary files. Some applications look at the TMPDIR environment variable. Others look at two other variables, so try setting TEMP and TMP as well as TMPDIR.

2. Mount the /tmp file system with the "tmplog" option in /etc/fstab. /tmp is DESIGNED for temporary files, so it should not be abused for other purposes. In "tmplog" mode, the intent log is almost always delayed. This improves performance, but recent changes may disappear if the system crashes.

3. Have /tmp cleaned up at boot time (not really a performance issue, but useful for maintenance, especially if the number of temporary files keeps growing). By default I always enable it in /etc/rc.config.d/clean_tmps:

    CLEAR_TMP=1

The final comment is about multi-threaded synchronization of mirrors in LVM on HP-UX.

Option 1

lvsync(1M) recognizes the following option:

    -T    Perform mirror synchronization of logical volumes within a volume group
          using multiple parallel threads. Logical volumes belonging to different
          volume groups will be synchronized serially. It is possible that logical
          volumes start and/or complete their synchronization in a different order
          than specified on the command line. The maximum number of threads used
          can be controlled using the PTHREAD_THREADS_MAX system tunable.

          NOTE: This option has no effect if the volume group is activated in
          shared mode.

For example, you can extend the logical volumes without synchronizing them and then synchronize them with parallel threads:

    # lvextend -m 1 -s /dev/vgapp/lvol1
    # lvextend -m 1 -s /dev/vgapp/lvol2
    # lvextend -m 1 -s /dev/vgapp/lvol3
    # lvsync -T /dev/vgapp/lvol1 /dev/vgapp/lvol2 /dev/vgapp/lvol3

Option 2

Check the fragmentation of the file system that resides on the logical volumes you need to mirror. For example:

    # fsadm -F vxfs -DEde -t 600 /mydata

... and take action if necessary. It is also advisable to do this on weekends, when user activity is lower.

Note the following on HP-UX 11.31:

    # getconf PTHREAD_THREADS_MAX
    3002

    # kctune -v max_thread_proc
    Tunable             max_thread_proc
    Description         Maximum number of threads in each process
    Module              pm_proc
    Current Value       3002
    Value at Next Boot  3002
    Value at Last Boot  3002
    Default Value       256
    Constraints         max_thread_proc >= 64
                        max_thread_proc <= nkthread
    Can Change          Immediately or at Next Boot
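Because "lvsync -T" runs in parallel only within one volume group and handles different volume groups serially, it can help to see how a list of logical volumes splits up by volume group. The helper below is a sketch under stated assumptions: the function name print_lvsync_commands is hypothetical, it assumes paths of the form /dev/<vg>/<lv>, and it only prints one command per volume group instead of running anything. Issuing one command per group is a choice made for the illustration, not something lvsync requires.

```shell
# Hypothetical helper: group logical volume paths by volume group and
# print (do not run) one "lvsync -T" command per volume group.
# Assumes paths of the form /dev/<vg>/<lv>; other arguments are skipped.
print_lvsync_commands() {
    printf '%s\n' "$@" | awk -F/ '
        NF < 4 { next }                        # not a /dev/<vg>/<lv> path
        {
            vg = $3
            if (!(vg in lvs)) order[n++] = vg  # remember first-seen order
            lvs[vg] = lvs[vg] " " $0
        }
        END {
            for (i = 0; i < n; i++)
                print "lvsync -T" lvs[order[i]]
        }'
}

print_lvsync_commands /dev/vgapp/lvol1 /dev/vgdb/lvol1 /dev/vgapp/lvol2
```

The output makes it clear which logical volumes can benefit from parallel threads together and which will be processed in a separate, serial pass.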