VSAM Tuning
Describes when (rarely) and how to tune your VSAM dataset. Also describes how to design a VSAM dataset to allow excellent performance with little tuning.

VSAM Tuning Document Transcript

  • 1. Tuning Your VSAM Files For Optimal Performance By Dan O'Dea April 30, 2003
  • 2. 2 VSAM Record Blocking Concepts VSAM uses a double blocking system. First, user data records are collected into appropriately sized logical blocks called Control Intervals (CIs). The CIs are then broken down into physical blocks. These physical blocks are always a multiple of 512 bytes. For example, a file with 200-byte records could be put into a VSAM file with a CI size of 2,048 bytes, using about 97% of the data space. The data is then written as a single block of 2,048 bytes. The process works in reverse during a read. VSAM handles all blocking and deblocking of records, totally hidden from the user program. The appendix lists the allowable CI sizes, their physical record sizes, and other information on how VSAM stores data on 3390 DASD. The user should pick a CI size matching the data record size and application processing needs. VSAM selects the correct physical record size to maximize the use of DASD space for that CI size. When a CI is read, VSAM uses an I/O chaining process to make sure the entire CI is read in one I/O. VSAM may thus read more than one CI of data as if it were in the same I/O. Loading and Maintaining an ESDS During initial loading of an ESDS, CIs are loaded as full as possible, from left to right through the CI, in the physical order the records are received. Any unused bytes are filled with binary zeroes. There is no free space in the file except after the last record. VSAM adds new records at the end of the file in physical sequence. Records in the file may be updated as long as their length doesn't change. The changed record is written back to the place it came from. If records are deleted, the user program is responsible for marking the records and processing them as deleted. VSAM does not delete records from an ESDS.
  • 3. 3 Loading and Maintaining a KSDS: Free Space During the initial load of a KSDS, CIs are loaded from left to right in ascending key sequence. For this reason, you must sort the records into ascending key order before loading them. If the file has a defined free space, records are loaded up to, but not beyond, the free space high water mark in each CI. If the FSPC parameter includes CA free space, empty CIs are left at the bottom of the CA. You might ask how VSAM determines the high water mark for free space. Let's assume you have a CI size of 4096 bytes and all of your records are 200 bytes long. If you had no free space, the 4K CI could hold 4096/200, or 20, records (since you can't store part of a record, always truncate the decimals). Let's give the file 20% free space for future growth. During the initial load, VSAM guarantees each CI has at least 20% of its space left empty. To find out how many records are loaded into each CI, first multiply the CI size by the FREESPACE percentage. Then subtract that number from the CI size to get the number of bytes VSAM lets you use. The final step is to divide the usable bytes by the record length to find out how many records actually fit into the CI. Don't forget to truncate. So, given: CISZ 4096, RECSZ (200 200), FSPC(20 10): 4096 bytes * 20% = 819.2, or 819 bytes free per CI; 4096 bytes - 819 bytes = 3,277 bytes usable; 3277 bytes / 200 bytes = 16.385, so 16 records of 200 bytes fit into a CI; 16 records * 200 bytes = 3,200 bytes; 4096 bytes - 3,200 bytes = 896 total bytes of free space. See how VSAM reserved more bytes than you might expect? We'll discuss this in more detail later. For now, there are two important points you need to know about free space: 1. Free space gives you room to add records to the file after its initial load; 2. Free space is observed only during initial load or resume load.
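The free space arithmetic above can be sketched in a few lines of Python. This is only an illustration of the slide's steps, not VSAM's actual algorithm: the function name and dictionary keys are made up for the example, and, like the slide, it ignores the CI control fields.

```python
def records_loaded_per_ci(ci_size: int, record_len: int, ci_free_pct: int) -> dict:
    """Follow the slide's steps for fixed-length records (control fields ignored)."""
    reserved = ci_size * ci_free_pct // 100   # bytes held back as free space, truncated
    usable = ci_size - reserved               # bytes the initial load may fill
    records = usable // record_len            # whole records only, so truncate
    used = records * record_len
    return {
        "reserved": reserved,
        "usable": usable,
        "records": records,
        "actual_free": ci_size - used,        # what is really left empty in the CI
    }


# CISZ 4096, RECSZ (200 200), FSPC (20 10), as in the slide
result = records_loaded_per_ci(4096, 200, 20)
print(result)
```

With the slide's values this gives 819 reserved bytes, 16 records per CI, and 896 bytes actually left free.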
  • 4. 4 Free Space Usage During Normal File Activity After the initial load, all the following options are possible within the reserved free space. COBOL verbs are used in the descriptions. Updating existing records. VSAM searches the index to locate the data CI containing the record, and then searches the records in the CI from left to right for an equal key match. When the record is located, it is returned to the user program to be updated. A REWRITE puts the record back into the file. If the key is changed as part of the update, and the changed key is new to the file, a WRITE is needed; a REWRITE will fail. If the new key already exists, a REWRITE works only if the file was opened as RANDOM or DYNAMIC (specified in the SELECT clause). Existing records made longer. If the updated record is longer than it was before, it can be rewritten back to the file, as long as the new length is not greater than the file’s maximum record length. When REWRITE is issued, VSAM compares the change in length to the amount of free space remaining in the CI. If there is enough free space to put the record back where it was, any records to the right of the updated record are moved to the right by enough places to make room for the longer record. The CI is then written back to the file. If there is not enough room for the longer record, VSAM does a CI split. We'll discuss the CI split later. Existing records made shorter. If the program makes the record shorter, a REWRITE is issued without restrictions. VSAM determines the amount of change in the record's length, and then rewrites the record. All records to the right of the updated record are shifted left to reclaim the bytes no longer used by the updated record. That combines all free space at the end of the CI. Deleting an existing record. If a program needs to delete a record, the key must be known and specified in a DELETE call. COBOL issues a READ to find the record. 
VSAM moves all records to the right of the deleted record to the left by a number of bytes equal to the length of the deleted record. That combines all free space at the end of the CI. If the record being deleted is the last remaining record in the CI, the CI is reclaimed as a free CI. It can then be used as a candidate for a CI split.
  • 5. 5 Usage of Free Space during Normal File Activity (cont'd) Inserting a new record. The program must issue a WRITE statement. VSAM first performs a READ to locate the correct CI for the new record. When the CI is located, it's read into a data buffer, where VSAM searches it for record keys to determine the "point of insertion" for the new record. Remember, all records must be in ascending key sequence. Once VSAM finds the point of insertion, it compares the number of free space bytes in the CI to the length of the record. If there is enough space to contain the record, VSAM moves all the existing records to the right of the point of insertion toward the end of the CI, and then inserts the new record. If there is not enough space, VSAM schedules a CI split. If the record to be inserted has a key greater than that of the record at the end of a given CI but less than that of the record at the beginning of the next CI, VSAM sometimes inserts the new record at the end of one CI but other times at the beginning of the next CI because of key compression in the index.
  • 6. 6 Index Structure of a KSDS A VSAM KSDS has two components. One holds the data; the other is an index used to locate data records in the data component. Here is a graphic showing the structure of an index. This is not the actual physical model, but rather a logical point of view. In other words, the figure shows how VSAM searches for a data record. Within the data component, CIs are grouped into a block called a control area (CA). The CA provides a handy logical unit to index. Above the CAs (logically speaking), the index begins with a group of records called the sequence set, also called "index level 1". There is one sequence set record for each CA, and each entry in that record points to a CI within the CA below it. The number of entries in a sequence set record equals the number of CIs in its CA. For example, if the CA contains 90 CIs, its sequence set record contains 90 entries. Each index entry holds the highest key in the indexed CI (the key of its rightmost record) and a pointer (CI number) to that CI. On the next page is a conceptual diagram of how that works.
  • 7. 7 KSDS Indexing (cont'd) If the data component has more than one CA (and thus more than one sequence set record), there must be another level above the sequence set. This higher level contains key/pointer entries just like Level 1, but references the sequence set rather than the data component. In other words, on level 2 the index records contain the highest key in the corresponding sequence set record, and a CI pointer to the index CI containing that sequence set record. If there are large numbers of sequence set records, one level 2 record is not large enough to hold all necessary entries. In that case, there must be more than one Level 2 record, plus one new index set record pointing to each Level 2 record. This new level is, of course, Level 3. All index levels above the first (the sequence set) are collectively called the index set. This structure pyramids upward until there is only one index record at the top level.
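The index pyramid can be modeled as lists of (high key, pointer) entries that are searched from the top level down. The Python sketch below is a toy model under stated assumptions: the keys, CI numbers, and the names `SS0`/`SS1` are invented, there is only one index set level, and an entry is taken to match when its high key is greater than or equal to the search key.

```python
from bisect import bisect_left

# Hypothetical two-level index. Every entry is (high_key, pointer).
sequence_set = {
    "SS0": [(10750, 0), (10762, 1), (10777, 2)],   # entries point to data CIs
    "SS1": [(10800, 3), (10850, 4)],
}
index_set_top = [(10777, "SS0"), (10850, "SS1")]   # entries point to sequence set records


def find_entry(index_record, search_key):
    """Return the pointer of the first entry whose high key >= search_key, else None."""
    high_keys = [high_key for high_key, _ in index_record]
    pos = bisect_left(high_keys, search_key)
    return index_record[pos][1] if pos < len(index_record) else None


def locate_data_ci(search_key):
    """Walk from the top index record to the sequence set, then to a data CI number."""
    ss_name = find_entry(index_set_top, search_key)
    if ss_name is None:
        return None                                # key is beyond the highest key on file
    return find_entry(sequence_set[ss_name], search_key)


print(locate_data_ci(10776))
```

The lookup only identifies the CI where the record should reside; finding the record itself still requires searching that CI from left to right.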
  • 8. 8 Retrieving Records from a KSDS When your program makes a READ call to a KSDS, VSAM first reads the highest-level index record into an index buffer. The search key is compared against the entry keys within the index set record from left to right, looking for the first key greater than or equal to the search key. When that key is found, the RBA pointer associated with it is used to get the appropriate record from the next lower index level. That continues through the index set until a sequence set record is reached. The search through the sequence set works the same way as the index set search. When a hit occurs, the RBA of the data component CI is returned to VSAM. The RBA is used to get the CI where the record should reside. When the CI is read into the data component buffers, VSAM deblocks the CI and returns the requested record to the program. VSAM CI/CA Split Processing For a long time IBM had the arrogant philosophy "My way or the highway," resulting in abends when your job violated arbitrary standards. For example, you can only have 16 extents on a non-VSAM file, with a resulting B-37 abend when you try to use more. In VSAM, the new philosophy seems to be "The user is always right" and extra effort is spent to accomplish the I/O request. A KSDS's distributed free space allows for additions of new records or changes to existing records. When the room runs out, some method of adding more room must be used to prevent interruptions to the user program. In the old ISAM overflow method, inserts of records happened immediately, but that caused reads to slow down drastically after just a few inserts. That's backward, considering you need to read a record far more often than you need to insert it. In VSAM, a new concept called "cellular splitting" was introduced. VSAM tries to split a CI to make room for a new record. If that fails, a CA split is done followed by a CI split. 
There are two types of splitting strategies, Normal Insert Strategy (NIS) and Sequential Insert Strategy (SIS). As you might expect, NIS is used by default. SIS must be specifically coded for, and is beneficial when mass sequential inserts are requested.
  • 9. 9 KSDS CI and CA Splitting Concepts: The CI split When a record insert cannot be completed due to lack of free space, VSAM tries to split the CI to make room. In NIS CI splits, VSAM:
    • Turns on the "busy bit" in the CI and writes it back to the file. That prevents another program from modifying the CI while the split is happening;
    • Checks the entries in the sequence set record for this CA to locate a free CI;
    • If there is a free CI in this CA:
        • A new CI is created in the buffer. There is no need to read the free CI because it contains only binary 0s;
        • VSAM finds the mid-point of the CI to be split at the nearest record boundary. All records to the right of this point are moved to the new CI. Both CIs now have approximately 50% free space;
        • The insert is performed as a normal insert into whichever of the two CIs now covers the new key. This includes turning off the "busy bit" on the CI being updated;
        • The CA's sequence set record is updated to show the old CI's new highest key and the new CI now having data in it;
        • The new CI is written to the location of the formerly free CI;
        • The old CI is written into its old location.
    • If there is not a free CI in the same CA, a CA split must occur. VSAM performs a CA split, and then returns to perform the CI split.
    An example makes this clearer.
  • 10. 10 KSDS CI and CA Splitting Concepts: The CI split (cont'd) Figure 2.10 shows the original content of the file. Say an update program is run containing a record insertion with the key 10776. Using a key search VSAM finds the destination for the record is CI #02. The point of insertion is between keys 10775 and 10777. VSAM checks the free space at the end of the CI. Since there is none, a CI split is required. A sequence set search is done, and VSAM determines CI#08 is free and can accept a split.
  • 11. 11 KSDS CI and CA Splitting Concepts: The CI split (cont'd) The midpoint of the CI is found to be between records 10762 and 10764. All records to the right of record key 10762 are moved to the new version of CI #08 in the buffer. Since there is room in the new CI for the record key 10776 between 10775 and 10777, the insert takes place at the time CI #08 is filled. Figure 2.11 shows the file after the CI has been split but before the sequence set record has been updated.
  • 12. 12 KSDS CI and CA Splitting Concepts: The CI split Finally, the sequence set record is updated to show the new high keys for each CI and to show the logical sequence of keys. Note how CI #08's entry in the sequence set record now sits between the entries for CIs #02 and #03. That shows the logical key sequence should the file be read sequentially.
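The CI split walked through above can be sketched in Python. This is a simplified model, not VSAM's implementation: a CI is a sorted list of keys, capacity is counted in records rather than bytes, and the keys 10758 and 10760 are invented to fill out CI #02 (the slides name only 10762, 10764, 10775, 10777, and the inserted 10776).

```python
from bisect import insort


def split_ci(full_ci):
    """Split a sorted list of keys at the midpoint; return (old_ci, new_ci)."""
    mid = len(full_ci) // 2
    return full_ci[:mid], full_ci[mid:]


def insert_with_split(ci, new_key, capacity):
    """Insert new_key in key order. Returns (ci, None) or (old_ci, new_ci) after a split."""
    if len(ci) < capacity:
        updated = list(ci)
        insort(updated, new_key)      # enough free space: plain insert
        return updated, None
    old_ci, new_ci = split_ci(ci)     # no free space: split first
    # Keys above the old CI's new high key go to the new CI (a simplification).
    target = old_ci if old_ci and new_key <= old_ci[-1] else new_ci
    insort(target, new_key)
    return old_ci, new_ci


ci_02 = [10758, 10760, 10762, 10764, 10775, 10777]   # full: capacity of 6 records
old_ci, new_ci = insert_with_split(ci_02, 10776, capacity=6)
print(old_ci, new_ci)
print("new high keys:", old_ci[-1], new_ci[-1])
```

After the split, the old CI's high key is 10762 and the new CI's high key is 10777, which are the two values the sequence set record must carry.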
  • 13. 13 KSDS CI and CA Splitting Concepts: the CA split In NIS mode, a CA split only occurs during a CI split if VSAM finds the CA does not have any free CIs. NIS CA splits work like this:
    • A free CA must be found. VSAM checks the catalog's HURBA and HARBA. If the HURBA is less than the HARBA, VSAM knows there is at least one free CA between the end of the data (HURBA) and the end of the file's present allocation extent (HARBA). The first CA beyond the HURBA is selected as the destination for a CA split;
    • The CIs in the bottom half of the old CA must be located for movement into the new CA. Because it is necessary to keep the data in logical sequence, the bottom logical half of the CA, not the bottom physical half, is moved. VSAM locates the midpoint of the CA's sequence set record and moves all CIs beyond that point into the new CA;
    • VSAM then begins moving the CIs from the old CA to the new one. This is a long process. First, a read I/O is required each time an old CI is moved if it is not already in a data buffer. Second, a write I/O is required each time the set of data buffers is full. Third, since (more than likely) the target CI is quite far away on the DASD from the source CI, a large amount of time is spent on disk head seeks. For example, on a 4K CI size there are 180 CIs per CA, half of which need to move. Assuming the IBM default of two data buffers, this means 180 I/Os (1 read/write pair per CI) plus 90 disk seeks in each direction between source and target CAs. At a seek time of 9.5ms and an I/O time of (about) 3.9ms for a 4K CI, that adds up to over 1.5 seconds to perform a CA split's data moves. Ouch!
    • Once the data has been moved, the sequence set record of the source CA is updated to show the CIs moved are now free. A sequence set record is built for the new CA to describe it as well;
    • Each sequence set record has a horizontal chain pointer to identify the next CA in ascending key sequence. The source sequence set record is given a new chain pointer to the new target sequence set record. That new record's chain pointer now points to the source's original "next CA". The sequence set records are now written back to the index;
    • Since a new sequence set record has been created on index level 1, level 2 must be updated to reflect its addition. An entry is built showing the highest key in the new CA. That entry is then inserted between the entries for the original CA and the one following it. The entries to its right are moved to the right to make room for the insert. Often there is room enough for this to happen. If there is not, an index CI split occurs to create a new level 2 index set record. Note that this process can repeat all the way up to the highest level of the index and must be considered as part of the cost of a CA split.
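The cost estimate for the data moves can be reproduced with a short Python calculation. This is a back-of-the-envelope sketch using only the slide's figures; the function and parameter names are made up, and since the slide does not say exactly how it counts seeks, both readings are computed.

```python
def ca_split_move_ms(cis_per_ca=180, seek_ms=9.5, io_ms=3.9, seeks_per_moved_ci=1):
    """Estimate the time (in ms) to move half of a CA's CIs with two data buffers."""
    moved = cis_per_ca // 2                 # half of the CIs move to the new CA
    ios = moved * 2                         # one read plus one write per moved CI
    seeks = moved * seeks_per_moved_ci      # head movement between source and target
    return ios * io_ms + seeks * seek_ms


one_seek_per_ci = ca_split_move_ms()                          # 90 seeks in total
seek_each_way = ca_split_move_ms(seeks_per_moved_ci=2)        # 90 seeks in each direction
print(round(one_seek_per_ci), round(seek_each_way))
```

Counting one seek per moved CI gives roughly 1.56 seconds; counting a seek in each direction gives roughly 2.41 seconds. Either way the result is consistent with "over 1.5 seconds".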
  • 14. 14 VSAM Control Information In a non-VSAM environment, a file is either fixed blocked (FB) or variable blocked (VB). For an FB file, the OS needs only the record length to block or deblock the data. On a VB file, however, there must be some way for the OS to determine how to block and deblock the file, because each record could have a different length. Therefore, in a VB file, each record and each block is preceded by a length field of four bytes. For example, consider a fixed record length of 251 bytes. To contain 10 records the BLKSIZE of the file would be 10*251, or 2510. For a file with variable record lengths, the maximum of which is 251, each record must be described to the OS as 255 (251 plus the 4-byte record length prefix). To contain the same 10 records the BLKSIZE would be 10*255+4, or 2554. A VSAM file has a double blocking structure. Each VSAM file has a "block" of data records of fixed length called a control interval (CI). Inside the CI, records can be variable in length. How does VSAM know how long each record is? VSAM uses two fields in each CI to describe record lengths: the Control Interval Definition Field (CIDF) and the Record Descriptor Field (RDF). In each CI there is one CIDF of four bytes and at least two RDFs of three bytes each. How the CIDF and RDFs are used is detailed in the cited VSAM reference. It is useful to know the CIDF is used as the "software end-of-file" mark, also called the "end-of-data" mark. The first CI at the top of the first unused CA is filled with binary zeroes and is considered end-of-data. The key thing to remember here is the length of these fields. If your file has fixed length records you must subtract 10 from each potential CI size before dividing by the record size. For variable length records it is necessary to know how the records vary.
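The block and CI arithmetic above is easy to check in Python. The sketch below covers only the fixed-length VSAM case (one 4-byte CIDF plus two 3-byte RDFs, 10 bytes in total) and the non-VSAM FB/VB block sizes from the example; the function names are made up for illustration.

```python
CIDF_BYTES = 4   # one Control Interval Definition Field per CI
RDF_BYTES = 3    # each Record Descriptor Field


def fixed_records_per_ci(ci_size, record_len):
    """Fixed-length records per CI after subtracting the 10 control bytes."""
    control = CIDF_BYTES + 2 * RDF_BYTES
    return (ci_size - control) // record_len


def fb_blksize(record_len, records_per_block):
    """Non-VSAM fixed blocked: records are simply packed end to end."""
    return record_len * records_per_block


def vb_blksize(max_record_len, records_per_block):
    """Non-VSAM variable blocked: 4-byte prefix per record plus 4 bytes per block."""
    return records_per_block * (max_record_len + 4) + 4


print(fb_blksize(251, 10), vb_blksize(251, 10))
print(fixed_records_per_ci(4096, 200), fixed_records_per_ci(4096, 146))
```

The 10 control bytes often make no difference (200-byte records still fit 20 to a 4K CI), but they can cost a record: 146-byte records fit 27 to a 4K CI, not the 28 that 4096/146 suggests.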
  • 15. 15 VSAM Control Information: Spanned Records If the SPANNED attribute has been specified in the VSAM file's definition, records longer than the CI size may cross CI boundaries. Each portion of a record residing in a spanned CI is called a segment. Several rules need to be understood:
    • For a record to span, its record size must be specified in the define with a maximum record size larger than the CI size;
    • Records spanning CIs cannot share CIs with non-spanning records;
    • The first byte of the segment starts in byte 1 of the CI and uses all the bytes in the CI, minus the control fields. The only exception is the last segment if it is shorter than the CI size;
    • For a KSDS:
        • The key must be in the first segment;
        • Free space is not reserved in a spanned CI;
        • CI splits do not occur in spanned CIs.
    • A spanned record cannot span across a CA boundary;
    • To ensure integrity, segments each contain an update field in the RDF of their CI. Upon an update request, VSAM checks these fields to make sure the entire record is valid. If the update field is not the same in all segments, VSAM cancels the update;
    • Each CI with a spanned record contains a CIDF and two RDFs. The first (i.e., rightmost) RDF contains the length and number of the segment. The second RDF contains the update number.
    In general, it's not a good idea to use spanned records. This topic is discussed in more detail shortly.
  • 16. 16 VSAM Data Structure Concepts: the Cluster Before we get into the cluster concept of VSAM, let's review the basics of non-VSAM file storage. A new, non-VSAM file is physically allocated at job step initiation using the JCL DD statement. DISP=NEW causes the JCL to pass the space and unit parameters to DADSM. A volume is located and DADSM reads through the format 5 DSCB records in the VTOC (or the Space Map record if the VTOC is indexed) to find the first contiguous span of unallocated tracks able to satisfy the allocate request. If there is not enough contiguous space, DADSM tries to find enough total space within five extents. If the space exists, DADSM writes a format 1 DSCB to mark the tracks selected as allocated. This DSCB contains such items as the DSN, creation date, available DCB information, etc. That allows the OS to locate and use the file later. Since VSAM was intended to replace all other access methods, and the VTOC is a non-VSAM file accessed through keys, it was necessary to replace the VTOC with a new structure, processed and managed using VSAM. That structure is called the VSAM catalog. Since it also must be a VSAM file and be accessed by keys (DSN), the VSAM catalog is a VSAM KSDS. In 1983 the VSAM catalog structure was replaced by the ICFCAT structure. ICF catalogs are discussed later in this document. However, since the purpose of this discussion is to show how VSAM file structure works and why it's designed the way it is, we’ll continue to refer to the VSAM catalog for now. When a VSAM file is allocated using AMS' DEFINE command, the VSAM catalog is requested to act as a VTOC. Therefore it must somehow mark tracks as allocated for each of the physical file components. A name is assigned to each component, either given in the DEFINE command or generated by AMS. A record (like the VTOC DSCBs) is written for each component to the catalog. These records contain all the information the OS needs to access the file. 
For normal access, component information is not enough. In all programs there is file definition language to access one file at a time using one DD statement. Specifying the DSN of either the index or data component would allow access to that component, but without any connection to the other component. The allocated component could be read as an ESDS, but in the case of a KSDS that's probably not what you'd want. To be completely specific, the VTOC does not contain association records to link individual components.
  • 17. 17 VSAM Data Structure Concepts: the Cluster (cont'd) All VSAM files have an additional catalog entry to describe associations between related components. This entry is called the cluster. Granted, the cluster is only required for a KSDS or a variable-length RRDS file, but consistency needs force the cluster define to be necessary for all VSAM files. By this convention, a program can access an entire logical file with all its structures by allocating only one DSN. Let's look at a typical VSAM define. Note the convention of allocating the data component as ‘cluster.DATA’ and the index component as ‘cluster.INDEX’. This is a convention only and is not required.
      DEFINE -
        CLUSTER -
          (NAME(B5926VSM.G5925.VXVRELGA.KSDS) -
          STORAGECLASS(SCVSAMDB) MANAGEMENTCLASS(MCDBASE) -
          INDEXED SPEED SHAREOPTIONS(2 3) -
          RECORDSIZE(146 146) -
          KEYS(21 0) -
          FREESPACE(0 0)) -
        DATA -
          (NAME(B5926VSM.G5925.VXVRELGA.KSDS.DATA) -
          VOLUMES(* * * * * * *) -
          CISZ(4096) -
          CYLINDERS(275 20)) -
        INDEX -
          (NAME(B5926VSM.G5925.VXVRELGA.KSDS.INDEX) -
          VOLUMES(* *) -
          CISZ(2048) -
          TRACKS(1 1))
    Because the cluster record is the association between components, values given in the cluster definition are assigned to all components of the file unless specifically given in a particular component section. For example, if the CISZ (CI size) arguments were left off the data and index component definitions and a CISZ (2048) argument were given in the cluster define, both components would be given a CI size of 2048. If space is provided under the cluster section and not included with the other components, VSAM reserves the minimum space it needs for the index and allocates the rest of the space to the data component.
  • 18. 18 VSAM Data Structure Concepts: The Cluster (cont'd) Let's take a quick look at some of the define parameters.
       1. DEFINE -
       2. CLUSTER -
       3. (NAME(B5926VSM.G5925.VXVRELGA.KSDS) -
       4. STORAGECLASS(SCVSAMDB) MANAGEMENTCLASS(MCDBASE) -
       5. INDEXED SPEED SHAREOPTIONS(2 3) -
       6. RECORDSIZE(146 146) -
       7. KEYS(21 0) -
       8. FREESPACE(0 0)) -
       9. DATA -
      10. (NAME(B5926VSM.G5925.VXVRELGA.KSDS.DATA) -
      11. VOLUMES(* * * * * * *) -
      12. CISZ(4096) -
      13. CYLINDERS(275 20)) -
      14. INDEX -
      15. (NAME(B5926VSM.G5925.VXVRELGA.KSDS.INDEX) -
      16. VOLUMES(* *) -
      17. CISZ(2048) -
      18. TRACKS(1 1))
    1. DEFINE - start the define
    2. CLUSTER - start the cluster record define
    3. (NAME(...) - the cluster name. This is the DSN to access the file
    4. Storage/Management Class - tells SMS where to put the file
    5. File options - discussed later
    6. RECORDSIZE - the average followed by the maximum allowed record size. If the two values are the same, the record format is fixed
    7. KEYS - describes the key length and offset. In this case, the key is 21 bytes long and starts in byte 1 of the record (offset 0)
    8. FREESPACE - given as % of CI free bytes, then % of CIs free in a CA
    9. DATA - start the data component record definition
    10. (NAME(...) - the data component name
    11. VOLUMES - DASD names (one or more, up to 255) for the file's location. Asterisks are placeholders in an SMS environment
    12. CISZ(4096) - the CI size of the data component
    13. CYLINDERS(275 20) - the amount of disk space for the data component
    14. INDEX - start the index component record definition
    15. (NAME(...) - the name of the index component
    16. VOLUMES - DASD names for the index component's location
    17. CISZ(2048) - the CI size of the index component
    18. TRACKS(1 1) - the amount of disk space for the index component
    There are a large number of options you can assign VSAM files. We'll be discussing VSAM file definitions very shortly.
  • 19. 19 VSAM Alternate Index Structure A KSDS can be accessed in three "normal" modes: sequentially by primary key, randomly by primary key, and directly by RBA (PL/I and Assembler only). The KSDS components (data and index) can also be read as ESDS files by opening each separately. Often, though, the application may need to access the records using something other than the primary key. The obvious example is, say, a file keyed on employee number but also containing the last name or social security number of each person. VSAM allows this through use of the Alternate Index (AIX). A file can have up to 255 AIXes, but for practical purposes four is the maximum number. The AIX is not just another index component built over a base KSDS. It is not possible to have two primary keys (two indexes) over the same data, because the data needs to be in key order. An AIX is a complete KSDS in itself, with data and index components and a connecting cluster component. That raises the objection, "Why duplicate the data sorted a different way? Isn't that wasteful?" The AIX is not like that at all. The data component of the AIX contains (usually) very small subset records with a control information field, the alternate key, and one or more base keys. The control information has several binary values describing the contents of the record and is used to locate the other fields during record processing. The alternate key field is used to control sequencing of AIX records; in other words, the AIX's primary key. The base key field is a pointer value used to locate the associated record(s) in the base cluster. For an ESDS this field is an RBA, because records in an ESDS cannot move around within the file. For a KSDS, because records move around frequently, the base key field is the actual key of the record(s) to be retrieved. When a user program reads an AIX for a KSDS, VSAM gets the alternate key record from the AIX and uses the base key field to access the index of the base cluster. 
A base record is then located and passed to the user program. Connecting AIXes to Base Clusters In many respects, VSAM considers an AIX to be a KSDS. An AIX cannot be defined until its base cluster is defined. AMS provides a special command, the BLDINDEX, to load a defined AIX. However, even when an AIX is defined, it cannot access the base cluster unless an association is defined between the base and AIX. This is a sort of "super cluster" record and is called a PATH. When a program needs to read a base cluster through its AIX, it must allocate the AIX using its PATH name. A reference to the PATH causes VSAM to interpret the read request on the alternate key as a request to read the associated base cluster record.
  • 20. 20 Alternate Index Attributes VSAM design forces the key of a KSDS to be unique within its file. For example, in a file keyed by Employee Number no two records can have the same Employee Number. For the same reason, all alternate indexes are built with a key unique within the alternate index file. Each alternate index, however, can be given an attribute of UNIQUEKEY or NONUNIQUEKEY. That attribute defines whether the alternate key is unique in the base cluster records. It must be decided when an AIX is created whether the alternate keys are unique. NONUNIQUEKEY is the default, so you must code UNIQUEKEY if the base records' alternate keys are to be unique. If the alternate key is not unique, you'll need to be able to locate all the records having that key. The AIX record contains multiple base key fields if the AIX is defined as NONUNIQUEKEY, and the record length in the AIX is variable. If the AIX is defined with UNIQUEKEY, the AIX record length is fixed. Keeping the Index in Sync with the Base Cluster When a new record is inserted into a file with an alternate index, the new record is not accessible through the AIX until the AIX is updated with the record's alternate key information. Conversely, should a base record be deleted, the AIX may still have a record pointing to the deleted base record. There are two ways to keep an AIX in sync with its base cluster. The obvious method is to rebuild the AIX after each change to the base cluster. That may be impractical. Fortunately VSAM can do the synchronization for you. A default attribute for an AIX is UPGRADE. That tells VSAM to update the AIX every time the base cluster is updated. Here's what UPGRADE provides: • Each record inserted into the base cluster causes an insertion (new record or base key field) in the associated AIX(es). A base key field insert results in the AIX record getting longer. If there is a duplicate alternate key and the AIX is defined with UNIQUEKEY, the entire insert fails. 
• Each record deletion in the base cluster causes a deletion (either a record delete or a base key field delete) in the AIX. If the AIX was defined with NONUNIQUEKEY, the AIX record gets shorter. • Each update to an alternate key field in the base cluster results in a record or field deletion and a record or field insertion in the AIX. See Appendix B for examples of AIX access using COBOL and Assembler.
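The UPGRADE behavior listed above can be illustrated with a toy Python model of an AIX over a KSDS. Everything here is an assumption made for the example: the class and method names, the surname alternate key, and the employee numbers. It models only the bookkeeping (alternate key mapped to one or more base keys), not VSAM's record layout.

```python
class AlternateIndex:
    """Toy AIX: maps each alternate key to a list of base (primary) keys."""

    def __init__(self, unique: bool):
        self.unique = unique      # True models UNIQUEKEY, False NONUNIQUEKEY
        self.records = {}

    def base_insert(self, alt_key, base_key):
        """A base record was inserted: add an AIX record or a base key field."""
        if self.unique and alt_key in self.records:
            raise ValueError(f"duplicate alternate key {alt_key!r} with UNIQUEKEY")
        self.records.setdefault(alt_key, []).append(base_key)   # record gets longer

    def base_delete(self, alt_key, base_key):
        """A base record was deleted: drop its base key field, or the whole AIX record."""
        pointers = self.records[alt_key]
        pointers.remove(base_key)                                # record gets shorter
        if not pointers:
            del self.records[alt_key]

    def base_alt_key_change(self, old_alt, new_alt, base_key):
        """An alternate key changed in the base record: insert new, then delete old."""
        if old_alt == new_alt:
            return
        self.base_insert(new_alt, base_key)
        self.base_delete(old_alt, base_key)


aix = AlternateIndex(unique=False)
aix.base_insert("SMITH", "E0001")
aix.base_insert("SMITH", "E0002")
aix.base_insert("JONES", "E0003")
aix.base_delete("SMITH", "E0001")
aix.base_alt_key_change("JONES", "SMITH", "E0003")
print(aix.records)
```

With `unique=True`, the second `SMITH` insert would raise an error, mirroring the failed insert described for UNIQUEKEY.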
  • 21. 21 DASD Volume Space Management for VSAM When VSAM was first planned, the VSAM catalog was to replace the VTOC in the role of DASD management. That is incompatible with having VTOCs manage the DASD. Volumes with VSAM files could not have non-VSAM files on them, while volumes using a VTOC and non-VSAM files could not have VSAM files on them. Since eliminating non-VSAM proved impossible, IBM worked out a method to allow both VSAM and non-VSAM files to be on the same volumes. DASD volume management has always relied on the track allocation information kept in the VTOC to be accurate. For VSAM files that information was kept not only in the VTOC but in the VSAM catalog as well. The two need to be kept in sync, but how? Obviously the solution is to make either the VTOC or the VSAM catalog the owner of the information, and the other subordinate to it. All the old software remained; it would have been ugly to add non-VSAM space management into the VSAM access method code. IBM put the VTOC in charge of DASD space management. There are problems with that. VTOC DSCB records do not keep historical information. Remember, VSAM files are actually multiple datasets accessed using a single DSN; the VTOC doesn't allow that. Nor does the VTOC allow logical connection of an AIX with its base cluster. Finally, there is a need to allow multiple volumes to be easily associated with a particular DSN because VSAM has multiple components associated with a single DSN. The VTOC can control files only on its own volume. The VSAM catalog was created to address those problems. The result was, in effect, a multi-volume VTOC. The catalog, a VSAM file itself, can contain all the information the OS needs to access any file in that catalog. Unfortunately there were still problems. Those of you who worked with VSAM in the 1970s on into the early 1980s remember having to define VSAM user catalogs and separate, suballocated or unique data spaces before defining any VSAM files. 
To eliminate these problems, the VSAM catalog was replaced with the Integrated Catalog Facility (ICF). Under the ICF catalog, all that extra work went away.
  • 22. 22 Finding a VSAM File: The ICF Catalog VSAM follows a definite order of search to locate a catalog entry for a file when access is requested. When creating a new file, the first catalog found in the search is used. When locating an existing file, many catalogs may be searched until an entry is found. The search order is: • The catalog whose ALIAS matches the high level qualifier of the file being accessed. Beginning with DFP 3.1 up to four levels of the DSN may be used; • The system's master VSAM catalog. This may fail with a RACF error. DFP and the ICF Catalog DFP solves the VSAM catalog problems by using the ICF catalog. This new catalog is a two-part file. A Base Catalog Structure (BCS) contains pointer records to locate a file's descriptor records. Those are kept in a VSAM Volume Data Set (VVDS) file. The VVDS contains the file's attributes and volume allocation information. All DASD space management functions are removed from VSAM, allowing elimination of data space and volume ownership concepts. An ICF catalog consists of only one BCS and one or more (one per DASD volume) VVDS files. The BCS is a VSAM KSDS and contains history, protection, and association information on a given file. It also holds volume location information such as the VOLSER(s) where the file resides. A VSAM file can be in only one BCS, but a single BCS can access any number of VVDSes. The VVDS Each DASD volume containing VSAM objects has a VVDS. Physically, this is an ESDS and is always called SYS1.VVDS.V<volser>. The VVDS cannot be changed by most of the AMS functions. Each data record of the VVDS contains one file's attributes, statistics, allocation, and volume information. This record is called a VSAM Volume Record (VVR). All VSAM files on a given volume have an associated VVR. Although the VVR can point back to only one BCS, any number of BCSes can point to a given VVDS.
  • 23. 23 Finding Any File using the ICF Catalog When locating a file, the high-level qualifier (HLQ) is used as a search argument. The master VSAM catalog is read to see if a BCS has an entry for a matching alias. If an alias match is found, the BCS is located and searched for a record whose key is the complete file name requested. File permissions are checked next. If the user can access the file, VSAM uses the association information to determine which VVDS (VSAM) or VTOC (non-VSAM) to go to next. The file descriptor record is then read to locate the actual file and make its records available. Of course, the VVDS/BCS connections aren't perfect. Say a file on a volume is pointed to by the VVDS but has no BCS entry (an uncataloged file). If the user needs the file, the RECATALOG parameter of the AMS DEFINE command can be used. If the user doesn't want the file, the VVR record in the VVDS can be deleted. Or, say a file is cataloged, but the VVDS pointed to by the BCS does not have a descriptor record for the file. If the user wants the file, the only recourse is to delete the BCS entry (DELETE NOSCRATCH) and restore the file. If not, the NOSCRATCH delete alone cleans up the catalog. Defining a VSAM file Access Method Services (AMS) was designed as a replacement for JCL file allocation. AMS commands have a similar structure to the CLIST language, including the ability to set condition codes, test values, use GOTOs, etc. AMS is executed using the program IDCAMS. The DEFINE command is only one of the commands AMS has, but it is the only one you can use to create VSAM (and certain non-VSAM) files and other catalog objects. The following items can be defined: • CLUSTER - a VSAM KSDS, ESDS, RRDS, or LDS; • AIX - a VSAM alternate index; • PATH - an association record between an AIX and its base cluster; • ALIAS - a VSAM HLQ or another name for a non-VSAM file; • NONVSAM - a catalog entry for a non-VSAM file in an ICF catalog; • GENERATIONDATAGROUP - also called a GDG. 
For a complete list of parameters for the DEFINE CLUSTER command please see Appendix C. This document covers those you may need to use.
  • 24. 24 Define Cluster Basic Parameters NAME( ... ) is required at the cluster level, optional on data and index levels. It provides a DSN by which the file is accessed. If the data and / or index name parameter is left off, VSAM assigns a name using the cluster name with a DATA or INDEX appended to it. OWNER( ... ) is optional at all levels. If the define is run interactively in TSO, the TSOID is used as the owner. In all other cases the owner is NULL. INDEXED / NONINDEXED / LINEAR / NUMBERED specifies the file is (respectively) a KSDS, ESDS, LDS, or RRDS. This can only be specified at the cluster level. Abbreviations for these are (respectively) IXD, NIXD, LIN, NUMD. KEYS(n n) specifies the length and location of the primary key for indexed clusters only. It is valid at the data or cluster level only. The first number is the length of the key; the second number is the offset relative to the first byte (i. e. 0 indicates the key starts in the first byte). For example, if the key is in bytes 27 - 44, the parameter is KEYS(18 26). The default is KEYS(64 0). You cannot specify non-contiguous keys. Define Cluster Space Allocation Parameters RECORDSIZE(n n) - abbreviated RECSZ, this specifies the "average" and "maximum" record length to be loaded to the data component. If the two values are the same, VSAM expects all records being loaded to be that length, but accepts records of any size up to the maximum specified. If the two values differ, VSAM assumes the records may be any size up to the maximum length. In this case, the "average" size can be anything and does not affect the cluster definition unless the RECORDS parameter is used. The default RECORDSIZE is (4089 4089) if the CI size is 4096 or greater. If the CI size is smaller the default is the CI size minus seven bytes. If the file is SPANNED the default is (4086 32600). Note: REPRO and EXPORT cannot handle record sizes larger than 32760. Be aware of that when designing your file. 
If RECORDSIZE specifies variable length records, you should look into how much the length varies from record to record and fit the CI size to that. VSAM is a fixed block access method and variable records do not necessarily fit well into fixed block sizes. If you like statistics, the mean record length plus two standard deviations covers roughly 95% or more of the records in the file, provided the lengths are approximately normally distributed.
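For the statistical approach to variable-length records, a minimal Python sketch follows. The function name and the sample lengths are made up for illustration, and the rule of thumb assumes record lengths are roughly normally distributed, which you should check against your own data.

```python
import statistics


def record_length_profile(lengths):
    """Return (mean, standard deviation, mean + 2 standard deviations)."""
    mean = statistics.mean(lengths)
    spread = statistics.pstdev(lengths)      # population standard deviation
    return mean, spread, mean + 2 * spread


# Hypothetical sample of record lengths taken from an unload of the file.
sample = [180, 200, 220]
mean, spread, planning_length = record_length_profile(sample)
print(round(mean), round(spread, 1), round(planning_length, 1))
```

The third value is a candidate record length to fit the CI size around; records longer than that still load as long as they do not exceed the maximum in RECORDSIZE.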
  • 25. 25 Define Cluster Space Allocation Parameters (cont'd) CONTROLINTERVALSIZE( ... ) specifies the control interval size. This is optional and can be coded at the cluster, data, or index level. Its abbreviation is CISZ or CNVSZ. This parameter cannot be specified for an LDS. For a list of allowable CI sizes, see Appendix A. Keep the following in mind when choosing a CI size: • If specified at the cluster level, all components have the same CI size unless it is overridden; • For a KSDS, if the CI size is specified at the data level but not the cluster or index level, AMS calculates one for the index. It is recommended to code the CI size at the data and index level. When selecting a CI size, try to get the best fit for your records. First, determine if you are using SPANNED records. Generally spanning is a bad idea unless the record size is very large or if variable records vary by large amounts. If your record size is fixed, simply make a list of CI sizes giving a reasonably good percentage used (say, 95%). Keep in mind, a fit of 100% is not good because VSAM needs room to write its CIDF and RDFs. For example, a record size of 200 fits quite well in CI sizes of 1024, 2048, 3072, ... but not so well into 512, 1536, 2560, .... Note they also don't fit as well as expected in a CI size of 4608 since 10 bytes are needed for the CIDF and the two RDFs. For a KSDS or RRDS you should pick a (relatively) small CI size. This allows both faster data transfer rates and a lower lockout ratio, since getting one record in a giant CI may cause closely spaced records to become temporarily inaccessible (and take longer to access). To speed sequential processing, use buffers.
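To compare candidate CI sizes for a fixed record length, the fit can be computed directly. This is a sketch only: the function name is an assumption, and it covers fixed-length, non-spanned records with the 10 bytes of control information mentioned above (a 4-byte CIDF and two 3-byte RDFs).

```python
FIXED_CONTROL_BYTES = 10   # CIDF (4 bytes) plus two RDFs (3 bytes each)


def ci_fit(record_size, ci_size):
    """Return (records per CI, percent of the CI holding record data)."""
    usable = ci_size - FIXED_CONTROL_BYTES
    records = usable // record_size
    percent_used = 100.0 * records * record_size / ci_size
    return records, percent_used


for ci_size in (512, 1024, 1536, 2048, 2560, 3072, 4608):
    records, percent = ci_fit(200, ci_size)
    print(ci_size, records, round(percent, 1))
```

For 200-byte records this reproduces the pattern described above: 1024, 2048, and 3072 come out near 98%, while 512, 1536, and 2560 fall below 95%, and 4608 holds 22 records rather than the 23 that a naive division suggests.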
  • 26. 26 Define Cluster Space Allocation Parameters: CI size (cont'd) In a KSDS you should make sure the CI has enough room for free space to allow for growth. This can be tricky if you aren't careful. Here are two examples with opposite effects: Assumptions: RECSZ(1000 1000), CISZ(2048). With no free space, two records fit in a CI (2,000 bytes in a CI of 2048 bytes). With 3% free space: 2048 * .03 = 61 bytes to be left free; 2048 - 61 = 1987 bytes "loadable"; 1987 / 1000 = 1 record loaded per CI. In this case the requested 3% free space turned out to be 50% because only one record is loaded per CI rather than two. Assumptions: RECSZ(600 600), CISZ(4096). With no free space, six records fit in a CI (3,600 bytes in a CI of 4096 bytes). With 10% free space: 4096 * .10 = 409 bytes to be left free; 4096 - 409 = 3687 bytes "loadable"; 3687 / 600 = 6 records loaded per CI. In this case, the requested 10% free space turned out to be no free space as all six records are loaded in each CI. Finally, you should try to select a CI size fitting the DASD type. The smallest CI sizes use only about 50% to 75% of the available track space. A CI size of 4096 (or any 2K increment above that) uses about 87% or more of the track space and would be a better pick. A Word about the Index CI size The index CI size is just as important as the data CI size. The index CI size must be large enough to index all data component CIs within each CA. When the index CI size is not large enough, VSAM leaves data CIs empty within each CA. Two problems indicating the index CI size is too small: • The file takes more space than you'd expect based on its space calculations; • CA splits occur sooner than expected based on the free space percentage. Steps to calculate the index CI size can be found beginning on Page 39.
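The two free space examples above can be checked with a few lines of Python. This sketch follows the same simplified arithmetic as the text (control information bytes are ignored), and the function names are assumptions made for illustration.

```python
def records_loaded(record_size, ci_size, ci_freespace_percent):
    """Records loaded per CI for a requested CI free space percentage."""
    free_bytes = ci_size * ci_freespace_percent // 100   # bytes to leave free
    loadable = ci_size - free_bytes
    return max(1, loadable // record_size)               # at least one record


def effective_free_percent(record_size, ci_size, ci_freespace_percent):
    """Percentage of the CI's record slots actually left empty by the load."""
    capacity = ci_size // record_size                    # slots with no free space
    loaded = records_loaded(record_size, ci_size, ci_freespace_percent)
    return 100.0 * (capacity - loaded) / capacity


print(records_loaded(1000, 2048, 3), effective_free_percent(1000, 2048, 3))
print(records_loaded(600, 4096, 10), effective_free_percent(600, 4096, 10))
```

The first call shows one record loaded and an effective 50% free; the second shows six records loaded and no effective free space, matching the two cases worked above.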
  • 27. 27 Define Cluster Space Allocation Parameters: Free Space FREESPACE(n n) specifies the free space percentages for each CI and CA respectively. The abbreviation for this parameter is FSPC. The percentages can be any number from 0 to 100. Regardless of the percentages, VSAM always tries to load at least one record per CI and one CI per CA. How to Calculate Free Space: the CI The amount of free space depends on the file’s activity and the period over which that activity occurs. Here are the things to consider: • The data format of the keys (single or multiple fields, data type (alphabetic, numeric, or alphanumeric)) and how tightly spaced the keys are; • Where record inserts occur within existing keys. If inserts occur randomly free space distributed evenly is helpful. If inserts occur only at the end of existing records or in clustered groups, free space won't be reserved where it is needed; • In general, you should pay for inserts up front. It's not feasible to plan for free space for the life of the file, so know how many inserts are made in each reorganization and update lifecycle; • How many deletes are made during the same lifecycle. Remember, in a KSDS a delete frees up space to be possibly used for a later insert. If the above conditions are understood, it's fairly easy to calculate free space. You would simply divide the net insertions by the final record count (initial records plus net insertions), resulting in the percentage of the file to reserve for growth. Let's look at an example. Assumptions: 100,000 initial records, a CI size of 4096, and a fixed record size of 200 bytes. Based on the frequency of reorganization, it's decided the file needs room for 25,000 records to be inserted. During the same period an estimated 5,000 deletes are performed. Further, the keys to be added or deleted occur randomly among the key ranges, and the timing of delete actions allows us to use a "net insertion" value of 20,000. 
Thus: CI freespace percent = (net insertions * 100) / (total records + net insertions) = (20,000 * 100) / (100,000 + 20,000) = 16.7%, or about 17%. From this calculation we see the total amount of CI free space should be at least 17%. For general purposes you could round the free space percentage to 20% because it's easily workable.
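The same calculation as a short Python sketch, using the figures from the example. The function name is an assumption, and the calculation assumes inserts and deletes are spread randomly across the key range as stated above.

```python
def ci_freespace_percent(initial_records, inserts, deletes):
    """CI free space percent needed for one reorganization cycle."""
    net_insertions = inserts - deletes
    return 100.0 * net_insertions / (initial_records + net_insertions)


percent = ci_freespace_percent(100000, 25000, 5000)
print(round(percent, 1))   # compare with the 17% worked out above
```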
  • 28. 28 How to Calculate Free Space: the CI (cont'd) In a previous example, we showed the need to be careful in applying free space percentage. VSAM may give you substantially more or less than the coded percentage free. This calculation is the one to use: Records loaded = (CI size - (CI size * CI freespace%)) / Record size, rounded down. Plugging in the values from our existing example: Records loaded = (4096 - (4096 * .20)) / 200 = (4096 - 819) / 200 = 16. You may have noticed the CIDF and RDF lengths were ignored in the calculations. For a fixed-length record this value is 10 bytes. For a variable length record, the total length of the CI information fields is the number of expected records in the CI times 3, plus four bytes. You also should have noticed the calculated free space percentage, 20%, is what we actually achieved: a 4096-byte CI holds 20 of these records, and loading 16 leaves room for 4 more. That was not an accident. That's why the ratio of record size to CI size is so important. How to Calculate Free Space: the CA The second number in the FREESPACE parameter is the percentage of CIs to be left free in each CA. For example, a 4K CI size has 180 CIs per CA, so with a parameter of FSPC(0 10), 18 CIs are left free in each CA. If the file is defined with any CI free space there should be at least some CA free space as well. Because a CA split is so resource intensive and time consuming, it's well worth it to make your file slightly larger to avoid CA splits. But how much is enough? Assuming you have the optimum CI free space calculated, CA free space would be used only for those situations you couldn’t plan for. Because that's so hard to predict, a small amount of CA free space should be sufficient. One guide could be to use 1/2 the amount of CI free space as your CA free space percentage. 10 percent should be plenty. Note, however, CA free space percentage needs to be accounted for in any space calculations. These calculations are covered shortly.
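Two of the smaller figures on this page, the CI control information length and the number of CIs left free in a CA, can be sketched in Python as follows. The function names are assumptions, and rounding the free CI count down is also an assumption rather than something stated above.

```python
def control_info_bytes(records_per_ci, fixed_length=True):
    """Bytes of CIDF and RDF information in one CI."""
    if fixed_length:
        return 10                        # fixed-length records
    return records_per_ci * 3 + 4        # variable-length records


def free_cis_per_ca(cis_per_ca, ca_freespace_percent):
    """CIs left empty in each CA at load time (rounded down, an assumption)."""
    return cis_per_ca * ca_freespace_percent // 100


print(control_info_bytes(16))                      # fixed-length case
print(control_info_bytes(16, fixed_length=False))  # 16 variable-length records
print(free_cis_per_ca(180, 10))                    # FSPC(0 10) with a 4K CI size
```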
  • 29. 29 Other Free Space Concerns So far we've discussed only evenly distributed update activity. Of course, inserts can be clustered. What can you do about it? If the inserts are somewhat clustered, try using no CI free space but a moderate CA free space percentage. That puts all free space at the end of each CA, allowing CI splits to happen if necessary. However, that could backfire. If most of your activity happens in a few CAs, all the free space in the other CAs is wasted. For heavy clustering, consider no free space of any kind and let VSAM split as needed. During an initial break-in phase performance is poor, then improves to a point after the splits have provided enough free space to accommodate more inserts. If this strategy is taken, you should cut back on reorg frequency. Many people think they need to reorganize when the number of splits has gotten high, and that's just wrong. It is true taking a CI or CA split causes a performance loss during the split, but the performance after a split is no worse than if the split had never occurred. In some cases, the file may actually perform better after several splits due to an increase in distributed free space. Consider this: free space can be altered at any time using the AMS command ALTER. Assume we have these ranges of keys in an employee database:
Keys 00001 - 19999 with turnover at 5%
Keys 20000 - 49999 with turnover at 10%
Keys 50000 - 99999 with turnover at 30%
You can load the file to accommodate the different free spaces needed for these keys:
DEFINE CLUSTER ( ... FREESPACE(5 5) ...)
REPRO ... FROMKEY(00001) TOKEY(19999)
ALTER ... FREESPACE(10 5)
REPRO ... FROMKEY(20000) TOKEY(49999)
ALTER ... FREESPACE(30 15)
REPRO ... FROMKEY(50000) TOKEY(99999)
Of course, this method has its problems. The biggest one is the fact the file used as input must also be keyed (e. g. a KSDS). However, it does work.
  • 30. 30 Define Cluster Space Allocation Parameters: Space Space can be allocated as any of the following: CYLINDERS (CYL), TRACKS (TRK), RECORDS (REC), KILOBYTES (KB), or MEGABYTES (MB). If the DATACLASS or MODEL parameters are used, this argument is not needed. It is not relevant when RECATALOG is specified. The argument is supplied as a primary and secondary amount. It can be specified at the cluster, data, or index levels. Let's discuss the types of allocations you can specify. When allocating in records, you specify the number of records you think the file can hold. AMS then calculates the file size for you. Allocating in records is not recommended because of these three drawbacks: • AMS uses the average record length to calculate the space. This is fine for fixed length records but may be way off for variable records; • None of the factors affecting space are considered when AMS calculates the space (e. g. FREESPACE is ignored). To account for them, the user must "make up" a record count to accommodate the extra space needed; • AMS determines what the CA size is. CA sizes other than one cylinder are very inefficient, but nobody told AMS that. Allocating in tracks should only be done in special circumstances, e. g. there are very few records in the file. Here are a few examples: • TRACKS(10 2) is very bad. The CA size is set to 2 tracks, giving a 2-level index and requiring three I/Os for each random access. • TRACKS(2 2) is good as long as the data doesn't go into extents. The index consists of a single sequence set record and thus only one level. If the file takes an extent, however, each random access requires two I/Os. This extra I/O cannot be saved by extra buffering, either. • TRACKS(10) is good as long as the file never needs an extent (if it does an abend occurs). Again, this allocation needs only a single CA and thus only one I/O to get any record. 
Unless the installation is extremely tight on DASD, you should just go ahead and allocate a small file as CYL(1 1). When allocating in megabytes or kilobytes, AMS rounds up to the minimum number of cylinders to contain that much space.
  • 31. 31 Define Cluster Space Allocation Parameters (Cont'd) If the space argument is specified at the cluster level for a file with only one component (e. g. an ESDS), that space is allocated to the data component. Space allocations cannot be specified at both the cluster and data levels. In a multiple component file (e. g. a KSDS), the location of the space argument can be important. • When specified at the cluster level the amount is divided between the data and index components. The division is based on the other file attributes. After the division, if the amount determined for the data is not an even multiple of the CA size it is rounded up to the next higher CA multiple. This can result in a larger file than was specified. If a value is specified at the cluster level and another level, an error results. • When specified at the data level the entire amount is given to the data component. AMS determines a proper amount for the index component based on the file's attributes. • When specified at both the data and index levels AMS gives the coded amount to each component. For a KSDS, you should specify the space at the data level or both data and index levels. Multiple Volume Allocation If the file extends to a new volume, VSAM takes the primary number of cylinders, not the secondary number. For that reason it's a good idea on a large file to code space for a 2,000 cylinder file as, for example, CYL(250 200) rather than CYL(2000 50). A VSAM file can take a maximum of between 119 and 123 extents, with up to five being used for the primary allocation. Define Cluster Space Allocation Parameters: Volumes The VOLUMES parameter tells VSAM what device(s) to allocate to the file. If more than one volume is specified, all volumes can be used. An "*" is used in an SMS environment and tells VSAM to use whatever volume SMS tells it to use. A maximum of 59 volumes can be specified. The abbreviation for VOLUMES is VOL.
  • 32. 32 Define Cluster Space Allocation Parameters (cont'd) SPANNED / NONSPANNED specifies whether records are allowed to span multiple control intervals. The abbreviations are SPND and NSPND; the default is NONSPANNED. The exception is the AIX, which is always SPANNED. With SPANNED, records larger than the CI size (if the maximum record size is larger than the CI size) are stored in pieces across consecutive CIs until the entire record is stored. A CI can contain either a segment of a spanned record or non-spanned records, but not both. If a spanned record does not use a full CI for its final segment, no other records can use that space. Free space arguments are ignored for spanned records. Specify SPANNED only in these situations: • The file has variable length records and only a very few are large; • The file has variable length records and there is a large variation in record length, with the most common length in between the largest and smallest.
  • 33. 33 Calculating Space for a VSAM file A fairly precise set of calculations is available to get an accurate allocation amount for your VSAM file. You need to gather the following information: • The total number of records the file initially contains; • Record sizes for the file. A fixed record length is easy to calculate for. Variable record sizes are far harder to work with unless you have very accurate information on how the records vary over time and between records; • For an ESDS, determine how many records are added after initial load; • For a KSDS, establish the reorganization cycle time; • For a KSDS, determine how many inserts and deletions will occur as well as when and approximately where in the file they occur. Where insertions occur determines where free space should be reserved and how much the file needs. This topic was discussed in the free space description above. Appendix D has a pair of work sheets you can use to determine the amount of space your VSAM file needs. Calculating an Index CI size We talked about calculating a data CI size in the DEFINE CLUSTER section. Because the index CI size is an optional parameter, we've pretty much ignored it until now. However, it is an important consideration in VSAM file design. Beginning in z/OS V1.4, AMS calculates the minimum index size this way: ASSUMPTION: key length 11 bytes, 4096 data CI size. • Divide the key length by three, always rounding up. For example, dividing 11 by 3 results in 3.67, which rounds up to 4; • Add 3 bytes (the index record’s overhead) to the result. In our example, this makes 7; • Multiply the result by the number of CIs in a CA. In our example, 7 * 180 = 1260; • Take the smallest CI size large enough to contain a record that size. The smallest CI size large enough to hold a 1260-byte record is 1536. The index CI size picked by AMS may be a reasonable estimate; however, it is highly recommended the designer calculate and explicitly assign an index CI size.
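The AMS estimate steps above can be sketched in Python. The function names are assumptions, and the list of valid CI sizes (multiples of 512 up to 8192, then multiples of 2048 up to 32768) is also an assumption standing in for the table in Appendix A.

```python
import math

# Assumed list of allowable CI sizes; Appendix A is the authority.
VALID_CI_SIZES = list(range(512, 8192, 512)) + list(range(8192, 32769, 2048))


def next_valid_ci_size(nbytes):
    """Smallest allowable CI size able to hold nbytes."""
    for size in VALID_CI_SIZES:
        if size >= nbytes:
            return size
    raise ValueError("no CI size can hold %d bytes" % nbytes)


def ams_minimum_index_ci(key_length, cis_per_ca):
    """Index CI size using the divide-by-three estimate described above."""
    per_entry = math.ceil(key_length / 3) + 3    # estimated key plus overhead
    return next_valid_ci_size(per_entry * cis_per_ca)


print(ams_minimum_index_ci(11, 180))   # 11-byte key, 4096-byte data CI
```

For the 11-byte key and 180 CIs per CA, the sketch arrives at the same 1536-byte index CI size as the worked example.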
  • 34. 34 Calculating an Index CI size (cont'd) Index set records should be as small as possible to decrease search time. At the same time, the sequence set records must be large enough to address all the CIs in one CA. During the initial load, if the sequence set record runs out of space before the CI is full, VSAM moves on to the next CI, leaving the previous CI partially empty. That can increase the space needed by the file tremendously. To determine the necessary index CI size, the file's key length and the data CI size must be known. Appendix A lists the number of CIs per CA, and the table just below shows the average compressed key length. These two numbers are multiplied together to find the required index CI size. To summarize: • Make an effort to determine a compressed key length and add 3; • Find the number of CIs per CA; • Multiply the two and round up to the next allowable CI size value. If your keys don't compress well, the calculated CI size will be too small. As a general rule of thumb, large keys with multiple fields where a middle field is EBCDIC and packed with blanks will not compress well. If you think you may have a poorly compressed key, it's usually safe to add 512 to the calculated index CI size. As a general rule, a 4K (4,096 bytes) data CI size and a 2K (2,048 bytes) index CI size are "best". Here is a real-world example. In a given file the key length was 47 bytes, the average compressed key length was 24, and the data CI size was 3072. With an index CI size of 2560, the file needed 4,050 cylinders to hold all its data. With an index CI size of 3072 (the next CI size up), the same data was contained in 2,800 cylinders. That's a storage gain of nearly 30%, but we can get still more. If the data CI size is taken as correct this file really needs an index CI size of 6144: 24-byte compressed key + 3 bytes = 27 bytes; 27 bytes * 225 CIs per CA (3072 data CI size) = 6075; next valid CI size up from 6075 is 6144. 
Using the index CI size calculated, the file shrank to 1,600 cylinders. Perhaps it's obvious, but the larger the data CI size is, the smaller the index CI size needs to be, because there are fewer data CIs per CA to index.
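The designer's calculation summarized above (compressed key length plus 3, times the CIs per CA, rounded up to an allowable CI size) can be sketched as follows. The function names and the poorly_compressed flag are assumptions for illustration, as is the list of valid CI sizes, which stands in for Appendix A.

```python
# Assumed list of allowable CI sizes; Appendix A is the authority.
VALID_CI_SIZES = list(range(512, 8192, 512)) + list(range(8192, 32769, 2048))


def round_up_to_ci_size(nbytes):
    """Smallest allowable CI size able to hold nbytes."""
    for size in VALID_CI_SIZES:
        if size >= nbytes:
            return size
    raise ValueError("no CI size can hold %d bytes" % nbytes)


def index_ci_size(compressed_key_length, cis_per_ca, poorly_compressed=False):
    """Index CI size needed to address every data CI in one CA."""
    needed = (compressed_key_length + 3) * cis_per_ca
    size = round_up_to_ci_size(needed)
    if poorly_compressed:
        size = round_up_to_ci_size(size + 512)   # safety margin suggested above
    return size


# Real-world example above: 24-byte compressed key, 3072-byte data CI (225 CIs per CA).
print(index_ci_size(24, 225))
```

Running it for the real-world example gives 6144, the index CI size that brought the file down to 1,600 cylinders.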
  • 35. 35 Define Cluster: Performance Tuning Parameters The BUFFERSPACE parameter specifies the number of bytes used for buffered processing of the file's data and index records. It can be abbreviated as BUFSP or BUFSPC. There's no real need to code this parameter. The default buffer space is calculated at define time by AMS as enough space to read two data CIs and (if a KSDS) one index CI. If the parameter is coded in the define, it overrides the default only if the specified value is larger. Here is an example: Assume a 4K (4,096 byte) data CI size and a 2K (2,048 byte) index CI size. The default space is 2*4K or 8K, plus 1*2K, for a total of 10K. The result is BUFSP(10240). Buffer space is divided at open time as follows: • If the file is opened sequentially, one buffer is reserved for index processing and the remaining space is allocated to data buffers; • If the file is opened for random processing, two buffers are allocated to process data records and the remaining bytes are dedicated to index processing. Even IBM documentation considers this a performance problem. While the BUFFERSPACE parameter can be used to determine buffer amounts it is frequently better to explicitly state buffer needs using one of the following methods: • AMP JCL parameter in the DD statement; • Assigning LSR buffers; • ACB parameters in an Assembler program; • DECLARE parameters in a PL/I program. We'll discuss AMP and LSR buffers in turn.
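The default BUFFERSPACE arithmetic and the open-time split described above can be sketched in Python for a KSDS. The function names are assumptions, and the split is a simplified reading of the two rules above, not a statement of exactly how VSAM rounds.

```python
def default_bufferspace(data_ci_size, index_ci_size):
    """Default BUFSP for a KSDS: two data CIs plus one index CI."""
    return 2 * data_ci_size + index_ci_size


def split_bufferspace(bufsp, data_ci_size, index_ci_size, sequential):
    """Return (data buffers, index buffers) for a given BUFSP at open time."""
    if sequential:
        index_buffers = 1                                    # one for the index
        data_buffers = (bufsp - index_ci_size) // data_ci_size
    else:
        data_buffers = 2                                     # two for data
        index_buffers = (bufsp - 2 * data_ci_size) // index_ci_size
    return data_buffers, index_buffers


bufsp = default_bufferspace(4096, 2048)
print(bufsp)                                             # BUFSP(10240) example
print(split_bufferspace(20480, 4096, 2048, sequential=True))
print(split_bufferspace(20480, 4096, 2048, sequential=False))
```

The last two lines use a hypothetical BUFSP of 20480 to show how the same space turns into mostly data buffers for sequential access and mostly index buffers for random access.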
  • 36. 36 Dynamic Buffer Specification The AMP JCL parameter allows the user to specify the number of buffers to use for a VSAM file. The AMP subparameters are: • BUFSP - the number of bytes for all components of the file. Providing this argument is exactly like supplying the BUFSP argument in the define; • BUFND - specifies the number of buffers to be used by the data component; • BUFNI - specifies the number of buffers to be used by the index component. In almost all cases paired BUFND/BUFNI parameters are preferred because they're more easily understood at a glance. You need not specify them together; use one or the other or both. This is the syntax for the AMP parameter: AMP=('BUFND=xx,BUFNI=xx') The parentheses are not required. Data buffers (BUFND) are used for sequential processing and index buffers (BUFNI) are used for random processing. Most people know the default buffering is not enough. The question is, "How much buffering do I need?" The answer is not simple, partly because different processing modes can be affected by buffering amounts. There are many variables affecting VSAM's use of buffers, most of which are too complicated for this document. The picture is much simpler if we consider only batch applications using COBOL.
  • 37. 37 Buffers for Random Processing Random processing means the program issues a keyed I/O request looking for a single record. A keyed search always starts at the top of the index structure, following it all the way through to the sequence set before reading the actual record. VSAM always looks in its index buffers to see if the required index set record is already in memory. With the IBM default of only one buffer, an I/O is needed whenever a new index record is needed. VSAM does not allow the entire index to be read into memory. While any number of index set records in memory are searched, VSAM recognizes only one sequence set record per I/O request no matter how many are sitting in buffers. Because most index records are sequence set records, for most applications VSAM can keep the entire index set in memory using very few buffers. It's not always necessary to do so. If the transactions to be run against the file are presorted, the number of index buffers should be equal to the number of index levels. For example, if your index has four levels, the proper parameter is AMP=('BUFNI=4'). If the transactions are unsorted, you must know how many index set records there are. This information is not in the LISTC, but can be calculated. Locate the total number of index records in the LISTC and subtract the number of sequence set records. The remainder is the number of index set records. Since there is one sequence set record per CA, the number of CAs in the file is the number of sequence set records. Note: the number of cylinders does not give a completely accurate account of the number of CAs, because some of them are not being used and have no sequence set record. The formula is: BUFNI = total index records - (Data HURBA / CA size) + 1. The "+1" accounts for the required sequence set record. In most random access situations, the default number of data buffers is sufficient. 
By definition, "random" means "anywhere in the file," so the likelihood of finding the next record in the same CI is minimal. The only time more data buffers might be useful during random processing is in a case where the user expects a large number of CA splits during the update. In this case, the extra buffers would speed up processing considerably during a split. On the other hand, you're not supposed to get CA splits.
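The BUFNI calculation for unsorted random access can be sketched in Python. The function name and the LISTC figures in the example are hypothetical, and rounding the CA count up (so a partly used CA still counts as one sequence set record) is an assumption.

```python
def bufni_unsorted(total_index_records, data_hurba, ca_size_bytes):
    """Index buffers for unsorted random access: index set records plus one."""
    # One sequence set record per used CA; round a partly used CA up.
    sequence_set_records = -(-data_hurba // ca_size_bytes)
    index_set_records = total_index_records - sequence_set_records
    return index_set_records + 1          # +1 for the sequence set record


# Hypothetical LISTC values: 4K data CIs, 180 CIs per CA, 100 CAs in use.
ca_size = 180 * 4096
print(bufni_unsorted(total_index_records=103,
                     data_hurba=100 * ca_size,
                     ca_size_bytes=ca_size))
```

With these made-up numbers there are 3 index set records, so the result would be coded as AMP=('BUFNI=4').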
  • 38. 38 Buffers for Sequential Processing Sequential processing means the program asks for records using sequentially higher key values (KSDS) or in physical sequence (ESDS), one record after another, such as READ ... NEXT. Buffering for this predictable request is easy. There are two types of buffer processing VSAM can do. One is called "alternating buffer support," in which one buffer is acted on asynchronously by VSAM while another is used by the application program. The other is called "read ahead" (or "prefetch"). In this process VSAM reads ahead, asynchronously filling buffers in advance while the program is processing older buffers. When enough buffers exist, VSAM builds channel programs to read or write CIs based on half- or full track multiples, saving rotational delay. With alternating buffering, VSAM needs twice as many buffers, so we should double our request as well. VSAM can chain schedule up to a full CA in a single I/O request if there are enough buffers available. In the extreme case, when the application program needs to read the entire file sequentially, it is worthwhile to code enough data buffers to read two CAs at a time. For most application programs, the need to read the data sequentially does not span the entire file but only portions at a time. In that case, buffering one track at a time is the optimum balance between speed and using too much storage for data buffers. The calculation is: BUFND = 2 * (Number of data CIs per track) + 3. The "+3" accounts for the "work CI" VSAM reserves and spares two for those circumstances when CIs span tracks. In sequential processing there is never a need for more than one index buffer. Direct Access Buffering with LSR Assembler programs (and, of course, CICS regions) have access to another buffer pool called the Local Shared Resource pool, or LSR. LSR can also be used in batch (BLSR). A complete discussion of LSR and BLSR is too detailed for this document. 
Fortunately there is an IBM publication on BLSR; it’s listed in the bibliography.
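As a quick check of the one-track buffering rule, the sketch below applies BUFND = 2 * (data CIs per track) + 3. The function name is hypothetical, and the figure of 12 CIs per track for a 4K CI on 3390 is used only as a sample input; take the real value from the table in the appendix.

```python
def bufnd_for_sequential(data_cis_per_track):
    """Data buffers for one-track-at-a-time sequential processing.

    Doubling allows for alternating buffers; the extra three cover the
    work CI plus two spares for CIs that span tracks.
    """
    if data_cis_per_track < 1:
        raise ValueError("data_cis_per_track must be at least 1")
    return 2 * data_cis_per_track + 3


# Sample input: 12 data CIs per track (assumed for a 4K CI on 3390).
print(bufnd_for_sequential(12))
```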
  • 39. 39 More Performance Options SPEED / RECOVERY specifies whether the data component is pre-initialized at load time. RECOVERY can be abbreviated as RCVY; there is no abbreviation for SPEED. Using RECOVERY causes VSAM to do triple I/O during the load because of the pre-initialization of CIs and sequence set records. For an ESDS, RECOVERY lets you restart the load "in the middle." In other words, even if the load fails VSAM considers what was loaded to be good. Sometimes, however, VSAM cannot restart a load of a KSDS. RECOVERY is the default; always code SPEED for a KSDS. WRITECHECK / NOWRITECHECK specifies whether VSAM double-checks each WRITE. Abbreviated as WCK or NWCK, the default is NOWRITECHECK. Because of the considerable reliability of I/O channels and the excess overhead involved in rechecking each WRITE, use the default. ERASE / NOERASE specifies whether the data component is to be rewritten with binary zeroes at delete time. Abbreviated as ERAS/NERAS, the default is NOERASE. RACF has built-in erase functions, so if the file is protected by RACF the ERASE parameter is not needed. REUSE / NOREUSE specifies whether the file can be opened over and over with a new set of records (REUSE) or must be deleted and redefined before it can be reloaded (NOREUSE). Abbreviated as RUS/NRUS, the default is NOREUSE. In JCL temporary files are processed by non-VSAM access methods. With the REUSE parameter, a permanent VSAM file is treated as a temporary one (at least regarding content). One advantage a REUSEable file has over a JCL temporary file is the ability of the user to decide whether to keep the records. Temporary JCL files go away at the end of the job, along with their contents. In a VSAM file defined with REUSE, however, records are deleted only if the user program opens the file as OUTPUT. Normally records remain in the file unless specifically deleted by another user process. Technically, REUSE sets the HURBA to 0 if the file is to be overwritten. 
This causes all secondary volumes (and their extents, of course) to be marked as candidates. It does not, however, free up existing extents on the primary volume. All statistics in the catalog are set to NULL. A REUSEable file cannot have alternate indexes. In most cases the SAM temporary files are sufficient. However, a VSAM REUSEable file is the only way to go if a random-access temporary file is needed.
  • 40. 40 Define Cluster: More Performance Options SHAREOPTIONS(n n) specifies how the cluster or its components are shared among its users within the system ("cross-region") or across multiple systems ("cross-system"). The abbreviation is SHR, and the default is SHR(1 3). The cross-region parameter can have these values: 1. Any number of users can have read access (OPEN INPUT) or one user can have update access (OPEN I/O). VSAM ensures complete integrity. 2. Any number of users can have concurrent read access, and one user can have update access. If the user requires read integrity you must use the ENQ and DEQ macros to obtain exclusive control of the file. The file's records are safe due to the restriction of one update access at a time, but a record can be updated after a program has read it. 3. Any number of users can have read access, and any number of users can have update access. VSAM provides no integrity checking, so the program must provide all of it (including the use of ENQ and DEQ). 4. The same as 3. but also flushes the buffers for each I/O request. The cross-system parameter can have these values: 3. Any number of users can have read access and any number of users can have update access. VSAM provides no integrity checking; the program must provide all of it. In this case, RESERVE and RELEASE macros are more useful than DEQ/ENQ because they work on the DASD rather than the file. 4. The same as 3. but also flushes the buffers for each I/O request. In addition, under option 3 VSAM takes these three special actions: • A change in the file's HiUsedRBA is not allowed (the file cannot grow larger); • For a KSDS, a change in the high-key RBA is prevented. In other words, the highest key in the file does not move; • Data and sequence set buffers are refreshed for keyed searches. Note this applies only to direct (random) requests; sequential requests can still use copies of the CI or sequence set records from the buffer. 
You'll note the cross-system options do not provide much security. Before IBM's SYSPLEX architecture, most installations used either IBM's Global Resource Serialization (GRS) product, or Multi-Image Management (MIM) from Legent, to handle file sharing. SYSPLEX handles all that now. Unless you need to update a file in batch while it's active in real-time, and read integrity is not necessary, you should code SHR(2 3).
  • 41. 41 Define Cluster: Installation Defaults MODEL( ... ) specifies an existing cluster, data, or index component is to be used for assigning attributes to a new file. If desired, all attributes of a file can be assigned through modeling. The only required parameter is the cluster name (and, of course, MODEL). If the MODEL parameter is specified, any attributes of a file not explicitly given in the define are taken from the catalog entry of the file in the MODEL statement. If a parameter is specified, its value in the define takes precedence over the same attribute in the file being modeled. Take note of the parameters supplied in the define to make sure they don't conflict with the file being modeled. For example, if a RECORDSIZE parameter is coded in a define but the CISZ is not, make sure the CISZ in the existing file can handle that record size. It is not possible to define a dummy VSAM file with no size to be modeled. It is possible to set up a dummy file of minimal size to be modeled. Define Cluster: Security Authorization Parameters MASTERPW / CONTROLPW / UPDATEPW / READPW ( ... ) specifies passwords for (respectively) full, component, update, and read access. Today with RACF these are no longer needed (nor are the next three parameters) and therefore no descriptions are given. ATTEMPTS: the number of tries a user has to enter the correct file password. CODE ( ... ) specifies a message to be sent to the operator when an access has failed all its ATTEMPTS. AUTHORIZATION ( ... ) specifies the program name of the User Security Violation Routine (USVR) to be given control during OPEN processing. This is given control only if a valid password is given. It is ignored if a MASTERPW is given successfully. TO ( ... ) / FOR ( ... ) specifies a time during which the file cannot be deleted except using the PURGE parameter. The TO argument is a "Julian" date of YYYYDDD; the FOR argument is a number of days.
  • 42. 42 Define Cluster using SMS The following three parameters are effective only under SMS. If SMS is not active the define fails. DATACLASS ( ... ) specifies a 1 - 8 character name of a Data Class list. This is used to assign allocation attributes to the cluster such as number and type of size parameters. Abbreviated as DATACLAS. Generally, DATACLAS is not used. MANAGEMENTCLASS ( ... ) specifies a 1 - 8 character name of a Management Class to assign retention, backup, and migration attributes to the file. Abbreviated as MGMTCLAS. STORAGECLASS ( ... ) specifies a 1 - 8 character name of a Storage Class to assign devices and volumes for file allocation. Abbreviated as STORCLAS. One final Define Cluster Parameter RECATALOG / NORECATALOG parameters cause VSAM to recreate the Cluster Sphere record for the file without actually redefining it. Of course, NORECATALOG is the default. Abbreviations are respectively RCTLG and NRCTLG. If possible, the original define should be run with this parameter added. While not necessary, it's simply easier to do so. OPTIONAL: Defining a VSAM File with JCL A user can create a VSAM file in their JCL. The DSN= field is the cluster name. The following parameters help you: • RECORG=(KS, ES, RR, LS) for (respectively) KSDS, ESDS, RRDS, LDS; • KEYLEN=n - used with the next parameter, provides key information; • KEYOFF=n - used with KEYLEN, this is the offset of the key; • LIKE=dsn - works like MODEL. The following differences exist: • LRECL=n - the maximum (not average) record length; • AVGREC=(U, K, M) - scales the record counts in SPACE=({avg. rec. len.},({pu},{su})) by units (1), K (1,024), or M (1,048,576). For example, SPACE=(80,(10,1)) with AVGREC=K actually provides for a space argument of (80,(10240,1024)).
  • 43. 43 Defining an Alternate Index (AIX) In many ways, defining an alternate index is similar to defining a base cluster. Only those parameters different from the base cluster parameters are listed here, unless the parameter has a different application in a base cluster versus an AIX. RELATE ( ... ) specifies the base cluster to be accessed using the AIX. The base cluster cannot have the REUSE attribute and must be a KSDS or an ESDS. KEYS (n n) - basically the same as in the base cluster define, except the AIX key cannot start in the same byte as the primary key. However, the key can be a subset of the primary key if desired. RECORDSIZE (n n) - the same as in the base cluster define, with additional considerations. If the file is defined as UNIQUEKEY, "average" should equal "maximum." If the file is defined as NONUNIQUEKEY, the value of "average" must be less than the value given for "maximum." Alternate indexes are automatically defined with the SPANNED attribute (although it can't be explicitly coded) to accommodate non-unique keys. In a NONUNIQUEKEY file each record has a 5-byte control field: Byte 1: x'00' if the base is an ESDS, x'01' if the base is a KSDS; Byte 2: length of the base cluster pointers in an AIX record. Values are x'04' if the base is an ESDS, x'length' if the base is a KSDS; Bytes 3 and 4: the number of occurrences of the base pointers within the record. For a UNIQUEKEY AIX this number is always x'0001'. Since a 2-byte field allows positive numbers up to x'7FFF', you can see the maximum number of occurrences of base cluster pointers is 32,767. Two examples follow. SSN / Employee number AIX: a unique relationship. The record has five bytes of control information, a 9-byte SSN, and a single employee number field of, say, five bytes for a total AIX record length of 19 bytes. Employee Number / Department AIX: a non-unique relationship. Further, say each department can have at most 50 employees. 
The record has its five bytes of control information, the department field (say 3 bytes), and (assuming the same 5 bytes of employee number bytes), up to 50 * 5, or 250, bytes of employee number information. The maximum record length would thus be 258 bytes.
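The two record-length examples follow the same arithmetic: five control bytes, plus the alternate key, plus one base cluster pointer per occurrence. The Python sketch below reproduces it; the function and parameter names are hypothetical and exist only for illustration.

```python
AIX_CONTROL_BYTES = 5        # control field at the front of each AIX record
MAX_POINTER_COUNT = 0x7FFF   # 2-byte occurrence count, positive values only


def aix_max_record_length(alt_key_len, base_pointer_len, max_occurrences=1):
    """Maximum AIX record length in bytes.

    base_pointer_len is 4 for an ESDS base (an RBA) or the prime key
    length for a KSDS base. max_occurrences is 1 for UNIQUEKEY.
    """
    if not 1 <= max_occurrences <= MAX_POINTER_COUNT:
        raise ValueError("occurrences must be between 1 and 32,767")
    return AIX_CONTROL_BYTES + alt_key_len + max_occurrences * base_pointer_len


# SSN / employee number: 9-byte alternate key, one 5-byte pointer.
print(aix_max_record_length(9, 5))
# Department / employee number: 3-byte key, up to 50 pointers of 5 bytes.
print(aix_max_record_length(3, 5, max_occurrences=50))
```

Rerunning the calculation with a larger occurrence count is a quick way to pick a RECORDSIZE maximum that leaves room for growth.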
  • 44. 44 Defining an AIX: Records and Keys (cont'd) Obviously, it is possible with a non-unique key to "run out of room" when adding base cluster records. Suppose, as in the previous example, you have defined a maximum record size of 258 bytes to your AIX. Tomorrow your boss may decide to expand a department to more than 50 people. What happens to your AIX? "Extra" Keys During a Load Normally a base cluster is loaded, followed by a build of an alternate index. During the first build of the AIX, if a given alternate key should generate more "entries" than fit in the AIX, VSAM flags it with an error message. VSAM then saves the existing AIX record and moves on to the next alternate key. The result is a valid AIX with some base cluster record pointers missing. The fix is to drop the AIX, fix the define, and then rerun the build. It is unnecessary to rerun the load of the base cluster. "Extra" Keys During an Update In an established base cluster / AIX set, if an update would result in the AIX record "running out of room," the entire update is cancelled. In other words, not only does the AIX update fail, so does the base cluster's update. That only happens when the UPGRADE option is specified. If the AIX is defined with NOUPGRADE, the update continues. Defining an AIX: the Upgrade Parameter UPGRADE / NOUPGRADE specifies whether the alternate index is updated whenever the base cluster is updated. Abbreviations are respectively UPG and NUPG. The default is UPGRADE. If UPGRADE is in use, VSAM automatically opens the AIX when the base cluster is opened. If a change made to the base cluster would affect the AIX, the AIX update takes effect before the base cluster update is completed. With UPGRADE the user can be assured the AIX and base cluster are accurate and in sync with each other. There is no other way to do so except through the BLDINDEX AMS command. Further, the UPGRADE does not take effect until the AIX has been built. AIXes are very expensive. 
Any access to the base cluster through its AIX requires twice as many I/Os to reach a given record. This should not discourage you from using AIXes, but be aware of the costs.
  • 45. 45 Defining an AIX: DEFINE PATH Although there are several parameters on the DEFINE PATH statement, only these are useful to you in our current environment: NAME ( ... ) is the name by which you access the alternate index. PATHENTRY ( ... ) specifies the name of the AIX to be accessed. If this argument is supplied for a base cluster without an AIX it functions as a "create alias" command. The abbreviation is PENT. UPDATE / NOUPDATE - specifies whether the AIX is to be maintained using UPGRADE. Note this overrides the UPGRADE parameter. In other words, if the path is defined as NOUPDATE the AIX is not maintained even if it was defined with UPGRADE. The default is UPDATE. The parameters can be abbreviated respectively as UPD / NUPD. RECATALOG / NORECATALOG specifies whether the define path is being done to re-synchronize the catalog entries. Building an AIX: the BLDINDEX Command The BLDINDEX command creates and loads an AIX from an existing base cluster. Several parameters of BLDINDEX are not in common use and are not discussed here. During a build of an AIX the following operations occur: • The base cluster is read sequentially; • The key and alternate key are extracted from each base cluster record; • At EOF in the base cluster, the AIX has a set of records which are not in alternate key order. These records must be sorted before they are loaded; VSAM performs this sort; • Once the records are sorted, they are written to the AIX. As each record is written, the control field is added at the front of the record. BLDINDEX parameters INFILE ( ... ) or INDATASET ( ... ) specify the base cluster DDname or DSN used to build the AIX. One or the other must be given. Abbreviations are IFILE or IDS. When INDATASET is used it needs exclusive access to the base cluster. Unless a BLDINDEX is run against an initialized base cluster (i. e. one "seed" record), it is highly recommended INFILE be used so you can code AMP buffers in the JCL. 
The base cluster named in the statement must have been specified in the RELATE parameter in the AIX's define. The base cluster and the AIX must be in the same catalog.
  • 46. 46 BLDINDEX Parameters (cont'd) OUTFILE ( ... ) / OUTDATASET ( ... ) specify the AIX being built. OUTFILE requires a DDname, OUTDATASET requires the file's DSN. Abbreviations are OFILE and ODS respectively. In general, the same restrictions for INFILE / INDATASET apply to OUTFILE / OUTDATASET. OUTFILE is recommended so you can code AMP buffers. EXTERNALSORT / INTERNALSORT - specify whether VSAM uses an internal sort (virtual storage) or external sort (disk work space) to sort the AIX records. Abbreviations are ESORT and ISORT. The default is ISORT. With small files ISORT isn't a problem. To calculate the amount of storage needed: • Calculate the AIX key length plus base key length. When the base cluster is an ESDS the base key length is 4. When the base cluster is a KSDS the base key length is the KSDS key length; • Multiply that number by the number of base cluster records; • Add 25% for internal sort work space; • Round the answer up to the nearest 32K multiple. The answer, plus about 170K for IDCAMS processing, plus any BUFND and BUFNI parameters, tells what is needed for an internal sort. This number can be surprisingly large. Consider a 5-byte base key, a 9-byte alternate key, 42,000 records, and UNIQUEKEY. The space needed is 14 bytes (9+5) times 42,000 records = 588,000, times 1.25 (additional 25%) = 735,000. Round that up to the next 32K multiple and you get 736K. Add the 170K bytes BLDINDEX uses, plus buffers you have allocated (assume 1 cylinder's worth of data buffers on a 4K CI size for both base cluster and AIX, or 2880K), and the working storage you need (the REGION parameter) would be 3786K, or 3.7 megabytes. If you had 100,000 records, this number jumps to over 4.66 megabytes. Fortunately, in today's environment most jobs get 10 megabytes for virtual storage, so unless the AIX is huge the build uses an internal sort (and runs much quicker for it). Just in case, you should note the BLDINDEX command creates its own sort work files. 
The JCL DD statements IDCUT1 and IDCUT2 are needed to provide DASD allocations to do so. If the BLDINDEX works successfully VSAM deletes them for you. A word of warning: if the BLDINDEX fails, these work files remain cataloged and you must manually delete them. Because of that it's a good idea to name them explicitly in their DD statements.
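The internal sort storage estimate can be reproduced with a short Python sketch. It follows the four steps listed for the calculation and then adds the IDCAMS overhead and the buffer allowance. The function name is hypothetical, 1K is taken as 1,024 bytes (an assumption), and the 2880K buffer figure is simply the one used in the worked example.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def bldindex_internal_sort_kb(alt_key_len, base_key_len, record_count,
                              buffer_kb=0, idcams_kb=170):
    """Approximate REGION needed (in K) for a BLDINDEX internal sort.

    base_key_len is 4 for an ESDS base, or the prime key length for a KSDS.
    """
    raw_bytes = (alt_key_len + base_key_len) * record_count
    with_work_space = ceil_div(raw_bytes * 5, 4)      # add 25% sort work space
    block = 32 * 1024                                 # round up to a 32K multiple
    sort_kb = ceil_div(with_work_space, block) * 32
    return sort_kb + idcams_kb + buffer_kb


# Worked example: 9-byte alternate key, 5-byte base key, 2880K of buffers.
print(bldindex_internal_sort_kb(9, 5, 42_000, buffer_kb=2880))
print(bldindex_internal_sort_kb(9, 5, 100_000, buffer_kb=2880))
```

If the result approaches the job's REGION, that is the signal to provide the sort work files or reduce the buffer allocation.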
  • 47. 47 Maintaining VSAM files: the ALTER Command Many characteristics of VSAM files can be changed "on the fly" with the AMS ALTER command. There are three main considerations: • The correct object must be specified in an entryname argument. Many attributes are valid at only index or only data levels, or both separately. For example, if you're renaming a VSAM KSDS you must rename the cluster, data, and index components separately; • Some characteristics can only be changed at specific times in a VSAM file's lifetime. For example, KEYS can be altered only when the data component of a file is empty; • Multiple components and files can be selected using generic names. An asterisk (*) can be substituted for a single qualifier. The high-level qualifier must be specified, and only one level can be substituted with the "*". Most of the parameters ALTER can change have the same meaning as they do in the define. In some cases the parameters have a slightly different meaning in the ALTER, and there are a few new ones. Only those parameters which differ in function from those in the DEFINE are discussed here. In addition, ALTER parameters applicable only to catalogs are left out. For a complete list, see Appendix F. ADDVOLUMES( ... ) - abbreviated as AVOL, this adds candidate volumes to the list of possible locations for extents. Under SMS, a "*" stands for one unspecified volume. This ALTER can be run at any time. FREESPACE - can be run at any time. KEYS - can be altered only if: • The file is empty (HURBA is 0); • The current value for KEYS is (64 0); • The key description fits within the record. Additionally, if the KEYS parameter is for an AIX, it is the AIX key and not the base cluster key definition that is changed. LIMIT( ... ) - for GDGs, the maximum number of generations to keep. MANAGEMENTCLASS - results in migration of the file through normal SMS processing.
  • 48. 48 Maintaining VSAM files: the ALTER Command Parameters (cont'd) NEWNAME ( ... ) - specifies a new name to be attached to a component. As mentioned above, if a VSAM file is to be renamed a separate ALTER is needed for each component. Note: a file cannot be renamed to an alias in a different catalog than the one it's already in. NULLIFY ( ... ) - removes a protection attribute. Attributes you can nullify are AUTHORIZATION, CODE, CONTROLPW, EXCEPTIONEXIT, MASTERPW, OWNER, READPW, RETENTION, and UPDATEPW. RECORDSIZE (n n) - may be altered only if: • The file is empty (HURBA is 0); • The current value for RECSZ is (4089 32600); • The new maximum record size must be at least 7 bytes less than the current CI size and large enough to contain the entire key; • If the object being altered is an AIX, the maximum size must be large enough to contain all base key pointers to be built. REMOVEVOLUMES ( ... ) - abbreviated RVOL, removes candidate volumes from the named object. This can also be used in such a way as to remove all VSAM files (including the VVDS) from a given volume by specifying the master catalog as the object named. Since that second function is extremely dangerous, don't do it! TYPE(LINEAR) - specified for an ESDS with a 4096 CI size, changes that object to an LDS. This change is irreversible. The parameter is ignored for any other type of object. UNIQUEKEY / NONUNIQUEKEY - has the same meaning as in a define with the following restrictions: • UNIQUEKEY can be specified only for an empty AIX (before BLDINDEX); • NONUNIQUEKEY can be specified at any time. Be sure the maximum record size is large enough to handle the multiple keys.
  • 49. 49 Some Examples of ALTER Commands Free space alteration. Note the following: • The ALTER command must specify the component being altered. An error message is issued for an ALTER FSPC on the cluster component, even if the original define had FSPC in the cluster define. • The new free space does not take effect until the next load or resume load, as:
      REPRO INFILE(INFILE) OUTFILE(OUTFILE) COUNT(5000)
      ALTER HLQ.2LQ.VSAM.DATA FSPC(10 5)
      REPRO INFILE(INFILE) OUTFILE(OUTFILE) SKIP(5000) COUNT(5000)
      ALTER HLQ.2LQ.VSAM.DATA FSPC(20 10)
      REPRO INFILE(INFILE) OUTFILE(OUTFILE) SKIP(10000)
    Adding a candidate volume:
      ALTER HLQ.2LQ.VSAM.DATA AVOL(* *)
    Renaming a cluster and its components. This can be done to non-VSAM as well as VSAM files. When renaming a VSAM file, you should rename not only the cluster but also the individual components:
      ALTER HLQ.2LQ.VSAM NEWNAME(HLQ.2LQ.NEWLQ.VSAM)
      ALTER HLQ.2LQ.VSAM.DATA NEWNAME(HLQ.2LQ.NEWLQ.VSAM.DATA)
      ALTER HLQ.2LQ.VSAM.INDEX -
            NEWNAME(HLQ.2LQ.NEWLQ.VSAM.INDEX)
      ALTER HLQ.2LQ.NONVSAM NEWNAME(HLQ.2LQ.NEWLQ.NONVSAM)
  • 50. 50 Deleting VSAM Files There is more to deleting VSAM files than simply typing "delete <file>". For example, many VSAM files have more than one component and thus more than one catalog entry. It is therefore possible (but not desirable!) to delete only part of a VSAM file. We'll look at all the parameters of the delete command, but will go into detail on only a few. These are the objects you can delete with AMS: ALIAS, AIX, CLUSTER, GDG, NONVSAM, NVR, PAGESPACE, PATH, TRUENAME, USERCATALOG, and VVR. Those objects you may not recognize are: • NVR: in a VVDS, the pointer to a non-VSAM file; • VVR: in a VVDS, the pointer to a VSAM file; • TRUENAME: an entry in a BCS for a data or index component. Used only if the original cluster's catalog entry is inaccessible. The following options can be specified: ERASE / NOERASE - optionally overwrites the file's data space with binary zeroes. Erase should be used only for files protected by something other than RACF, as the erase-on-delete function of RACF is faster. FILE ( ... ) - names the JCL DDname providing allocation of the volume where the file is located. If this is not specified, AMS attempts to dynamically allocate the file. Access to the volume is required when deleting: • a VVR with no cluster sphere record; • with the ERASE parameter; • objects from a recoverable catalog; • a volume owned by a catalog being deleted with FORCE; • volumes containing non-VSAM files with SCRATCH; • a member in a PDS. FILE is not needed if the file is actually cataloged to the named volume. If FILE is specified, the DD statement should have only DDname, UNIT=, DISP=, and VOL=SER= parameters. If multiple volumes are needed and they're the same device type, they should be specified on the same DD statement. If they are of different types, a concatenated DD statement is needed.
  • 51. 51 Deleting VSAM Files (cont'd) FORCE / NOFORCE - deletes GDGs without checking to verify they are empty. For example, if you try to delete a GDG base with cataloged generations the delete fails. To delete a GDG base with cataloged generations you must specify FORCE, which deletes the generations first, then the base. PURGE / NOPURGE - specifies whether to delete the file regardless of any expiration date. PURGE always deletes the file; NOPURGE returns a high condition code if the expiration date has not been exceeded. RECOVERY / NORECOVERY - used for deleting a BCS catalog. SCRATCH / NOSCRATCH - specifies whether the file is physically removed from any volume it reside(s) on. This has three distinct uses: • When deleting a cluster or AIX, NOSCRATCH specifies the entry is to be removed from the catalog only, without physically deleting the object from the volume. This would be used if the file does not actually exist, but the catalog records pointing to it do exist; • When deleting non-VSAM files, NOSCRATCH specifies only the catalog entry would be removed, not the VTOC entry; • After a volume recovery, NOSCRATCH would be used to delete catalog entries having no corresponding VVDS entries on that volume. In most cases, a delete should be coded like this: DELETE <filename> If you have a need of deleting one or more AIXes without also deleting the associated base clusters you can code your delete like this example: DELETE <filename> ALTERNATEINDEX Frequently, in a define you would want to delete the file first. This delete is often coded as: DELETE <filename> PURGE CLUSTER The CLUSTER in this case is optional if <filename> is a cluster name.
  • 52. 52 Verifying VSAM objects The VERIFY function synchronizes a VSAM file's allocation data with its catalog entry. The two can become out of sync if a CLOSE of the file fails. VSAM maintains information in the catalog describing each file's end-of-file information, including the High-Used-RBA (HURBA), High-Allocated-RBA (HARBA), end-of-key range (EOKR), and High-Key-RBA fields. During file processing these fields are updated in the in-storage copy of the catalog records. These fields are not written back to the catalog until the file is properly closed. If anything happens to the file before the user issues a CLOSE, the correct information is not written back to the catalog. VSAM catalog management ensures the failure to properly close a file won't go unnoticed. When a file is opened for output or update, VSAM flips a bit in the VVDS to signal the file is being updated. The bit remains on until a CLOSE is issued and completed. If a failure occurs and this bit is left on, when another attempt is made to access the file as output or update the OPEN needs to ensure the file is in satisfactory condition before it proceeds. This insurance is the VERIFY. There are three uses of the term VERIFY, each with small differences in operations logic and whether they actually update the catalog or just the in-storage control blocks. The user should take note of the proper conditions for each VERIFY and use the same VERIFY logic each time for a given file. Verify from an Assembler Program An Assembler program can invoke the VSAM VERIFY macro to ensure the program knows the correct end-of-file location in the file. This VERIFY does not update the open-for-update bit, and does not write this information to the catalog unless the file has been opened for output / update AND the file is being closed. Verify as an IDCAMS Command (Explicit Verify) The second way to VERIFY a file is by an explicit IDCAMS command. The VERIFY command first issues an open for update. 
If the open-for-update bit is on, a VERIFY macro is issued. Finally, a CLOSE is issued. This process verifies the HURBA and, if the file is a KSDS, the High-Key RBA and the number of index levels. If there is a discrepancy between the file and the catalog, VERIFY assumes the file is correct and updates the catalog accordingly. At CLOSE time the open-for-update bit is turned off. If the open-for-update bit was on at the start of the VERIFY, AMS returns a non-zero return code. If the bit was off VERIFY returns a 0, meaning the VERIFY was not needed.
  • 53. 53 The Implicit VERIFY VSAM OPEN processing automatically invokes an implicit VERIFY if, during OPEN processing, the open-for-update bit is found unaccountably turned on. VSAM considers the bit on unaccountably if it finds no ENQs on the file. An ENQ means the file is open for update elsewhere, which would be the case if the bit is on. This works the same as for a call from an Assembler program, i. e. the VERIFY does not update the open-for-update bit and does not write the information to the catalog unless the file has been opened for output / update AND the file is being closed. For practical purposes, if a file is left in a VERIFY-able state, it may remain in that state for a string of jobs, each performing its own implicit VERIFY, until the file is actually opened for update and is then successfully closed. VERIFY Function and Syntax VERIFY can be used to resynchronize clusters, alternate indexes, and catalogs. Paths for AIXes cannot be verified. While it is possible to verify data and index components separately, it results in timestamps being different, thus causing problems at OPEN time. VERIFY cannot work on an LDS. Finally, VERIFY fails on an empty KSDS (i. e. its HURBA is 0). VERIFY has two calls: VERIFY FILE ( ... ) - specifies the DD name of the file to be verified; VERIFY DATASET ( ... ) - specifies the DSN to be verified. No JCL statement is needed for this call. During this type of VERIFY, the file is allocated as DISP=OLD. If SHR is required you must use FILE. DATASET can be abbreviated as DS. During OPEN processing, VSAM checks both the DISP field and the file’s SHAREOPTIONS. The file is allocated only if the DISP allows it. If allocation fails the file is not opened, a message is issued, and the job terminates. Once the DISP is satisfied, SHAREOPTIONS determine if the file can be opened as requested: • SHR(1 3): if the file is open elsewhere as read, no update is allowed. 
If the program is allowed to open the file as update, no other reads are allowed. • SHR(2 3): only one job may open the file for UPDATE, but other opens for READ are allowed. During initial load this SHR is treated as SHR(1 3). • SHR(3 3): allows all opens. As above, during initial load this SHR is treated as SHR(1 3). SHR(3 4) is treated the same as SHR(3 3).
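The cross-region rules above can be modeled as a small decision function. The Python sketch below is only a restatement of the rules as described in this document, not of VSAM's actual OPEN logic; the function name, its parameters, and the treatment of cross-region option 4 as equivalent to option 3 for this purpose are assumptions.

```python
def can_open(cross_region, request, readers=0, updaters=0, initial_load=False):
    """Return True if an OPEN would be allowed under the stated rules.

    cross_region: first SHAREOPTIONS value (1 to 4)
    request:      "read" or "update"
    readers:      other jobs that already have the file open for read
    updaters:     other jobs that already have the file open for update
    initial_load: True while the file is being loaded for the first time
    """
    if cross_region not in (1, 2, 3, 4):
        raise ValueError("cross-region SHAREOPTIONS must be 1 to 4")
    if request not in ("read", "update"):
        raise ValueError("request must be 'read' or 'update'")

    # During initial load every option is treated as option 1.
    option = 1 if initial_load else cross_region

    if option == 1:
        # Many readers or one updater, never both.
        if request == "read":
            return updaters == 0
        return readers == 0 and updaters == 0
    if option == 2:
        # Any number of readers, at most one updater.
        return request == "read" or updaters == 0
    # Options 3 and 4: VSAM allows every open; integrity is up to the program.
    return True


print(can_open(2, "update", readers=5))             # one updater among readers
print(can_open(1, "update", readers=1))             # blocked by an existing reader
print(can_open(3, "update", readers=1, initial_load=True))
```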
  • 54. 54 A Last Word on VERIFY
Many people (and even many VSAM documents) suggest you VERIFY each time an update is to take place. That could be dangerous, as could ignoring a non-zero condition code issued by an implicit VERIFY. Remember, a VERIFY means the file had its open-for-update bit on. Generally that happens because of an ABEND or system crash. It follows that the file may well be "broken" from an application perspective, perhaps missing records or data fields in certain records. It's good practice to check the preceding batch run to determine whether the last job updating the file really updated it correctly.

Printing VSAM Files
The AMS command PRINT creates an output listing of the contents of a VSAM file, catalog, or non-VSAM file. The parameters of PRINT are:
• INFILE ( ... ) / INDATASET ( ... ) - abbreviated IFILE and IDS. Specifies either the DDname or the DSN of the file to be printed. For a KSDS, naming the base cluster prints the records in key order. Specifying the PATH name for an AIX prints the records in alternate key order. If a KSDS data component is specified it is printed as if it were an ESDS (i.e. in physical order); if the index component is specified only index records are printed (again, in physical order).
• OUTFILE ( ... ) - specifies the DDname to send the output to. The DCB is "RECFM=VBA,LRECL=125".
• CHARACTER / DUMP / HEX - specifies the output format. CHAR is the abbreviation for CHARACTER. CHAR tries to print each record in EBCDIC format, with non-printable characters appearing as dots (.). HEX prints the file byte-by-byte, in hexadecimal; no attempt is made to translate the data into readable characters. DUMP prints the output in "dump" format, i.e. a hexadecimal representation on the left and a character representation on the right.
• FROMKEY ( ... ) / FROMADDRESS ( ... ) / FROMNUMBER ( ... ) / SKIP ( ... ) - indicate a starting position in the file. If none of these is specified, processing starts with the first logical record. FROMKEY cannot be used on non-keyed files, while FROMNUMBER is for RRDSes only. SKIP should not be used when printing through a PATH, since the results are unpredictable.
• TOKEY ( ... ) / TOADDRESS ( ... ) / TONUMBER ( ... ) / COUNT ( ... ) - work in conjunction with the "FROM" parameters and specify where to stop printing. As such, the restrictions are the same. Do not use COUNT when printing through a PATH, since the results are unpredictable.
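The PRINT parameters above might be combined as in the following sketch. The data set name and the key values are invented for illustration; the keys assume a character key field. With no OUTFILE coded, the listing goes to SYSPRINT.

```jcl
//PRINT1   EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  *
  /* HLQ.SAMPLE.KSDS and the key values are hypothetical. */
  /* Print a key range in character format.               */
  PRINT INDATASET(HLQ.SAMPLE.KSDS) -
        CHARACTER -
        FROMKEY(A1000) -
        TOKEY(A1999)
  /* Skip 100 records, then print 20 in dump format.      */
  PRINT INDATASET(HLQ.SAMPLE.KSDS) -
        DUMP -
        SKIP(100) -
        COUNT(20)
/*
```

Because the base cluster is named rather than a PATH, SKIP and COUNT are safe to use here.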
  • 55. 55 Backing Up and Recovering VSAM Files
One of the most important considerations for any access method is the approach used to back up the file's records, and to restore those records if the file breaks. There are many programs for performing backup and restore functions. Prior to DFP, IBM-supplied programs such as IEHDASDR or IEHMOVE were available to back up or restore full volumes or individual files. These were slow and had obscure control statement formats, so for many years the program of choice for backup and recovery was FDR. With the advent of DFP, the IBM product DFDSS has replaced FDR in most installations. Additionally, there are AMS commands similar in function to the old IEH* programs. These commands are REPRO, EXPORT, and IMPORT.

REPRO is the primary AMS command for copying data sets. It can copy both VSAM and non-VSAM files. Since it's easier to use than IMPORT and EXPORT, it is the command of choice for backing up VSAM files (if you're not using DFDSS, of course). REPRO's process is easy to understand:
• Open the input file and determine the access method to use;
• Open the output file and determine its access method. In addition, REPRO determines whether the output file already contains records. If it does, the "incoming" records are merged with them.

During the copy the records are deblocked from the input file and reblocked for the output file. REPRO cannot read or write records larger than 32,760 bytes. The following types of copies can be done with REPRO:
• ESDS to ESDS, KSDS, RRDS, non-VSAM sequential;
• KSDS to ESDS, KSDS, RRDS, non-VSAM sequential;
• RRDS to ESDS, KSDS, RRDS, non-VSAM sequential;
• LDS to LDS, non-VSAM sequential;
• non-VSAM sequential to ESDS, KSDS, RRDS, non-VSAM sequential.

In addition, REPRO can copy, merge, or split ICF catalogs. If the output is an empty KSDS, reorganization takes place automatically. That means the output KSDS is initially loaded, with consideration given to free space, and all records are written in ascending key order. The input records must be sorted in key order with no duplicates.
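A KSDS-to-sequential backup with REPRO could be sketched as follows. All data set names, the space allocation, and the DCB values are assumptions made for this example; in particular, LRECL=204 assumes a maximum record length of 200 bytes plus the 4-byte record descriptor word.

```jcl
//BACKUP   EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//* All data set names, space, and DCB values are hypothetical.
//* LRECL assumes a 200-byte maximum record plus a 4-byte RDW.
//INDD     DD  DSN=HLQ.SAMPLE.KSDS,DISP=SHR
//OUTDD    DD  DSN=HLQ.SAMPLE.KSDS.BACKUP,
//             DISP=(NEW,CATLG,DELETE),
//             UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE),
//             DCB=(RECFM=VB,LRECL=204,BLKSIZE=0)
//SYSIN    DD  *
  REPRO INFILE(INDD) OUTFILE(OUTDD)
/*
```

To restore, the roles reverse: the sequential backup becomes the INFILE and a newly defined, empty KSDS becomes the OUTFILE. Since the backup was unloaded in key order, it already satisfies the sorted, no-duplicates requirement for the initial load.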
  • 56. 56 REPRO Function and Syntax
• INFILE ( ... ) / INDATASET ( ... ) - abbreviated IFILE and IDS. The INFILE argument is the JCL DDname of the input file. INDATASET names the DSN and does not require a JCL statement. The input file can be a VSAM file, a non-VSAM file, a PATH over an AIX or cluster, or an ICF catalog. For performance purposes, unless the file is very small INFILE should be used, as INDATASET does not allow you to specify buffers.
• OUTFILE ( ... ) / OUTDATASET ( ... ) - abbreviated OFILE and ODS respectively. The OUTFILE argument is the JCL DDname of the output file. OUTDATASET names the DSN and does not require a JCL statement. The output file can be a VSAM file, a non-VSAM file, a PATH over an AIX or cluster, an ICF catalog (if the input is an ICF catalog), or SYSOUT (OUTFILE only). For performance purposes, unless the file is very small OUTFILE should be used, as OUTDATASET does not allow you to specify buffers.
• ENTRIES ( ... ) / LEVEL ( ... ) - used for merging or copying an ICF catalog.
• FILE ( ... ) - specified when merging ICF catalogs.
• FROMKEY ( ... ) / FROMADDRESS ( ... ) / FROMNUMBER ( ... ) / SKIP ( ... ) - indicate a starting position in the file. If none of these is given, processing starts with the first logical record. FROMKEY cannot be used on non-keyed files, while FROMNUMBER is for RRDSes only. SKIP is the only one of these parameters usable with non-VSAM files. SKIP should not be used when copying through a PATH, since the results are unpredictable, and it cannot be used with an LDS.
• MERGECAT / NOMERGECAT - used only for ICF catalogs.
• REPLACE / NOREPLACE - indicates whether a record in the input is to replace the corresponding record in the output. These parameters apply when the REPRO is a merge, i.e. the target file has records in it. They are ignored if the target file has no records (HURBA is 0). In a merge a KSDS is not reorganized. In a KSDS merge a target record is replaced if the source record has the same key. In an RRDS the source record replaces the contents of the target record slot, whatever they were. If NOREPLACE is specified and a source record has the same key as a target record, a message is issued. If four such messages occur, REPRO terminates. REPLACE cannot be used if the target file is a PATH over an AIX, or is a base KSDS with a UNIQUEKEY AIX defined to it and using UPGRADE.
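A merge of update records into an existing KSDS, using REPLACE and DD statements so buffers can be specified, might be sketched like this. The DD names, data set names, and AMP buffer counts are invented for illustration and are not tuning recommendations.

```jcl
//MERGE    EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//* UPDDD, MASTDD, and both data set names are hypothetical.
//* The AMP buffer counts are illustrative only.
//UPDDD    DD  DSN=HLQ.SAMPLE.UPDATES,DISP=SHR
//MASTDD   DD  DSN=HLQ.SAMPLE.KSDS,DISP=OLD,
//             AMP=('BUFND=10,BUFNI=5')
//SYSIN    DD  *
  REPRO INFILE(UPDDD) -
        OUTFILE(MASTDD) -
        REPLACE
/*
```

With REPLACE, a source record whose key matches a target record overwrites it; coding NOREPLACE instead would produce a message for each matching key and end the REPRO after four of them.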
  • 57. 57 REPRO Function and Syntax (cont'd)
• REUSE / NOREUSE - REUSE specifies the VSAM target file is opened as a reusable file, provided it was defined as REUSE. If the target file was not defined as REUSE, this parameter works only if the target is empty (i.e. HURBA = 0); otherwise the REPRO fails with an error message. If a KSDS file is opened as REUSE it is treated as an initial load. With NOREUSE the file is opened as not reusable, regardless of the way the target file was defined. If the target has records in it, the REPRO acts as a merge, allowing REPLACE / NOREPLACE to take over. NOREUSE is the default.
• TOKEY ( ... ) / TOADDRESS ( ... ) / TONUMBER ( ... ) / COUNT ( ... ) - work in conjunction with the "FROM" parameters and specify where to stop copying. As such, the restrictions are the same. Do not use COUNT when copying through a PATH, since the results are unpredictable. The COUNT parameter is the only one in this group usable with non-VSAM files.

A Final Word in this Document
A companion to this document is a discussion of the AMS LISTCAT command, including a description of its output and how to interpret it. The LISTCAT command can help you tune your VSAM files once they've been used for a while. There is no way to predict accurately how a file will look after several months of use. LISTCAT is the only way of verifying you've done your design work properly. It's also the only way to solve certain problems. Use these documents wisely and your VSAM files will remain happy and healthy.