This white paper discusses dataset naming standards for System Z. It recommends a standard that uniquely identifies the owning application, whether the data is production or test, and the type of data. Such a standard provides clarity and simplifies security, automation, and storage management. The main disadvantage is that transitioning legacy systems to the standard requires changing dataset names wherever they occur.
White Paper, System Z Dataset Naming Standards
1. System Z Dataset Naming Standards
Business data comes in many forms. It is found in program libraries, databases,
extracts from databases, files created from manipulating database data, unloads, backups,
access logs, stored procedures, and so on. Controlling access to business data seems
daunting, if not impossible. If we remember that all business data (except printed reports)
exists as datasets on some form of electronic media, the task becomes manageable, even
relatively easy. The key is a good dataset naming standard.
A proper dataset naming standard has these features.
The HLQ is unique, not only to the application but also to the data’s purpose
within that application.
It describes the owning application and indicates the usage of the data.
It indicates whether the data is production or test.
The second qualifier describes the dataset uniquely.
The third qualifier describes the type of data on the dataset. In other words, is it a
database, an unload, a log, or some other dataset?
There are two exceptions to these rules. The first exception is DB2, which has its own
naming standard; however, DB2’s standard fits easily within the general rules. The
second exception is the existence of “working copies” of databases, i.e. copies of a
database made for problem investigation, utility testing, and other purposes.
Database versus Non-database Business Data
In almost all cases, database datasets already have a form of naming standard.
Whether this is a DD-name based standard (IMS) or a table space standard (DB2), there
are certain rules inherent in the DBMS. Non-database datasets, however, have no such
limits and may carry any name the IT person can think of. Non-database datasets come
in two forms: “control” datasets (procedures, parameters, and programs) and “data”
datasets (everything else). Because batch jobs create “data” datasets specific to the job, it
makes sense to use the creating batch job name as the dataset’s unique descriptor (i.e. the
second qualifier). For “control” datasets, the second qualifier may be anything describing
the library’s purpose as long as it is unique within the high-level qualifier.
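To make this concrete, the fragment below is a minimal REXX sketch of how such a “data”
dataset name could be assembled. Every value in it (the chargeback code, the batch job
name, the report indicator) is invented for illustration; only the layout of the
qualifiers comes from the convention described above.

   /* REXX - illustration only: build a "data" dataset name whose  */
   /* second qualifier is the creating batch job.  All values are  */
   /* made up; only the qualifier layout follows the convention.   */
   env   = 'P'          /* "P" = production, anything else = test  */
   chgbk = 'AA'         /* chargeback code of the owning app       */
   class = 'CLM'        /* application-chosen non-database group   */
   jobnm = 'AACLM10D'   /* hypothetical name of the creating job   */
   dtype = 'CLMRPT'     /* what the dataset contains                */

   dsn = env || chgbk || class'.'jobnm'.'dtype
   say 'Data dataset name:' dsn      /* PAACLM.AACLM10D.CLMRPT     */

In practice a dataset like this would usually be defined as a GDG, as in the PAACLM
examples later in this paper.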
A Note on Other Mainframe Naming Standards
There are many IBM mainframes, running a variety of operating systems; one
example is Linux virtual machines hosted under z/VM. Of course, naming standards and even file
storage differ for each of these operating systems. Because Linux and other
UNIX variants use a file system with both directories and files, it is not critical
to name the owner in the file name. The owner can be determined from the directory
and/or disk where the file is located, thus the file name can be more descriptive of its
contents.
2. An “Ideal” Dataset Naming Standard for z/OS
HLQ:
o Code a “P” for production, anything else for test.
o Code the chargeback code for the owning application. This typically ranges
from two to four characters.
o Code IMS for IMS databases and related datasets, VSM for VSAM databases
and related datasets, and DB2 for DB2 databases. Related datasets include
backups, unloads, and copies. Applications may use any other character set
for non-database datasets.
The second qualifier describes the dataset uniquely: for database datasets it is the DD name; for non-database “data” datasets it is the name of the creating batch job.
The third qualifier is a data type indicator: the access method (e.g. OSAM or KSDS), UNLOAD, or COPYx, where “x”
identifies the use of the copy (e.g. L for library, V for vault).
Working copies of databases carry an additional qualifier showing the date of
the copy, placed ahead of the data type indicator (see the examples below). This qualifier is a “J” followed by the Julian date of the backup.
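These rules are mechanical enough that names can be assembled (or checked) by a few
lines of code. The following is a sketch only, assuming the two-character chargeback
code used in the examples below; the DD name is an invented value, and the Julian-date
qualifier is simply built from the current date.

   /* REXX - sketch only: assemble database dataset names by rule.  */
   /* The chargeback code and DD name are illustrative values.      */
   env    = 'P'                        /* "P" = production          */
   chgbk  = 'AA'                       /* owning application (2-4)  */
   dbms   = 'IMS'                      /* IMS, VSM or DB2           */
   ddname = 'DDCLM01D'                 /* DD name of the database   */

   hlq = env || chgbk || dbms          /* -> PAAIMS                 */
   say 'Database     :' hlq'.'ddname'.OSAM'
   say 'Unload       :' hlq'.'ddname'.UNLOAD'
   say 'Vaulted copy :' hlq'.'ddname'.COPYV'

   /* Working copy: "J" + Julian date, ahead of the type qualifier  */
   jdate = 'J' || left(date('S'),4) || right(date('J'),3)
   say 'Working copy :' hlq'.'ddname'.'jdate'.OSAM'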
Examples
The examples here assume a two-character chargeback code.
o Production Amalgamated Assurance claims database (HDAM):
PAAIMS.DDCLM01D.OSAM OSAM dataset, first partition
PAAIMS.DDCLM01D.J2007153.OSAM working copy of the database
PAAIMS.DDCLM01D.UNLOAD unload dataset
PAAIMS.DDCLM01D.COPYL backup of database, library copy
PAAIMS.DDCLM01D.COPYV backup of database, vaulted tape
o Production Amalgamated Assurance billing code database (HIDAM):
PAAIMS.DDBILCDD.OSAM database DSN, data dataset
PAAIMS.DDBILCDD.KSDS database DSN, index dataset
PAAIMS.DDBILCDD.UNLOAD unload dataset
PAAIMS.DDBILCDD.COPYL backup of database, library copy
PAAIMS.DDBILCDD.COPYV backup of database, vaulted tape
o Test Amalgamated Assurance claims database (HDAM):
TAAIMS.DDCLM01D.OSAM database, OSAM dataset, first partition
TAAIMS.DDCLM01D.UNLOAD unload dataset for CLAIM first partition
TAAIMS.DDCLM01D.COPYL backup of database, library copy
TAAIMS.DDCLM01D.COPYV backup of database, vaulted tape
o Test Amalgamated Assurance futures database (DB2):
TAADB2.DSNDBC.TS00001A.I0001.A001
o Other Amalgamated Assurance production datasets:
PAAPARM.BATCH.PARMLIB
PAAPGM.GENERAL.LOADLIB
PAACLM.<jobname>.CLMRPT.G0012V00
PAACLM.<jobname>.CLMBAD.G0001V00
3. A Note on the Examples
You may note that the “other” production dataset names are self-explanatory, or nearly so.
4. Advantages and Disadvantages
There are enormous advantages to the naming standard spelled out above.
A. Clarity. The dataset name instantly identifies who owns it, what is in the dataset,
and what type of data it is.
B. Chargeback is easy because the chargeback control information is always in the
same place.
C. Security. Whether you have RACF, Top Secret, ACF2, or another security
package, the protection rules are organized around the dataset’s HLQ. Placing the
application’s unique chargeback information in the HLQ leaves no doubt about who
owns the security responsibility for the data. It also drastically reduces the
overhead and the number of RACF profiles needed to protect business data.
D. Automation. Automation tools can construct dataset names using a few simple
rules (a short sketch follows this list). The tools do not need to keep or search for
extra data, which reduces CPU cycles and the storage required to maintain a list of backups.
E. Storage. This standard makes it very easy to code ACS routines, both for directing
datasets to the appropriate pools and for excluding datasets from pools. It reduces CPU
consumption and keeps the routines simple, and dataset allocation is faster because
fewer dataset screening criteria are needed.
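Because everything these advantages depend on is carried in the name itself, a tool can
recover it with nothing more than string parsing. The sketch below assumes the database
form of the HLQ described above (environment flag, chargeback code, three-character
DBMS indicator); the sample name in the comment is taken from the examples in section 2.

   /* REXX - sketch: recover owner, environment and data type from  */
   /* the dataset name alone, with no catalog search or lookup      */
   /* table.  Assumes the database form of the HLQ (environment     */
   /* flag + chargeback code + 3-character DBMS indicator).         */
   parse arg dsn                          /* e.g. PAAIMS.DDCLM01D.UNLOAD */
   parse var dsn hlq '.' unique '.' dtype /* split on the periods        */

   env   = left(hlq,1)                    /* "P" = production            */
   dbms  = right(hlq,3)                   /* IMS, VSM or DB2             */
   chgbk = substr(hlq,2,length(hlq)-4)    /* whatever lies between them  */

   say 'Environment    :' env
   say 'Chargeback code:' chgbk
   say 'Data class     :' dbms
   say 'DD name / job  :' unique
   say 'Data type      :' dtype

Run against PAAIMS.DDCLM01D.UNLOAD, this reports environment P, chargeback code AA,
data class IMS, and data type UNLOAD; a working copy would simply show the Julian-date
qualifier in the final field as well.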
There is really only one disadvantage to this naming standard: it is an ideal. Starting
out with this ideal, or a similar naming standard, is something to aim for. However, while
it is possible to change a legacy system’s dataset names, moving from a very simple
naming standard to this one can be a complicated process.
A. Many job changes. DSNs must change wherever they occur. Using a global find/
change utility simplifies the task, but it is still no small undertaking.
B. In IMS regions, we must change dynamic allocation members. There are
automated tools to do this.
C. Many new GDG bases. However, creating the new GDG bases is easy to automate.
Given a list of chargeback codes, the member list of the DBDLIB, and a list of the
DD names of the VSAM databases, a relatively simple REXX exec could generate the
GDGs, RACF profiles, and new vaulting lists very rapidly (a sketch follows this list).
D. Coordination. It is important to verify, before changing names, what applications
(OS as well as business applications) are affected. Converting one application at
a time lessens this disadvantage.
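The kind of exec item C has in mind might, in skeleton form, look like the fragment
below. It does nothing more than write out IDCAMS DEFINE GDG and RACF ADDSD commands;
the DD-name list, the GDG LIMIT, and the UACC value shown are placeholders, and a real
exec would read its input from the DBDLIB member list and the VSAM DD-name list rather
than hard-coding it.

   /* REXX - skeleton only: emit IDCAMS and RACF commands for one   */
   /* application.  The DD names, LIMIT and UACC values below are   */
   /* placeholders; a real exec would read them from the DBDLIB     */
   /* member list and the VSAM DD-name list.                        */
   env    = 'P'
   chgbk  = 'AA'
   ddlist = 'DDCLM01D DDBILCDD'          /* stand-in for the real list  */
   hlq    = env || chgbk || 'IMS'

   say "ADDSD '"hlq".**' UACC(NONE)"     /* one generic profile per HLQ */

   do i = 1 to words(ddlist)
     dd = word(ddlist,i)
     say 'DEFINE GDG (NAME('hlq'.'dd'.UNLOAD) LIMIT(5) SCRATCH)'
     say 'DEFINE GDG (NAME('hlq'.'dd'.COPYL) LIMIT(5) SCRATCH)'
     say 'DEFINE GDG (NAME('hlq'.'dd'.COPYV) LIMIT(5) SCRATCH)'
   end

The vaulting list could be produced the same way, but its format is installation-specific,
so it is not sketched here.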
While there are technical difficulties, none of them is insurmountable. By far
the hardest task is convincing the application team to make the change. It is easier to do so if
they already have a good standard; in fact, if their standard follows the general
guidelines set out in section 1, there may be no need to change at all.