
 COMPUTING
 PRACTICES


                              A History and Evaluation
                                    of System R
                                  Donald D. Chamberlin                 Thomas G. Price
                                  Morton M. Astrahan                   Franco Putzolu
                                  Michael W. Blasgen                   Patricia Griffiths Selinger
                                  James N. Gray                        Mario Schkolnick
                                  W. Frank King                        Donald R. Slutz
                                  Bruce G. Lindsay                     Irving L. Traiger
                                  Raymond Lorie                        Bradford W. Wade
                                  James W. Mehl                        Robert A. Yost

                                                      IBM Research Laboratory
                                                        San Jose, California

SUMMARY: System R, an experimental database system, was constructed to demonstrate that the usability advantages of the relational data model can be realized in a system with the complete function and high performance required for everyday production use. This paper describes the three principal phases of the System R project and discusses some of the lessons learned from System R about the design of relational systems and database systems in general.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
Key words and phrases: database management systems, relational model, compilation, locking, recovery, access path selection, authorization
CR Categories: 3.50, 3.70, 3.72, 4.33, 4.6
Authors' address: D. D. Chamberlin et al., IBM Research Laboratory, 5600 Cottle Road, San Jose, California 95193.
© 1981 ACM 0001-0782/81/1000-0632 75¢.

1. Introduction
    Throughout the history of information storage in computers, one of the most readily observable trends has been the focus on data independence. C.J. Date [27] defined data independence as "immunity of applications to change in storage structure and access strategy." Modern database systems offer data independence by providing a high-level user interface through which users deal with the information content of their data, rather than the various bits, pointers, arrays, lists, etc. which are used to represent that information. The system assumes responsibility for choosing an appropriate internal representation for the information; indeed, the representation of a given fact may change over time without users being aware of the change.

    The relational data model was proposed by E.F. Codd [22] in 1970 as the next logical step in the trend toward data independence. Codd observed that conventional database systems store information in two ways: (1) by the contents of records stored in the database, and (2) by the ways in which these records are connected together. Different systems use various names for the connections among records, such as links, sets, chains, parents, etc. For example, in Figure 1(a), the fact that supplier Acme supplies bolts is represented by connections between the relevant part and supplier records. In such a system, a user frames a question, such as "What is the lowest price for bolts?", by writing a program which "navigates" through the maze of connections until it arrives at the answer to the question. The user of a "navigational" system has the burden (or opportunity) to specify exactly how the query is to be processed; the user's algorithm is then embodied in a program which is dependent on the data structure that existed at the time the program was written.

    Relational database systems, as proposed by Codd, have two important properties: (1) all information is
632    Communications of the ACM    October 1981    Volume 24    Number 10
represented by data values, never by any sort of "connections" which are visible to the user; (2) the system supports a very high-level language in which users can frame requests for data without specifying algorithms for processing the requests. The relational representation of the data in Figure 1(a) is shown in Figure 1(b). Information about parts is kept in a PARTS relation in which each record has a "key" (unique identifier) called PARTNO. Information about suppliers is kept in a SUPPLIERS relation keyed by SUPPNO. The information which was formerly represented by connections between records is now contained in a third relation, PRICES, in which parts and suppliers are represented by their respective keys. The question "What is the lowest price for bolts?" can be framed in a high-level language like SQL [16] as follows:

    SELECT MIN(PRICE)
    FROM   PRICES
    WHERE  PARTNO IN
           (SELECT PARTNO
            FROM   PARTS
            WHERE  NAME = 'BOLT');

A relational system can maintain whatever pointers, indices, or other access aids it finds appropriate for processing user requests, but the user's request is not framed in terms of these access aids and is therefore not dependent on them. Therefore, the system may change its data representation and access aids periodically to adapt to changing requirements without disturbing users' existing applications.

Fig. 1(a). A "Navigational" Database.

    PARTS                SUPPLIERS            PRICES

    PARTNO  NAME         SUPPNO  NAME         PARTNO  SUPPNO  PRICE
    P107    Bolt         S51     Acme         P107    S51       .59
    P113    Nut          S57     Ajax         P107    S57       .65
    P125    Screw        S63     Amco         P113    S51       .25
    P132    Gear                              P113    S63       .21
                                              P125    S63       .15
                                              P132    S57      5.25
                                              P132    S63     10.00

Fig. 1(b). A Relational Database.

    Since Codd's original paper, the advantages of the relational data model in terms of user productivity and data independence have become widely recognized. However, as in the early days of high-level programming languages, questions are sometimes raised about whether or not an automatic system can choose as efficient an algorithm for processing a complex query as a trained programmer would. System R is an experimental system constructed at the San Jose IBM Research Laboratory to demonstrate that a relational database system can incorporate the high performance and complete function required for everyday production use.

    The key goals established for System R were:

    (1) To provide a high-level, nonnavigational user interface for maximum user productivity and data independence.
    (2) To support different types of database use including programmed transactions, ad hoc queries, and report generation.
    (3) To support a rapidly changing database environment, in which tables, indexes, views, transactions, and other objects could easily be added to and removed from the database without stopping the system.
    (4) To support a population of many concurrent users, with mechanisms to protect the integrity of the database in a concurrent-update environment.
    (5) To provide a means of recovering the contents of the database to a consistent state after a failure of hardware or software.
    (6) To provide a flexible mechanism whereby different views of stored data can be defined and various users can be authorized to query and update these views.
    (7) To support all of the above functions with a level of performance comparable to existing lower-function database systems.

Throughout the System R project, there has been a strong commitment to carry the system through to an operationally complete prototype

which could be installed and evaluated in actual user sites.

    The history of System R can be divided into three phases. "Phase Zero" of the project, which occurred during 1974 and most of 1975, involved the development of the SQL user interface [14] and a quick implementation of a subset of SQL for one user at a time. The Phase Zero prototype, described in [2], provided valuable insight in several areas, but its code was eventually abandoned. "Phase One" of the project, which took place throughout most of 1976 and 1977, involved the design and construction of the full-function, multiuser version of System R. An initial system architecture was presented in [4] and subsequent updates to the design were described in [10]. "Phase Two" was the evaluation of System R in actual use. This occurred during 1978 and 1979 and involved experiments at the San Jose Research Laboratory and several other user sites. The results of some of these experiments and user experiences are described in [19-21]. At each user site, System R was installed for experimental purposes only, and not as a supported commercial product.¹

    ¹ The System R research prototype later evolved into SQL/Data System, a relational database management product offered by IBM in the DOS/VSE operating system environment.

    This paper will describe the decisions which were made and the lessons learned during each of the three phases of the System R project.

2. Phase Zero: An Initial Prototype
    Phase Zero of the System R project involved the quick implementation of a subset of system functions. From the beginning, it was our intention to learn what we could from this initial prototype, and then scrap the Phase Zero code before construction of the more complete version of System R. We decided to use the relational access method called XRM, which had been developed by R. Lorie at IBM's Cambridge Scientific Center [40]. (XRM was influenced, to some extent, by the "Gamma Zero" interface defined by E.F. Codd and others at San Jose [11].) Since XRM is a single-user access method without locking or recovery capabilities, issues relating to concurrency and recovery were excluded from consideration in Phase Zero.

    An interpreter program was written in PL/I to execute statements in the high-level SQL (formerly SEQUEL) language [14, 16] on top of XRM. The implemented subset of the SQL language included queries and updates of the database, as well as the dynamic creation of new database relations. The Phase Zero implementation supported the "subquery" construct of SQL, but not its "join" construct. In effect, this meant that a query could search through several relations in computing its result, but the final result would be taken from a single relation.

    The Phase Zero implementation was primarily intended for use as a standalone query interface by end users at interactive terminals. At the time, little emphasis was placed on issues of interfacing to host-language programs (although Phase Zero could be called from a PL/I program). However, considerable thought was given to the human factors aspects of the SQL language, and an experimental study was conducted on the learnability and usability of SQL [44].

    One of the basic design decisions in the Phase Zero prototype was that the system catalog, i.e., the description of the content and structure of the database, should be stored as a set of regular relations in the database itself. This approach permits the system to keep the catalog up to date automatically as changes are made to the database, and also makes the catalog information available to the system optimizer for use in access path selection.

    The structure of the Phase Zero interpreter was strongly influenced by the facilities of XRM. XRM stores relations in the form of "tuples," each of which has a unique 32-bit "tuple identifier" (TID). Since a TID contains a page number, it is possible, given a TID, to fetch the associated tuple in one page reference. However, rather than actual data values, the tuple contains pointers to the "domains" where the actual data is stored, as shown in Figure 2. Optionally, each domain may have an "inversion," which associates domain values (e.g., "Programmer") with the TIDs of tuples in which the values appear. Using the inversions, XRM makes it easy to find a list of TIDs of tuples which contain a given value. For example, in Figure 2, if inversions exist on both the JOB and LOCATION domains, XRM provides commands to create a list of TIDs of employees who are programmers, and another list of TIDs of employees who work in Evanston. If the SQL query calls for programmers who work in Evanston, these TID lists can be intersected to obtain the list of TIDs of tuples which satisfy the query, before any tuples are actually fetched.

    The most challenging task in constructing the Phase Zero prototype was the design of optimizer algorithms for efficient execution of SQL statements on top of XRM. The design of the Phase Zero optimizer is given in [2]. The objective of the optimizer was to minimize the number of tuples fetched from the database in processing a query. Therefore, the optimizer made extensive use of inversions and often manipulated TID lists before beginning to fetch tuples. Since the TID lists were potentially large, they were stored as temporary objects in the database during query processing.

    The results of the Phase Zero implementation were mixed. One strongly felt conclusion was that it is a very good idea, in a project the size of System R, to plan to throw away the initial implementation. On the positive side, Phase Zero demonstrated the usability of the SQL language, the feasibility of creating new tables and inversions "on the fly"

and relying on an automatic optimizer for access path selection, and the convenience of storing the system catalog in the database itself. At the same time, Phase Zero taught us a number of valuable lessons which greatly influenced the design of our later implementation. Some of these lessons are summarized below.

    (1) The optimizer should take into account not just the cost of fetching tuples, but the costs of creating and manipulating TID lists, then fetching tuples, then fetching the data pointed to by the tuples. When these "hidden costs" are taken into account, it will be seen that the manipulation of TID lists is quite expensive, especially if the TID lists are managed in the database rather than in main storage.
    (2) Rather than "number of tuples fetched," a better measure of cost would have been "number of I/Os." This improved cost measure would have revealed the great importance of clustering together related tuples on physical pages so that several related tuples could be fetched by a single I/O. Also, an I/O measure would have revealed a serious drawback of XRM: Storing the domains separately from the tuples causes many extra I/Os to be done in retrieving data values. Because of this, our later implementation stored data values in the actual tuples rather than in separate domains. (In defense of XRM, it should be noted that the separation of data values from tuples has some advantages if data values are relatively large and if many tuples are processed internally compared to the number of tuples which are materialized for output.)
    (3) Because the Phase Zero implementation was observed to be CPU-bound during the processing of a typical query, it was decided the optimizer cost measure should be a weighted sum of CPU time and I/O count, with weights adjustable according to the system configuration.
    (4) Observation of some of the applications of Phase Zero convinced us of the importance of the "join" formulation of SQL. In our subsequent implementation, both "joins" and "subqueries" were supported.
    (5) The Phase Zero optimizer was quite complex and was oriented toward complex queries. In our later implementation, greater emphasis was placed on relatively simple interactions, and care was taken to minimize the "path length" for simple SQL statements.

[Figure 2: a tuple (TID1) holds pointers into separate domains: Domain #1 (Names) containing "John Smith", Domain #2 (Jobs) containing "Programmer", and Domain #3 (Locations) containing "Evanston".]
Fig. 2. XRM Storage Structure.

3. Phase One: Construction of a Multiuser Prototype
    After the completion and evaluation of the Phase Zero prototype, work began on the construction of the full-function, multiuser version of System R. Like Phase Zero, System R consisted of an access method (called RSS, the Research Storage System) and an optimizing SQL processor (called RDS, the Relational Data System) which runs on top of the RSS. Separation of the RSS and RDS provided a beneficial degree of modularity; e.g., all locking and logging functions were isolated in the RSS, while all authorization and access path selection functions were isolated in the RDS. Construction of the RSS was underway in 1975 and construction of the RDS began in 1976. Unlike XRM, the RSS was originally designed to support multiple concurrent users.

    The multiuser prototype of System R contained several important subsystems which were not present in the earlier Phase Zero prototype. In order to prevent conflicts which might arise when two concurrent users attempt to update the same data value, a locking subsystem was provided. The locking subsystem ensures that each data value is accessed by only one user at a time, that all the updates made by a given transaction become effective simultaneously, and that deadlocks between users are detected and resolved. The security of the system was enhanced by view and authorization subsystems. The view subsystem permits users to define alternative views of the database (e.g., a view of the employee file in which salaries are deleted or aggregated by department).

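The two kinds of view mentioned in that example, one that omits salaries and one that aggregates them by department, can be sketched as follows. This is only an illustration, not System R code: SQLite (through Python's standard sqlite3 module) stands in for the database system, and the EMP table, its columns, and the view names EMP_PUBLIC and DEPT_SALARY are all hypothetical.

```python
import sqlite3


def build_views():
    """Create a small in-memory database with two views over a hypothetical EMP table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, NAME TEXT, DEPT TEXT, SALARY REAL);
        INSERT INTO EMP VALUES (1, 'Smith', 'D1', 100.0);
        INSERT INTO EMP VALUES (2, 'Jones', 'D1', 200.0);
        INSERT INTO EMP VALUES (3, 'Lee',   'D2', 300.0);

        -- View in which the SALARY column is deleted.
        CREATE VIEW EMP_PUBLIC AS
            SELECT EMPNO, NAME, DEPT FROM EMP;

        -- View in which salaries are aggregated by department.
        CREATE VIEW DEPT_SALARY AS
            SELECT DEPT, AVG(SALARY) AS AVG_SALARY FROM EMP GROUP BY DEPT;
        """
    )
    return conn


def view_columns(conn, view_name):
    """Return the column names exposed by a view (view_name must be a trusted identifier)."""
    cursor = conn.execute(f"SELECT * FROM {view_name}")
    return [column[0] for column in cursor.description]


if __name__ == "__main__":
    conn = build_views()
    print(view_columns(conn, "EMP_PUBLIC"))
    print(conn.execute("SELECT * FROM DEPT_SALARY ORDER BY DEPT").fetchall())
```

A user who is given access only to EMP_PUBLIC or DEPT_SALARY never sees individual salaries, which is the effect the view subsystem is described as providing. SQLite has no authorization subsystem, so the granting of access to views is not shown here.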
The authorization subsystem ensures that each user has access only to those views for which he has been specifically authorized by their creators. Finally, a recovery subsystem was provided which allows the database to be restored to a consistent state in the event of a hardware or software failure.

    In order to provide a useful host-language capability, it was decided that System R should support both PL/I and Cobol application programs as well as a standalone query interface, and that the system should run under either the VM/CMS or MVS/TSO operating system environment. A key goal of the SQL language was to present the same capabilities, and a consistent syntax, to users of the PL/I and Cobol host languages and to ad hoc query users. The imbedding of SQL into PL/I is described in [16]. Installation of a multiuser database system under VM/CMS required certain modifications to the operating system in support of communicating virtual machines and writable shared virtual memory. These modifications are described in [32].

    The standalone query interface of System R (called UFI, the User-Friendly Interface) is supported by a dialog manager program, written in PL/I, which runs on top of System R like any other application program. Therefore, the UFI support program is a cleanly separated component and can be modified independently of the rest of the system. In fact, several users improved on our UFI by writing interactive dialog managers of their own.

      The Compilation Approach
    Perhaps the most important decision in the design of the RDS was inspired by R. Lorie's observation, in early 1976, that it is possible to compile very high-level SQL statements into compact, efficient routines in System/370 machine language [42]. Lorie was able to demonstrate that SQL statements of arbitrary complexity could be decomposed into a relatively small collection of machine-language "fragments," and that an optimizing compiler could assemble these code fragments from a library to form a specially tailored routine for processing a given SQL statement. This technique had a very dramatic effect on our ability to support application programs for transaction processing. In System R, a PL/I or Cobol program is run through a preprocessor in which its SQL statements are examined, optimized, and compiled into small, efficient machine-language routines which are packaged into an "access module" for the application program. Then, when the program goes into execution, the access module is invoked to perform all interactions with the database by means of calls to the RSS. The process of creating and invoking an access module is illustrated in Figures 3 and 4. All the overhead of parsing, validity checking, and access path selection is removed from the path of the executing program and placed in a separate preprocessor step which need not be repeated. Perhaps even more important is the fact that the running program interacts only with its small, special-purpose access module rather than with a much larger and less efficient general-purpose SQL interpreter. Thus, the power and ease of use of the high-level SQL language are combined with the execution-time efficiency of the much lower level RSS interface.

    Since all access path selection decisions are made during the preprocessor step in System R, there is the possibility that subsequent changes in the database may invalidate the decisions which are embodied in an access module. For example, an index selected by the optimizer may later be dropped from the database. Therefore, System R records with each access module a list of its "dependencies" on database objects such as tables and indexes. The dependency list is stored in the form of a regular relation in the system catalog. When the structure of the database changes (e.g., an index is dropped), all affected access modules are marked "invalid." The next time an invalid access module is invoked, it is regenerated from its original SQL statements, with newly optimized access paths. This process is completely transparent to the System R user.

    SQL statements submitted to the interactive UFI dialog manager are processed by the same optimizing compiler as preprocessed SQL statements. The UFI program passes the ad hoc SQL statement to System R with a special "EXECUTE" call. In response to the EXECUTE call, System R parses and optimizes the SQL statement and translates it into a machine-language routine. The routine is indistinguishable from an access module and is executed immediately. This process is described in more detail in [20].

      RSS Access Paths
    Rather than storing data values in separate "domains" in the manner of XRM, the RSS chose to store data values in the individual records of the database. This resulted in records becoming variable in length and longer, on the average, than the equivalent XRM records. Also, commonly used values are represented many times rather than only once as in XRM. It was felt, however, that these disadvantages were more than offset by the following advantage: All the data values of a record could be fetched by a single I/O.

    In place of XRM "inversions," the RSS provides "indexes," which are associative access aids implemented in the form of B-Trees [26]. Each table in the database may have anywhere from zero indexes up to an index on each column (it is also possible to create an index on a combination of columns). Indexes make it possible to scan the table in order by the indexed values, or to directly access the records which match a particular value. Indexes are maintained automatically by the RSS in the event of updates to the database.

    The RSS also implements "links," which are pointers stored

with a record which connect it to other related records. The
connection of records on links is not performed automatically by the
RSS, but must be done by a higher level system.

[Fig. 3. Precompilation Step. A PL/I source program containing the
statement SELECT NAME INTO $X FROM EMP WHERE EMPNO=$Y is passed to the
System R precompiler (XPREP), which produces a modified PL/I program
containing a CALL, and an access module (machine code ready to run on
RSS).]

[Fig. 4. Execution Step. The user's object program calls the
execution-time system (XRDI), which loads and then calls the access
module; the access module calls the RSS.]

    The access paths made available by the RSS include (1) index
scans, which access a table associatively and scan it in value order
using an index; (2) relation scans, which scan over a table as it is
laid out in physical storage; (3) link scans, which traverse from one
record to another using links. On any of these types of scan, "search
arguments" may be specified which limit the records returned to those
satisfying a certain predicate. Also, the RSS provides a built-in
sorting mechanism which can take records from any of the scan methods
and sort them into some value order, storing the result in a temporary
list in the database. In System R, the RDS makes extensive use of
index and relation scans and sorting. The RDS also utilizes links for
internal purposes but not as an access path to user data.

    The Optimizer
    Building on our Phase Zero experience, we designed the System R
optimizer to minimize the weighted sum of the predicted number of I/Os
and RSS calls in processing an SQL statement (the relative weights of
these two terms are adjustable according to system configuration).
Rather than manipulating TID lists, the optimizer chooses to scan each
table in the SQL query by means of only one index (or, if no suitable
index exists, by means of a relation scan). For example, if the query
calls for programmers who work in Evanston, the optimizer might choose
to use the job index to find programmers and then examine their
locations; it might use the location index to find Evanston employees
and examine their jobs; or it might simply scan the relation and
examine the job and location of all employees. The choice would be
based on the optimizer's estimate of both the clustering and
selectivity properties of each index, based on statistics stored in
the system catalog. An index is considered highly selective if it has
a large ratio of distinct key values to total entries. An index is
considered to have the clustering property if the key order of the
index corresponds closely to the ordering of records in physical
storage. The clustering property is important because when a record is
fetched via a clustering index, it is likely that other records with
the same key will be found on the same page, thus minimizing the
number of page fetches. Because of the importance of clustering,
mechanisms were provided for loading data in value order and
preserving the value ordering when new records are inserted into the
database.
    The techniques of the System R optimizer for performing joins of
two or more tables have their origin in a study conducted by M.
Blasgen and

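The optimizer's choice among access paths, as described on this page, can be illustrated with a minimal sketch. Only the idea of a weighted sum of predicted I/Os and RSS calls comes from the text; the class AccessPath, the functions weighted_cost and cheapest, the default weights, and every number below are assumptions made up for the programmers-in-Evanston example.

```python
from dataclasses import dataclass


@dataclass
class AccessPath:
    name: str
    predicted_ios: float        # predicted number of page fetches
    predicted_rss_calls: float  # predicted number of calls to the RSS


def weighted_cost(path, io_weight=1.0, rss_call_weight=0.1):
    """Weighted sum of predicted I/Os and RSS calls (weights are adjustable)."""
    return io_weight * path.predicted_ios + rss_call_weight * path.predicted_rss_calls


def cheapest(paths, **weights):
    """Pick the single access path with the lowest weighted cost."""
    return min(paths, key=lambda p: weighted_cost(p, **weights))


# Made-up estimates: the job index is assumed unclustered (about one page
# fetch per programmer), the location index is assumed clustered (few page
# fetches), and the relation scan reads every page and every record.
paths = [
    AccessPath("job index", predicted_ios=1000, predicted_rss_calls=1000),
    AccessPath("location index", predicted_ios=60, predicted_rss_calls=2000),
    AccessPath("relation scan", predicted_ios=500, predicted_rss_calls=10000),
]

io_bound_choice = cheapest(paths)
cpu_bound_choice = cheapest(paths, io_weight=1.0, rss_call_weight=1.0)
```

With the I/O-heavy default weights the clustered location index wins; when RSS calls are weighted as heavily as I/Os, the same estimates favor the job index, which is the kind of shift the adjustable weights are meant to allow.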
K. Eswaran [7]. Using APL models, Blasgen and Eswaran studied ten
methods of joining together tables, based on the use of indexes,
sorting, physical pointers, and TID lists. The number of disk accesses
required to perform a join was predicted on the basis of various
assumptions for the ten join methods. Two join methods were identified
such that one or the other was optimal or nearly optimal under most
circumstances. The two methods are as follows:

    Join Method 1: Scan over the qualifying rows of table A. For each
row, fetch the matching rows of table B (usually, but not always, an
index on table B is used).
    Join Method 2: (Often used when no suitable index exists.) Sort
the qualifying rows of tables A and B in order by their respective
join fields. Then scan over the sorted lists and merge them by
matching values.

    When selecting an access path for a join of several tables, the
System R optimizer considers the problem to be a sequence of binary
joins. It then performs a tree search in which each level of the tree
consists of one of the binary joins. The choices to be made at each
level of the tree include which join method to use and which index, if
any, to select for scanning. Comparisons are applied at each level of
the tree to prune away paths which achieve the same results as other,
less costly paths. When all paths have been examined, the optimizer
selects the one of minimum predicted cost. The System R optimizer
algorithms are described more fully in [47].

    Views and Authorization
    The major objectives of the view and authorization subsystems of
System R were power and flexibility. We wanted to allow any SQL query
to be used as the definition of a view. This was accomplished by
storing each view definition in the form of an SQL parse tree. When an
SQL operation is to be executed against a view, the parse tree which
defines the operation is merged with the parse tree which defines the
view, producing a composite parse tree which is then sent to the
optimizer for access path selection. This approach is similar to the
"query modification" technique proposed by Stonebraker [48]. The
algorithms developed for merging parse trees were sufficiently general
so that nearly any SQL statement could be executed against any view
definition, with the restriction that a view can be updated only if it
is derived from a single table in the database. The reason for this
restriction is that some updates to views which are derived from more
than one table are not meaningful (an example of such an update is
given in [24]).
    The authorization subsystem of System R is based on privileges
which are controlled by the SQL statements GRANT and REVOKE. Each user
of System R may optionally be given a privilege called RESOURCE which
enables him/her to create new tables in the database. When a user
creates a table, he/she receives all privileges to access, update, and
destroy that table. The creator of a table can then grant these
privileges to other individual users, and subsequently can revoke
these grants if desired. Each granted privilege may optionally carry
with it the "GRANT option," which enables a recipient to grant the
privilege to yet other users. A REVOKE destroys the whole chain of
granted privileges derived from the original grant. The authorization
subsystem is described in detail in [37] and discussed further in
[31].

    The Recovery Subsystem
    The key objective of the recovery subsystem is provision of a
means whereby the database may be recovered to a consistent state in
the event of a failure. A consistent state is defined as one in which
the database does not reflect any updates made by transactions which
did not complete successfully. There are three basic types of failure:
the disk media may fail, the system may fail, or an individual
transaction may fail. Although both the scope of the failure and the
time to effect recovery may be different, all three types of recovery
require that an alternate copy of data be available when the primary
copy is not.
    When a media failure occurs, database information on disk is lost.
When this happens, an image dump of the database plus a log of
"before" and "after" changes provide the alternate copy which makes
recovery possible. System R's use of "dual logs" even permits recovery
from media failures on the log itself. To recover from a media
failure, the database is restored using the latest image dump and the
recovery process reapplies all database changes as specified on the
log for completed transactions.
    When a system failure occurs, the information in main memory is
lost. Thus, enough information must always be on disk to make recovery
possible. For recovery from system failures, System R uses the change
log mentioned above plus something called "shadow pages." As each page
in the database is updated, the page is written out in a new place on
disk, and the original page is retained. A directory of the "old" and
"new" locations of each page is maintained. Periodically during normal
operation, a "checkpoint" occurs in which all updates are forced out
to disk, the "old" pages are discarded, and the "new" pages become
"old." In the event of a system crash, the "new" pages on disk may be
in an inconsistent state because some updated pages may still be in
the system buffers and not yet reflected on disk. To bring the
database back to a consistent state, the system reverts to the "old"
pages, and then uses the log to redo all committed transactions and to
undo all updates made by incomplete transactions. This aspect of the
System R recovery subsystem is described in more detail in [36].
    When a transaction failure occurs, all database changes which have
been made by the failing transaction must be undone. To accom-

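The two join methods named on this page can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the RSS implementation: rows are Python dictionaries, a dictionary built on table B stands in for an index on B, and the function names join_method_1 and join_method_2 and the sample EMP/DEPT data are hypothetical.

```python
from collections import defaultdict


def join_method_1(rows_a, rows_b, key_a, key_b):
    """Scan A; for each row, fetch matching rows of B through an 'index' on B."""
    index_b = defaultdict(list)  # stand-in for an index on table B
    for b in rows_b:
        index_b[b[key_b]].append(b)
    return [(a, b) for a in rows_a for b in index_b.get(a[key_a], [])]


def join_method_2(rows_a, rows_b, key_a, key_b):
    """Sort both inputs on their join fields, then merge by matching values."""
    sorted_a = sorted(rows_a, key=lambda r: r[key_a])
    sorted_b = sorted(rows_b, key=lambda r: r[key_b])
    result, i, j = [], 0, 0
    while i < len(sorted_a) and j < len(sorted_b):
        ka, kb = sorted_a[i][key_a], sorted_b[j][key_b]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the run of B rows sharing this key, then pair it with
            # every A row that has the same key.
            j_end = j
            while j_end < len(sorted_b) and sorted_b[j_end][key_b] == ka:
                j_end += 1
            while i < len(sorted_a) and sorted_a[i][key_a] == ka:
                result.extend((sorted_a[i], b) for b in sorted_b[j:j_end])
                i += 1
            j = j_end
    return result


emps = [
    {"name": "Ann", "dno": 20},
    {"name": "Bob", "dno": 10},
    {"name": "Cy", "dno": 20},
    {"name": "Di", "dno": 40},
]
depts = [
    {"dno": 10, "dname": "Sales"},
    {"dno": 20, "dname": "Research"},
    {"dno": 30, "dname": "Ops"},
]

pairs_1 = sorted((a["name"], b["dname"]) for a, b in join_method_1(emps, depts, "dno", "dno"))
pairs_2 = sorted((a["name"], b["dname"]) for a, b in join_method_2(emps, depts, "dno", "dno"))
```

Both methods return the same pairs; they differ in cost. Method 1 pays one lookup per qualifying row of A, while Method 2 pays for two sorts and then a single pass over each sorted list.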
plish this, System R simply processes the change log backwards
removing all changes made by the transaction. Unlike media and system
recovery which both require that System R be reinitialized,
transaction recovery takes place on-line.

    The Locking Subsystem
    A great deal of thought was given to the design of a locking
subsystem which would prevent interference among concurrent users of
System R. The original design involved the concept of "predicate
locks," in which the lockable unit was a database property such as
"employees whose location is Evanston." Note that, in this scheme, a
lock might be held on the predicate LOC = 'EVANSTON', even if no
employees currently satisfy that predicate. By comparing the
predicates being processed by different users, the locking subsystem
could prevent interference. The "predicate lock" design was ultimately
abandoned because: (1) determining whether two predicates are mutually
satisfiable is difficult and time-consuming; (2) two predicates may
appear to conflict when, in fact, the semantics of the data prevent
any conflict, as in "PRODUCT = AIRCRAFT" and "MANUFACTURER = ACME
STATIONERY CO."; and (3) we desired to contain the locking subsystem
entirely within the RSS, and therefore to make it independent of any
understanding of the predicates being processed by various users. The
original predicate locking scheme is described in [29].
    The locking scheme eventually chosen for System R is described in
[34]. This scheme involves a hierarchy of locks, with several
different sizes of lockable units, ranging from individual records to
several tables. The locking subsystem is transparent to end users, but
acquires locks on physical objects in the database as they are
processed by each user. When a user accumulates many small locks, they
may be "traded" for a larger lockable unit (e.g., locks on many
records in a table might be traded for a lock on the table). When
locks are acquired on small objects, "intention" locks are
simultaneously acquired on the larger objects which contain them. For
example, user A and user B may both be updating employee records. Each
user holds an "intention" lock on the employee table, and "exclusive"
locks on the particular records being updated. If user A attempts to
trade her individual record locks for an "exclusive" lock at the table
level, she must wait until user B ends his transaction and releases
his "intention" lock on the table.

4. Phase Two: Evaluation
    The evaluation phase of the System R project lasted approximately
2 1/2 years and consisted of two parts: (1) experiments performed on
the system at the San Jose Research Laboratory, and (2) actual use of
the system at a number of internal IBM sites and at three selected
customer sites. At all user sites, System R was installed on an
experimental basis for study purposes only, and not as a supported
commercial product. The first installations of System R took place in
June 1977.

    General User Comments
    In general, user response to System R has been enthusiastic. The
system was mostly used in applications for which ease of installation,
a high-level user language, and an ability to rapidly reconfigure the
database were important requirements. Several user sites reported that
they were able to install the system, design and load a database, and
put into use some application programs within a matter of days. User
sites also reported that it was possible to tune the system
performance after data was loaded by creating and dropping indexes
without impacting end users or application programs. Even changes in
the database tables could be made transparent to users if the tables
were read-only, and also in some cases for updated tables.
    Users found the performance characteristics and resource
consumption of System R to be generally satisfactory for their
experimental applications, although no specific performance
comparisons were drawn. In general, the experimental databases used
with System R were smaller than one 3330 disk pack (200 Megabytes) and
were typically accessed by fewer than ten concurrent users. As might
be expected, interactive response slowed down during the execution of
very complex SQL statements involving joins of several tables. This
performance degradation must be traded off against the advantages of
normalization [23, 30], in which large database tables are broken into
smaller parts to avoid redundancy, and then joined back together by
the view mechanism or user applications.

    The SQL Language
    The SQL user interface of System R was generally felt to be
successful in achieving its goals of simplicity, power, and data
independence. The language was simple enough in its basic structure so
that users without prior experience were able to learn a usable subset
on their first sitting. At the same time, when taken as a whole, the
language provided the query power of the first-order predicate
calculus combined with operators for grouping, arithmetic, and
built-in functions such as SUM and AVERAGE.
    Users consistently praised the uniformity of the SQL syntax across
the environments of application programs, ad hoc query, and data
definition (i.e., definition of views). Users who were formerly
required to learn inconsistent languages for these purposes found it
easier to deal with the single syntax (e.g., when debugging an
application program by querying the database to observe its effects).
The single syntax also enhanced communication among different
functional organizations (e.g., between database administrators and
application programmers).
    While developing applications using SQL, our experimental users
made a number of suggestions for extensions and improvements to the
language, most of which were implemented during the course of the
proj-

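The user A and user B scenario from the locking discussion on this page can be traced with a minimal sketch. It models only the two lock modes named in the text ("intention" and "exclusive") and assumes a simple rule: an exclusive request conflicts with any lock held by another user on the same object. The LockManager class and its method names are hypothetical, and a refused request stands in for "must wait".

```python
class LockManager:
    """Toy two-level lock hierarchy: tables contain records."""

    def __init__(self):
        self.held = {}  # lockable object -> {user: mode}

    def _compatible(self, obj, user, mode):
        # Conflict if either side of the comparison is an exclusive lock.
        for other, other_mode in self.held.get(obj, {}).items():
            if other != user and "exclusive" in (mode, other_mode):
                return False
        return True

    def lock_record(self, user, table, record_id):
        """Take an intention lock on the table and an exclusive lock on the record."""
        record = (table, record_id)
        if not (self._compatible(table, user, "intention")
                and self._compatible(record, user, "exclusive")):
            return False  # caller would have to wait
        self.held.setdefault(table, {}).setdefault(user, "intention")
        self.held.setdefault(record, {})[user] = "exclusive"
        return True

    def trade_for_table_lock(self, user, table):
        """Trade the user's record locks for one exclusive lock on the table."""
        if not self._compatible(table, user, "exclusive"):
            return False  # another user's intention lock blocks the trade
        for obj in [o for o in self.held if isinstance(o, tuple) and o[0] == table]:
            self.held[obj].pop(user, None)
        self.held.setdefault(table, {})[user] = "exclusive"
        return True

    def release_all(self, user):
        """End of transaction: drop every lock the user holds."""
        for holders in self.held.values():
            holders.pop(user, None)


locks = LockManager()
a_first = locks.lock_record("A", "EMP", 1)
b_first = locks.lock_record("B", "EMP", 2)
a_trade_blocked = locks.trade_for_table_lock("A", "EMP")   # B still holds intention
locks.release_all("B")
a_trade_granted = locks.trade_for_table_lock("A", "EMP")
b_after_trade = locks.lock_record("B", "EMP", 3)            # blocked by A's table lock
```

The point of the intention lock shows up in trade_for_table_lock: it inspects only the table's own lock entry and never has to search the individual record locks of other users.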
ect. Some of these suggestions are summarized below:
    (1) Users requested an easy-to-use syntax when testing for the
existence or nonexistence of a data item, such as an employee record
whose department number matches a given department record. This
facility was implemented in the form of a special "EXISTS" predicate.
    (2) Users requested a means of searching for character strings
whose contents are only partially known, such as "all license plates
beginning with NVK." This facility was implemented in the form of a
special "LIKE" predicate which searches for "patterns" that are
allowed to contain "don't care" characters.
    (3) A requirement arose for an application program to compute an
SQL statement dynamically, submit the statement to the System R
optimizer for access path selection, and then execute the statement
repeatedly for different data values without reinvoking the optimizer.
This facility was implemented in the form of PREPARE and EXECUTE
statements which were made available in the host-language version of
SQL.
    (4) In some user applications the need arose for an operator which
Codd has called an "outer join" [25]. Suppose that two tables (e.g.,
SUPPLIERS and PROJECTS) are related by a common data field (e.g.,
PARTNO). In a conventional join of these tables, supplier records
which have no matching project record (and vice versa) would not
appear. In an "outer join" of these tables, supplier records with no
matching project record would appear together with a "synthetic"
project record containing only null values (and similarly for projects
with no matching supplier). An "outer-join" facility for SQL is
currently under study.
    A more complete discussion of user experience with SQL and the
resulting language improvements is presented in [19].

    The Compilation Approach
    The approach of compiling SQL statements into machine code was one
of the most successful parts of the System R project. We were able to
generate a machine-language routine to execute any SQL statement of
arbitrary complexity by selecting code fragments from a library of
approximately 100 fragments. The result was a beneficial effect on
transaction programs, ad hoc query, and system simplicity.

    Example 1:

        SELECT SUPPNO, PRICE
        FROM   QUOTES
        WHERE  PARTNO = '010002'
        AND    MINQ <= 1000 AND MAXQ >= 1000;

        Operation                         CPU time          Number
                                          (msec on 168)     of I/Os
        Parsing                               13.3             0
        Access Path Selection                 40.0             9
        Code Generation                       10.1             0
        Fetch answer set (per record)          1.5             0.7

    Example 2:

        SELECT ORDERNO,ORDERS.PARTNO,DESCRIP,DATE,QTY
        FROM   ORDERS,PARTS
        WHERE  ORDERS.PARTNO = PARTS.PARTNO
        AND    DATE BETWEEN '750000' AND '751231'
        AND    SUPPNO = '797';

        Operation                         CPU time          Number
                                          (msec on 168)     of I/Os
        Parsing                               20.7             0
        Access Path Selection                 73.2             9
        Code Generation                       19.3             0
        Fetch answer set (per record)          8.7            10.7

Fig. 5. Measurements of Cost of Compilation.

    In an environment of short, repetitive transactions, the benefits
of compilation are obvious. All the overhead of parsing, validity
checking, and access path selection is removed from the path of the
running transaction, and the application program interacts with a
small, specially tailored access module rather than with a larger and
less efficient general-purpose interpreter program. Experiments [38]
showed that for a typical short transaction, about 80 percent of the
instructions were executed by the RSS, with the remaining 20 percent
executed by the access module and application pro-

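The Fig. 5 measurements allow a rough break-even estimate for the code generation step. The sketch below assumes, for illustration only, that an interpreter would need twice the compiled module's CPU time per record; the function name break_even_records and its parameters are hypothetical, and only the millisecond figures are taken from Fig. 5.

```python
import math


def break_even_records(code_generation_msec, compiled_fetch_msec, interpreter_slowdown):
    """Records that must be fetched before code generation has paid for itself.

    interpreter_slowdown is the assumed ratio of interpreted to compiled
    CPU time per record (2.0 means the interpreter takes twice as long).
    """
    interpreted_fetch_msec = compiled_fetch_msec * interpreter_slowdown
    saving_per_record = interpreted_fetch_msec - compiled_fetch_msec
    if saving_per_record <= 0:
        raise ValueError("compiled code must be faster than the interpreter")
    return math.ceil(code_generation_msec / saving_per_record)


# Fig. 5 figures: code generation cost and per-record fetch cost (msec).
example_1 = break_even_records(10.1, 1.5, interpreter_slowdown=2.0)
example_2 = break_even_records(19.3, 8.7, interpreter_slowdown=2.0)
```

Under that assumed two-to-one ratio, Example 1 saves 1.5 msec per record, so the 10.1 msec spent on code generation is recovered once seven records have been fetched; Example 2 recovers its 19.3 msec after three.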
gram. Thus, the user pays only a small cost for the power,
flexibility, and data independence of the SQL language, compared with
writing the same transaction directly on the lower level RSS
interface.
    In an ad hoc query environment the advantages of compilation are
less obvious since the compilation must take place on-line and the
query is executed only once. In this environment, the cost of
generating a machine-language routine for a given query must be
balanced against the increased efficiency of this routine as compared
with a more conventional query interpreter. Figure 5 shows some
measurements of the cost of compiling two typical SQL statements
(details of the experiments are given in [20]). From this data we may
draw the following conclusions:
    (1) The code generation step adds a small amount of CPU time and
no I/Os to the overhead of parsing and access path selection. Parsing
and access path selection must be done in any query system, including
interpretive ones. The additional instructions spent on code
generation are not likely to be perceptible to an end user.
    (2) If code generation results in a routine which runs more
efficiently than an interpreter, the cost of the code generation step
is paid back after fetching only a few records. (In Example 1, if the
CPU time per record of the compiled module is half that of an
interpretive system, the cost of generating the access module is
repaid after seven records have been fetched.)
    A final advantage of compilation is its simplifying effect on the
system architecture. With both ad hoc queries and precanned
transactions being treated in the same way, most of the code in the
system can be made to serve a dual purpose. This ties in very well
with our objective of supporting a uniform syntax between query users
and transaction programs.

    Available Access Paths
    As described earlier, the principal access path used in System R
for retrieving data associatively by its value is the B-tree index. A
typical index is illustrated in Figure 6. If we assume a fan-out of
approximately 200 at each level of the tree, we can index up to 40,000
records by a two-level index, and up to 8,000,000 records by a
three-level index. If we wish to begin an associative scan through a
large table, three I/Os will typically be required (assuming the root
page is referenced frequently enough to remain in the system buffers,
we need an I/O for the intermediate-level index page, the "leaf" index
page, and the data page). If several records are to be fetched using
the index scan, the three start-up I/Os are relatively insignificant.
However, if only one record is to be fetched, other access techniques
might have provided a quicker path to the stored data.
    Two common access techniques which were not utilized for user data
in System R are hashing and direct links (physical pointers from one
record to another). Hashing was not used because it does not have the
convenient ordering property of a B-tree index (e.g., a B-tree index
on SALARY enables a list of employees ordered by SALARY to be
retrieved very easily). Direct links, although they were implemented
at the RSS level, were not used as an access path for user data by the
RDS for a twofold reason. Essential links (links whose semantics are
not known to the system but which are connected directly by users)
were rejected be-
                                                                                     cause they were inconsistent with the
                                                                                     nonnavigational user interface of a
                                                                                     relational system, since they could
                                                                                     not be used as access paths by an
                                                                                     automatic optimizer. Nonessential
                                            ] Root                                   links (links which connect records to
                                                                                     other records with matching data
                                                                                     values) were not implemented be-
                                                                                     cause of the difficulties in automati-
                                                                                     cally maintaining their connections.
                                                                   Intermediate      When a record is updated, its con-
                                                                   Pages             nections on many links may need to
                                                                                     be updated as well, and this may
                                                                                     involve many "subsidiary queries" to
                                                                                     find the other records which are in-
                                                                      Leaf           volved in these connections. Prob-
                                                                      Pages          lems also arise relating to records
                                                                                     which have no matching partner rec-
                                                                                     ord on the link, and records whose
                                                                                     link-controlling data value is null.
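
The index capacity and start-up I/O figures quoted above for the B-tree access path can be checked with a little arithmetic. The following is a minimal sketch, not System R code: the function names are hypothetical, and it assumes a perfectly uniform fan-out at every index level and that only the root page stays in the system buffers.

```python
def index_capacity(fan_out: int, levels: int) -> int:
    """Records addressable by an index of `levels` levels (root and leaf
    included), assuming every index page has exactly `fan_out` entries."""
    return fan_out ** levels


def startup_ios(levels: int, root_cached: bool = True) -> int:
    """I/Os needed to reach the first record through the index:
    one per index page that is not already buffered, plus one for the data page."""
    index_pages_read = levels - 1 if root_cached else levels
    return index_pages_read + 1


print(index_capacity(200, 2))  # 40000    (two-level index)
print(index_capacity(200, 3))  # 8000000  (three-level index)
print(startup_ios(3))          # 3: intermediate page, leaf page, data page
```

Under these assumptions the start-up cost is fixed per scan, so it matters little when many records are fetched and dominates when only one record is wanted.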
Fig. 6. A B-Tree Index (root page, intermediate pages, leaf pages, and data pages).

In general, our experience showed that indexes could be used very efficiently in queries and transactions which access many records,

but that hashing and links would have enhanced the performance of "canned transactions" which access only a few records. As an illustration of this problem, consider an inventory application which has two tables: a PRODUCTS table, and a much larger PARTS table which contains data on the individual parts used for each product. Suppose a given transaction needs to find the price of the heating element in a particular toaster. To execute this transaction, System R might require two I/Os to traverse a two-level index to find the toaster record, and three more I/Os to traverse another three-level index to find the heating element record. If access paths based on hashing and direct links were available, it might be possible to find the toaster record in one I/O via hashing, and the heating element record in one more I/O via a link. (Additional I/Os would be required in the event of hash collisions or if the toaster parts records occupied more than one page.) Thus, for this very simple transaction hashing and links might reduce the number of I/Os from five to three, or even two. For transactions which retrieve a large set of records, the additional I/Os caused by indexes compared to hashing and links are less important.

The Optimizer

A series of experiments was conducted at the San Jose IBM Research Laboratory to evaluate the success of the System R optimizer in choosing among the available access paths for typical SQL statements. The results of these experiments are reported in [6]. For the purpose of the experiments, the optimizer was modified in order to observe its behavior. Ordinarily, the optimizer searches through a tree of path choices, computing estimated costs and pruning the tree until it arrives at a single preferred access path. The optimizer was modified in such a way that it could be made to generate the complete tree of access paths, without pruning, and to estimate the cost of each path (cost is defined as a weighted sum of page fetches and RSS calls). Mechanisms were also added to the system whereby it could be forced to execute an SQL statement by a particular access path and to measure the actual number of page fetches and RSS calls incurred. In this way, a comparison can be made between the optimizer's predicted cost and the actual measured cost for various alternative paths.

In [6], an experiment is described in which ten SQL statements, including some single-table queries and some joins, are run against a test database. The database is artificially generated to conform to the two basic assumptions of the System R optimizer: (1) the values in each column are uniformly distributed from some minimum to some maximum value; and (2) the distribution of values of the various columns are independent of each other. For each of the ten SQL statements, the ordering of the predicted costs of the various access paths was the same as the ordering of the actual measured costs (in a few cases the optimizer predicted two paths to have the same cost when their actual costs were unequal but adjacent in the ordering).

Although the optimizer was able to correctly order the access paths in the experiment we have just described, the magnitudes of the predicted costs differed from the measured costs in several cases. These discrepancies were due to a variety of causes, such as the optimizer's inability to predict how much data would remain in the system buffers during sorting.

The above experiment does not address the issue of whether or not a very good access path for a given SQL statement might be overlooked because it is not part of the optimizer's repertoire. One such example is known. Suppose that the database contains a table T in which each row has a unique value for the field SEQNO, and suppose that an index exists on SEQNO. Consider the following SQL query:

    SELECT * FROM T WHERE SEQNO IN (15, 17, 19, 21);

This query has an answer set of (at most) four rows, and an obvious method of processing it is to use the SEQNO index repeatedly: first to find the row with SEQNO = 15, then SEQNO = 17, etc. However, this access path would not be chosen by System R, because the optimizer is not presently structured to consider multiple uses of an index within a single query block. As we gain more experience with access path selection, the optimizer may grow to encompass this and other access paths which have so far been omitted from consideration.

Views and Authorization

Users generally found the System R mechanisms for defining views and controlling authorization to be powerful, flexible, and convenient. The following features were considered to be particularly beneficial:

(1) The full query power of SQL is made available for defining new views of data (i.e., any query may be defined as a view). This makes it possible to define a rich variety of views, containing joins, subqueries, aggregation, etc., without having to learn a separate "data definition language." However, the view mechanism is not completely transparent to the end user, because of the restrictions described earlier (e.g., views involving joins of more than one table are not updateable).

(2) The authorization subsystem allows each installation of System R to choose a "fully centralized policy" in which all tables are created and privileges controlled by a central administrator; or a "fully decentralized policy" in which each user may create tables and control access to them; or some intermediate policy.

During the two-year evaluation of System R, the following suggestions were made by users for improvement of the view and authorization subsystems:

(1) The authorization subsystem could be augmented by the concept of a "group" of users. Each group would have a "group administrator" who controls enrollment of new members in the group. Privileges could then be granted to the group as a whole rather than to each member of the group individually.

(2) A new command could be added to the SQL language to change the ownership of a table from one user to another. This suggestion is more difficult to implement than it seems at first glance, because the owner's name is part of the fully qualified name of a table (i.e., two tables owned by Smith and Jones could be named SMITH.PARTS and JONES.PARTS). References to the table SMITH.PARTS might exist in many places, such as view definitions and compiled programs. Finding and changing all these references would be difficult (perhaps impossible, as in the case of users' source programs which are not stored under System R control).

(3) Occasionally it is necessary to reload an existing table in the database (e.g., to change its physical clustering properties). In System R this is accomplished by dropping the old table definition, creating a new table with the same definition, and reloading the data into the new table. Unfortunately, views and authorizations defined on the table are lost from the system when the old definition is dropped, and therefore they both must be redefined on the new table. It has been suggested that views and authorizations defined on a dropped table might optionally be held "in abeyance" pending reactivation of the table.

The Recovery Subsystem

The combined "shadow page" and log mechanism used in System R proved to be quite successful in safeguarding the database against media, system, and transaction failures. The part of the recovery subsystem which was observed to have the greatest impact on system performance was the keeping of a shadow page for each updated page. This performance impact is due primarily to the following factors:

(1) Since each updated page is written out to a new location on disk, data tends to move about. This limits the ability of the system to cluster related pages in secondary storage to minimize disk arm movement for sequential applications.

(2) Since each page can potentially have both an "old" and "new" version, a directory must be maintained to locate both versions of each page. For large databases, the directory may be large enough to require a paging mechanism of its own.

(3) The periodic checkpoints which exchange the "old" and "new" page pointers generate I/O activity and consume a certain amount of CPU time.

A possible alternative technique for recovering from system failures would dispense with the concept of shadow pages, and simply keep a log of all database updates. This design would require that all updates be written out to the log before the updated page migrates to disk from the system buffers. Mechanisms could be developed to minimize I/Os by retaining updated pages in the buffers until several pages are written out at once, sharing an I/O to the log.

The Locking Subsystem

The locking subsystem of System R provides each user with a choice of three levels of isolation from other users. In order to explain the three levels, we define "uncommitted data" as those records which have been updated by a transaction that is still in progress (and therefore still subject to being backed out). Under no circumstances can a transaction, at any isolation level, perform updates on the uncommitted data of another transaction, since this might lead to lost updates in the event of transaction backout.

The three levels of isolation in System R are defined as follows:

Level 1: A transaction running at Level 1 may read (but not update) uncommitted data. Therefore, successive reads of the same record by a Level-1 transaction may not give consistent values. A Level-1 transaction does not attempt to acquire any locks on records while reading.

Level 2: A transaction running at Level 2 is protected against reading uncommitted data. However, successive reads at Level 2 may still yield inconsistent values if a second transaction updates a given record and then terminates between the first and second reads by the Level-2 transaction. A Level-2 transaction locks each record before reading it to make sure it is committed at the time of the read, but then releases the lock immediately after reading.

Level 3: A transaction running at Level 3 is guaranteed that successive reads of the same record will yield the same value. This guarantee is enforced by acquiring a lock on each record read by a Level-3 transaction and holding the lock until the end of the transaction. (The lock acquired by a Level-3 reader is a "share" lock which permits other users to read but not update the locked record.)

It was our intention that Isolation Level 1 provide a means for very quick scans through the database when approximate values were acceptable, since Level-1 readers acquire no locks and should never need to wait for other users. In practice, however, it was found that Level-1 readers did have to wait under certain circumstances while the physical consistency of the data was suspended (e.g., while indexes or pointers were being adjusted). Therefore, the potential of Level 1 for increasing system concurrency was not fully realized.

It was our expectation that a tradeoff would exist between Isolation Levels 2 and 3 in which Level 2 would be "cheaper" and Level 3 "safer." In practice, however, it was observed that Level 3 actually involved less CPU overhead than Level 2, since it was simpler to acquire locks and keep them than to acquire locks and immediately release them. It is true that Isolation Level 2 permits a greater degree of

access to the database by concurrent readers and updaters than does Level 3. However, this increase in concurrency was not observed to have an important effect in most practical applications.

As a result of the observations described above, most System R users ran their queries and application programs at Level 3, which was the system default.

The Convoy Phenomenon

Experiments with the locking subsystem of System R identified a problem which came to be known as the "convoy phenomenon" [9]. There are certain high-traffic locks in System R which every process requests frequently and holds for a short time. Examples of these are the locks which control access to the buffer pool and the system log. In a "convoy" condition, interaction between a high-traffic lock and the operating system dispatcher tends to serialize all processes in the system, allowing each process to acquire the lock only once each time it is dispatched.

In the VM/370 operating system, each process in the multiprogramming set receives a series of small "quanta" of CPU time. Each quantum terminates after a preset amount of CPU time, or when the process goes into page, I/O, or lock wait. At the end of the series of quanta, the process drops out of the multiprogramming set and must undergo a longer "time slice wait" before it once again becomes dispatchable. Most quanta end when a process waits for a page, an I/O operation, or a low-traffic lock. The System R design ensures that no process will ever hold a high-traffic lock during any of these types of wait. There is a slight probability, however, that a process might go into a long "time slice wait" while it is holding a high-traffic lock. In this event, all other dispatchable processes will soon request the same lock and become enqueued behind the sleeping process. This phenomenon is called a "convoy."

In the original System R design, convoys are stable because of the protocol for releasing locks. When a process P releases a lock, the locking subsystem grants the lock to the first waiting process in the queue (thereby making it unavailable to be reacquired by P). After a short time, P once again requests the lock, and is forced to go to the end of the convoy. If the mean time between requests for the high-traffic lock is 1,000 instructions, each process may execute only 1,000 instructions before it drops to the end of the convoy. Since more than 1,000 instructions are typically used to dispatch a process, the system goes into a "thrashing" condition in which most of the cycles are spent on dispatching overhead.

The solution to the convoy problem involved a change to the lock release protocol of System R. After the change, when a process P releases a lock, all processes which are enqueued for the lock are made dispatchable, but the lock is not granted to any particular process. Therefore, the lock may be regranted to process P if it makes a subsequent request. Process P may acquire and release the lock many times before its time slice is exhausted. It is highly probable that process P will not be holding the lock when it goes into a long wait. Therefore, if a convoy should ever form, it will most likely evaporate as soon as all the members of the convoy have been dispatched.

Additional Observations

Other observations were made during the evaluation of System R and are listed below:

(1) When running in a "canned transaction" environment, it would be helpful for the system to include a data communications front end to handle terminal interactions, priority scheduling, and logging and restart at the message level. This facility was not included in the System R design. Also, space would be saved and the working set reduced if several users executing the same "canned transaction" could share a common access module. This would require the System R code generator to produce reentrant code. Approximately half the space occupied by the multiple copies of the access module could be saved by this method, since the other half consists of working storage which must be duplicated for each user.

(2) When the recovery subsystem attempts to take an automatic checkpoint, it inhibits the processing of new RSS commands until all users have completed their current RSS command; then the checkpoint is taken and all users are allowed to proceed. However, certain RSS commands potentially involve long operations, such as sorting a file. If these "long" RSS operations were made interruptible, it would avoid any delay in performing checkpoints.

(3) The System R design of automatically maintaining a system catalog as part of the on-line database was very well liked by users, since it permitted them to access the information in the catalog with exactly the same query language they use for accessing other data.

5. Conclusions

We feel that our experience with System R has clearly demonstrated the feasibility of applying a relational database system to a real production environment in which many concurrent users are performing a mixture of ad hoc queries and repetitive transactions. We believe that the high-level user interface made possible by the relational data model can have a dramatic positive effect on user productivity in developing new applications, and on the data independence of queries and programs. System R has also demonstrated the ability to support a highly dynamic database environment in which application requirements are rapidly changing.

In particular, System R has illustrated the feasibility of compiling a very high-level data sublanguage, SQL, into machine-level code. The

result of this compilation technique is that most of the overhead cost for
implementing the high-level language is pushed into a "precompilation" step,
and performance for canned transactions is comparable to that of a much lower
level system. The compilation approach has also proved to be applicable to the
ad hoc query environment, with the result that a unified mechanism can be used
to support both queries and transactions.
    The evaluation of System R has led to a number of suggested improvements.
Some of these improvements have already been implemented and others are still
under study. Two major foci of our continuing research program at the San Jose
laboratory are adaptation of System R to a distributed database environment,
and extension of our optimizer algorithms to encompass a broader set of access
paths.
    Sometimes questions are asked about how the performance of a relational
database system might compare to that of a "navigational" system in which a
programmer carefully hand-codes an application to take advantage of explicit
access paths. Our experiments with the System R optimizer and compiler suggest
that the relational system will probably approach but not quite equal the
performance of the navigational system for a particular, highly tuned
application, but that the relational system is more likely to be able to adapt
to a broad spectrum of unanticipated applications with adequate performance. We
believe that the benefits of relational systems in the areas of user
productivity, data independence, and adaptability to changing circumstances
will take on increasing importance in the years ahead.

Acknowledgments

    From the beginning, System R was a group effort. Credit for any success of
the project properly belongs to the team as a whole rather than to specific
individuals.
    The inspiration for constructing a relational system came primarily from
E. F. Codd, whose landmark paper [22] introduced the relational model of data.
The manager of the project through most of its existence was W. F. King.
    In addition to the authors of this paper, the following people were
associated with System R and made important contributions to its development:

    M. Adiba          M. Mresse
    R.F. Boyce        J.F. Nilsson
    A. Chan           R.L. Obermarck
    D.M. Choy         D. Stott Parker
    K. Eswaran        D. Portal
    R. Fagin          N. Ramsperger
    P. Fehder         P. Reisner
    T. Haerder        P.R. Roever
    R.H. Katz         R. Selinger
    W. Kim            H.R. Strong
    H. Korth          P. Tiberio
    P. McJones        V. Watson
    D. McLeod         R. Williams

References

1. Adiba, M.E., and Lindsay, B.G. Database snapshots. IBM Res. Rep. RJ2772,
San Jose, Calif., March 1980.
2. Astrahan, M.M., and Chamberlin, D.D. Implementation of a structured English
query language. Comm. ACM 18, 10 (Oct. 1975), 580-588.
3. Astrahan, M.M., and Lorie, R.A. SEQUEL-XRM: A Relational System. Proc. ACM
Pacific Regional Conf., San Francisco, Calif., April 1975, p. 34.
4. Astrahan, M.M., et al. System R: A relational approach to database
management. ACM Trans. Database Syst. 1, 2 (June 1976), 97-137.
5. Astrahan, M.M., et al. System R: A relational data base management system.
IEEE Comptr. 12, 5 (May 1979), 43-48.
6. Astrahan, M.M., Kim, W., and Schkolnick, M. Evaluation of the System R
access path selection mechanism. Proc. IFIP Congress, Melbourne, Australia,
Sept. 1980, pp. 487-491.
7. Blasgen, M.W., and Eswaran, K.P. Storage and access in relational databases.
IBM Syst. J. 16, 4 (1977), 363-377.
8. Blasgen, M.W., Casey, R.G., and Eswaran, K.P. An encoding method for
multifield sorting and indexing. Comm. ACM 20, 11 (Nov. 1977), 874-878.
9. Blasgen, M., Gray, J., Mitoma, M., and Price, T. The convoy phenomenon.
Operating Syst. Rev. 13, 2 (April 1979), 20-25.
10. Blasgen, M.W., et al. System R: An architectural overview. IBM Syst. J. 20,
1 (Feb. 1981), 41-62.
11. Bjorner, D., Codd, E.F., Deckert, K.L., and Traiger, I.L. The Gamma Zero
N-ary relational data base interface. IBM Res. Rep. RJ1200, San Jose, Calif.,
April 1973.
12. Boyce, R.F., and Chamberlin, D.D. Using a structured English query language
as a data definition facility. IBM Res. Rep. RJ1318, San Jose, Calif., Dec.
1973.
13. Boyce, R.F., Chamberlin, D.D., King, W.F., and Hammer, M.M. Specifying
queries as relational expressions: The SQUARE data sublanguage. Comm. ACM 18,
11 (Nov. 1975), 621-628.
14. Chamberlin, D.D., and Boyce, R.F. SEQUEL: A structured English query
language. Proc. ACM-SIGMOD Workshop on Data Description, Access, and Control,
Ann Arbor, Mich., May 1974, pp. 249-264.
15. Chamberlin, D.D., Gray, J.N., and Traiger, I.L. Views, authorization, and
locking in a relational database system. Proc. 1975 Nat. Comptr. Conf.,
Anaheim, Calif., pp. 425-430.
16. Chamberlin, D.D., et al. SEQUEL 2: A unified approach to data definition,
manipulation, and control. IBM J. Res. and Develop. 20, 6 (Nov. 1976), 560-575
(also see errata in Jan. 1977 issue).
17. Chamberlin, D.D. Relational database management systems. Comptng. Surv. 8,
1 (March 1976), 43-66.
18. Chamberlin, D.D., et al. Data base system authorization. In Foundations of
Secure Computation, R. Demillo, D. Dobkin, A. Jones, and R. Lipton, Eds.,
Academic Press, New York, 1978, pp. 39-56.
19. Chamberlin, D.D. A summary of user experience with the SQL data
sublanguage. Proc. Internat. Conf. Data Bases, Aberdeen, Scotland, July 1980,
pp. 181-203 (also IBM Res. Rep. RJ2767, San Jose, Calif., April 1980).
20. Chamberlin, D.D., et al. Support for repetitive transactions and ad-hoc
queries in System R. ACM Trans. Database Syst. 6, 1 (March 1981), 70-94.
21. Chamberlin, D.D., Gilbert, A.M., and Yost, R.A. A history of System R and
SQL/data system (presented at the Internat. Conf. Very Large Data Bases,
Cannes, France, Sept. 1981).
22. Codd, E.F. A relational model of data for large shared data banks. Comm.
ACM 13, 6 (June 1970), 377-387.
23. Codd, E.F. Further normalization of the data base relational model. In
Courant Computer Science Symposia, Vol. 6: Data Base Systems, Prentice-Hall,
Englewood Cliffs, N.J., 1971, pp. 33-64.
24. Codd, E.F. Recent investigations in relational data base systems. Proc.
IFIP Congress, Stockholm, Sweden, Aug. 1974.
25. Codd, E.F. Extending the database relational model to capture more meaning.
ACM Trans. Database Syst. 4, 4 (Dec. 1979), 397-434.
26. Comer, D. The ubiquitous B-Tree. Comptng. Surv. 11, 2 (June 1979), 121-137.
27. Date, C.J. An Introduction to Database Systems. 2nd Ed., Addison-Wesley,
New York, 1977.

28. Eswaran, K.P., and Chamberlin, D.D. Functional specifications of a
subsystem for database integrity. Proc. Conf. Very Large Data Bases,
Framingham, Mass., Sept. 1975, pp. 48-68.
29. Eswaran, K.P., Gray, J.N., Lorie, R.A., and Traiger, I.L. On the notions of
consistency and predicate locks in a database system. Comm. ACM 19, 11 (Nov.
1976), 624-633.
30. Fagin, R. Multivalued dependencies and a new normal form for relational
databases. ACM Trans. Database Syst. 2, 3 (Sept. 1977), 262-278.
31. Fagin, R. On an authorization mechanism. ACM Trans. Database Syst. 3, 3
(Sept. 1978), 310-319.
32. Gray, J.N., and Watson, V. A shared segment and inter-process communication
facility for VM/370. IBM Res. Rep. RJ1579, San Jose, Calif., Feb. 1975.
33. Gray, J.N., Lorie, R.A., and Putzolu, G.F. Granularity of locks in a large
shared database. Proc. Conf. Very Large Data Bases, Framingham, Mass., Sept.
1975, pp. 428-451.
34. Gray, J.N., Lorie, R.A., Putzolu, G.R., and Traiger, I.L. Granularity of
locks and degrees of consistency in a shared data base. Proc. IFIP Working
Conf. Modelling of Database Management Systems, Freudenstadt, Germany, Jan.
1976, pp. 695-723 (also IBM Res. Rep. RJ1654, San Jose, Calif.).
35. Gray, J.N. Notes on database operating systems. In Operating Systems: An
Advanced Course, Goos and Hartmanis, Eds., Springer-Verlag, New York, 1978,
pp. 393-481 (also IBM Res. Rep. RJ2188, San Jose, Calif.).
36. Gray, J.N., et al. The recovery manager of a data management system. IBM
Res. Rep. RJ2623, San Jose, Calif., June 1979.
37. Griffiths, P.P., and Wade, B.W. An authorization mechanism for a relational
database system. ACM Trans. Database Syst. 1, 3 (Sept. 1976), 242-255.
38. Katz, R.H., and Selinger, R.D. Internal comm., IBM Res. Lab., San Jose,
Calif., Sept. 1978.
39. Kwan, S.C., and Strong, H.R. Index path length evaluation for the research
storage system of System R. IBM Res. Rep. RJ2736, San Jose, Calif., Jan. 1980.
40. Lorie, R.A. XRM - An extended (N-ary) relational memory. IBM Tech. Rep.
G320-2096, Cambridge Scientific Ctr., Cambridge, Mass., Jan. 1974.
41. Lorie, R.A. Physical integrity in a large segmented database. ACM Trans.
Database Syst. 2, 1 (March 1977), 91-104.
42. Lorie, R.A., and Wade, B.W. The compilation of a high level data language.
IBM Res. Rep. RJ2598, San Jose, Calif., Aug. 1979.
43. Lorie, R.A., and Nilsson, J.F. An access specification language for a
relational data base system. IBM J. Res. and Develop. 23, 3 (May 1979),
286-298.
44. Reisner, P., Boyce, R.F., and Chamberlin, D.D. Human factors evaluation of
two data base query languages: SQUARE and SEQUEL. Proc. AFIPS Nat. Comptr.
Conf., Anaheim, Calif., May 1975, pp. 447-452.
45. Reisner, P. Use of psychological experimentation as an aid to development
of a query language. IEEE Trans. Software Eng. SE-3, 3 (May 1977), 218-229.
46. Schkolnick, M., and Tiberio, P. Considerations in developing a design tool
for a relational DBMS. Proc. IEEE COMPSAC 79, Nov. 1979, pp. 228-235.
47. Selinger, P.G., et al. Access path selection in a relational database
management system. Proc. ACM SIGMOD Conf., Boston, Mass., June 1979, pp. 23-34.
48. Stonebraker, M. Implementation of integrity constraints and views by query
modification. Tech. Memo ERL-M514, College of Eng., Univ. of Calif. at
Berkeley, March 1975.
49. Strong, H.R., Traiger, I.L., and Markowsky, G. Slide Search. IBM Res. Rep.
RJ2274, San Jose, Calif., June 1978.
50. Traiger, I.L., Gray, J.N., Galtieri, C.A., and Lindsay, B.G. Transactions
and consistency in distributed database systems. IBM Res. Rep. RJ2555, San
Jose, Calif., June 1979.





tion stored data values in the actual was placed on relatively simple in- The multiuser prototype of Sys- tuples rather than in separate do- teractions, and care was taken to tem R contained several important mains. (In defense of XRM, it should minimize the "path length" for sim- subsystems which were not present be noted that the separation of data ple SQL statements. in the earlier Phase Zero prototype. values from tuples has some advan- In order to prevent conflicts which tages if data values are relatively 3. Phase One: Construction of a might arise when two concurrent large and if many tuples are proc- Multiuser Prototype users attempt to update the same essed internally compared to the After the completion and evalu- data value, a locking subsystem was number of tuples which are materi- ation of the Phase Zero prototype, provided. The locking subsystem en- alized for output.) work began on the construction of sures that each data value is accessed (3) Because the Phase Zero im- the full-function, multiuser version by only one user at a time, that all plementation was observed to be of System R. Like Phase Zero, Sys- the updates made by a given trans- CPU-bound during the processing of tem R consisted of an access method action become effective simultane- a typical query, it was decided the (called RSS, the Research Storage ously, and that deadlocks between optimizer cost measure should be a System) and an optimizing SQL users are detected and resolved. The weighted sum of CPU time and I / O processor (called RDS, the Rela- security of the system was enhanced count, with weights adjustable ac- tional Data System) which runs on by view and authorization subsys- cording to the system configuration. top of the RSS. Separation of the tems. 
The view subsystem permits (4) Observation of some of the RSS and RDS provided a beneficial users to define alternative views of applications of Phase Zero con- degree of modularity; e.g., all locking the database (e.g., a view of the em- vinced us of the importance of the and logging functions were isolated ployee file in which salaries are de- "join" formulation of SQL. In our in the RSS, while all authorization leted or aggregated by department). 635 Communications October 1981 of Volume 24 the ACM N u m b e r 10 zycnzj.com/http://www.zycnzj.com/
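The two kinds of view mentioned in the example (salaries deleted, or aggregated by department) can be sketched with a present-day SQL engine. The following is only an illustration using Python's built-in sqlite3 module, not System R's syntax or implementation; the EMP table, its columns, and the view names EMP_PUBLIC and DEPT_SALARY are hypothetical names chosen for this sketch.

```python
import sqlite3

def build_demo():
    """Create a small in-memory employee table and two views over it.

    All table, column, and view names are hypothetical illustrations.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EMP (
            EMPNO  INTEGER PRIMARY KEY,
            NAME   TEXT,
            DEPT   TEXT,
            SALARY INTEGER
        );
        INSERT INTO EMP VALUES (1, 'Smith', 'D1', 30000);
        INSERT INTO EMP VALUES (2, 'Jones', 'D1', 50000);
        INSERT INTO EMP VALUES (3, 'Lee',   'D2', 40000);

        -- View in which the salary column is omitted entirely.
        CREATE VIEW EMP_PUBLIC AS
            SELECT EMPNO, NAME, DEPT FROM EMP;

        -- View in which salaries appear only as a per-department average.
        CREATE VIEW DEPT_SALARY AS
            SELECT DEPT, AVG(SALARY) AS AVG_SALARY
            FROM EMP GROUP BY DEPT;
    """)
    return conn

conn = build_demo()
# A user restricted to the views never sees an individual salary.
public_rows = conn.execute(
    "SELECT * FROM EMP_PUBLIC ORDER BY EMPNO").fetchall()
dept_rows = conn.execute(
    "SELECT DEPT, AVG_SALARY FROM DEPT_SALARY ORDER BY DEPT").fetchall()
print(public_rows)
print(dept_rows)
```

Both views are queried with the same SELECT syntax as a base table, which is the property the text relies on when it treats views as a security mechanism.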
The authorization subsystem ensures that each user has access only to those views for which he has been specifically authorized by their creators. Finally, a recovery subsystem was provided which allows the database to be restored to a consistent state in the event of a hardware or software failure.
    In order to provide a useful host-language capability, it was decided that System R should support both PL/I and Cobol application programs as well as a standalone query interface, and that the system should run under either the VM/CMS or MVS/TSO operating system environment. A key goal of the SQL language was to present the same capabilities, and a consistent syntax, to users of the PL/I and Cobol host languages and to ad hoc query users. The imbedding of SQL into PL/I is described in [16]. Installation of a multiuser database system under VM/CMS required certain modifications to the operating system in support of communicating virtual machines and writable shared virtual memory. These modifications are described in [32].
    The standalone query interface of System R (called UFI, the User-Friendly Interface) is supported by a dialog manager program, written in PL/I, which runs on top of System R like any other application program. Therefore, the UFI support program is a cleanly separated component and can be modified independently of the rest of the system. In fact, several users improved on our UFI by writing interactive dialog managers of their own.

The Compilation Approach
    Perhaps the most important decision in the design of the RDS was inspired by R. Lorie's observation, in early 1976, that it is possible to compile very high-level SQL statements into compact, efficient routines in System/370 machine language [42]. Lorie was able to demonstrate that SQL statements of arbitrary complexity could be decomposed into a relatively small collection of machine-language "fragments," and that an optimizing compiler could assemble these code fragments from a library to form a specially tailored routine for processing a given SQL statement. This technique had a very dramatic effect on our ability to support application programs for transaction processing. In System R, a PL/I or Cobol program is run through a preprocessor in which its SQL statements are examined, optimized, and compiled into small, efficient machine-language routines which are packaged into an "access module" for the application program. Then, when the program goes into execution, the access module is invoked to perform all interactions with the database by means of calls to the RSS. The process of creating and invoking an access module is illustrated in Figures 3 and 4. All the overhead of parsing, validity checking, and access path selection is removed from the path of the executing program and placed in a separate preprocessor step which need not be repeated. Perhaps even more important is the fact that the running program interacts only with its small, special-purpose access module rather than with a much larger and less efficient general-purpose SQL interpreter. Thus, the power and ease of use of the high-level SQL language are combined with the execution-time efficiency of the much lower level RSS interface.

Fig. 3. Precompilation Step. [Figure not reproduced: a PL/I source program containing "SELECT NAME INTO $X FROM EMP WHERE EMPNO=$Y" is processed by the System R precompiler (XPREP), producing a modified PL/I program (containing a CALL) and an access module (machine code ready to run on RSS).]

Fig. 4. Execution Step. [Figure not reproduced: the user's object program calls the execution-time system (XRDI), which loads, then calls the access module, which in turn calls the RSS.]

    Since all access path selection decisions are made during the preprocessor step in System R, there is the possibility that subsequent changes in the database may invalidate the decisions which are embodied in an access module. For example, an index selected by the optimizer may later be dropped from the database. Therefore, System R records with each access module a list of its "dependencies" on database objects such as tables and indexes. The dependency list is stored in the form of a regular relation in the system catalog. When the structure of the database changes (e.g., an index is dropped), all affected access modules are marked "invalid." The next time an invalid access module is invoked, it is regenerated from its original SQL statements, with newly optimized access paths. This process is completely transparent to the System R user.
    SQL statements submitted to the interactive UFI dialog manager are processed by the same optimizing compiler as preprocessed SQL statements. The UFI program passes the ad hoc SQL statement to System R with a special "EXECUTE" call. In response to the EXECUTE call, System R parses and optimizes the SQL statement and translates it into a machine-language routine. The routine is indistinguishable from an access module and is executed immediately. This process is described in more detail in [20].

RSS Access Paths
    Rather than storing data values in separate "domains" in the manner of XRM, the RSS chose to store data values in the individual records of the database. This resulted in records becoming variable in length and longer, on the average, than the equivalent XRM records. Also, commonly used values are represented many times rather than only once as in XRM. It was felt, however, that these disadvantages were more than offset by the following advantage: All the data values of a record could be fetched by a single I/O.
    In place of XRM "inversions," the RSS provides "indexes," which are associative access aids implemented in the form of B-Trees [26]. Each table in the database may have anywhere from zero indexes up to an index on each column (it is also possible to create an index on a combination of columns). Indexes make it possible to scan the table in order by the indexed values, or to directly access the records which match a particular value. Indexes are maintained automatically by the RSS in the event of updates to the database.
    The RSS also implements "links," which are pointers stored with a record which connect it to other related records. The connection of records on links is not performed automatically by the RSS, but must be done by a higher level system.
    The access paths made available by the RSS include (1) index scans, which access a table associatively and scan it in value order using an index; (2) relation scans, which scan over a table as it is laid out in physical storage; (3) link scans, which traverse from one record to another using links. On any of these types of scan, "search arguments" may be specified which limit the records returned to those satisfying a certain predicate. Also, the RSS provides a built-in sorting mechanism which can take records from any of the scan methods and sort them into some value order, storing the result in a temporary list in the database. In System R, the RDS makes extensive use of index and relation scans and sorting. The RDS also utilizes links for internal purposes but not as an access path to user data.

The Optimizer
    Building on our Phase Zero experience, we designed the System R optimizer to minimize the weighted sum of the predicted number of I/Os and RSS calls in processing an SQL statement (the relative weights of these two terms are adjustable according to system configuration). Rather than manipulating TID lists, the optimizer chooses to scan each table in the SQL query by means of only one index (or, if no suitable index exists, by means of a relation scan). For example, if the query calls for programmers who work in Evanston, the optimizer might choose to use the job index to find programmers and then examine their locations; it might use the location index to find Evanston employees and examine their jobs; or it might simply scan the relation and examine the job and location of all employees. The choice would be based on the optimizer's estimate of both the clustering and selectivity properties of each index, based on statistics stored in the system catalog. An index is considered highly selective if it has a large ratio of distinct key values to total entries. An index is considered to have the clustering property if the key order of the index corresponds closely to the ordering of records in physical storage. The clustering property is important because when a record is fetched via a clustering index, it is likely that other records with the same key will be found on the same page, thus minimizing the number of page fetches. Because of the importance of clustering, mechanisms were provided for loading data in value order and preserving the value ordering when new records are inserted into the database.
    The techniques of the System R optimizer for performing joins of two or more tables have their origin in a study conducted by M. Blasgen and
K. Eswaran [7]. Using APL models, Blasgen and Eswaran studied ten methods of joining together tables, based on the use of indexes, sorting, physical pointers, and TID lists. The number of disk accesses required to perform a join was predicted on the basis of various assumptions for the ten join methods. Two join methods were identified such that one or the other was optimal or nearly optimal under most circumstances. The two methods are as follows:
    Join Method 1: Scan over the qualifying rows of table A. For each row, fetch the matching rows of table B (usually, but not always, an index on table B is used).
    Join Method 2: (Often used when no suitable index exists.) Sort the qualifying rows of tables A and B in order by their respective join fields. Then scan over the sorted lists and merge them by matching values.
    When selecting an access path for a join of several tables, the System R optimizer considers the problem to be a sequence of binary joins. It then performs a tree search in which each level of the tree consists of one of the binary joins. The choices to be made at each level of the tree include which join method to use and which index, if any, to select for scanning. Comparisons are applied at each level of the tree to prune away paths which achieve the same results as other, less costly paths. When all paths have been examined, the optimizer selects the one of minimum predicted cost. The System R optimizer algorithms are described more fully in [47].

Views and Authorization
    The major objectives of the view and authorization subsystems of System R were power and flexibility. We wanted to allow any SQL query to be used as the definition of a view. This was accomplished by storing each view definition in the form of an SQL parse tree. When an SQL operation is to be executed against a view, the parse tree which defines the operation is merged with the parse tree which defines the view, producing a composite parse tree which is then sent to the optimizer for access path selection. This approach is similar to the "query modification" technique proposed by Stonebraker [48]. The algorithms developed for merging parse trees were sufficiently general so that nearly any SQL statement could be executed against any view definition, with the restriction that a view can be updated only if it is derived from a single table in the database. The reason for this restriction is that some updates to views which are derived from more than one table are not meaningful (an example of such an update is given in [24]).
    The authorization subsystem of System R is based on privileges which are controlled by the SQL statements GRANT and REVOKE. Each user of System R may optionally be given a privilege called RESOURCE which enables him/her to create new tables in the database. When a user creates a table, he/she receives all privileges to access, update, and destroy that table. The creator of a table can then grant these privileges to other individual users, and subsequently can revoke these grants if desired. Each granted privilege may optionally carry with it the "GRANT option," which enables a recipient to grant the privilege to yet other users. A REVOKE destroys the whole chain of granted privileges derived from the original grant. The authorization subsystem is described in detail in [37] and discussed further in [31].

The Recovery Subsystem
    The key objective of the recovery subsystem is provision of a means whereby the database may be recovered to a consistent state in the event of a failure. A consistent state is defined as one in which the database does not reflect any updates made by transactions which did not complete successfully. There are three basic types of failure: the disk media may fail, the system may fail, or an individual transaction may fail. Although both the scope of the failure and the time to effect recovery may be different, all three types of recovery require that an alternate copy of data be available when the primary copy is not.
    When a media failure occurs, database information on disk is lost. When this happens, an image dump of the database plus a log of "before" and "after" changes provide the alternate copy which makes recovery possible. System R's use of "dual logs" even permits recovery from media failures on the log itself. To recover from a media failure, the database is restored using the latest image dump and the recovery process reapplies all database changes as specified on the log for completed transactions.
    When a system failure occurs, the information in main memory is lost. Thus, enough information must always be on disk to make recovery possible. For recovery from system failures, System R uses the change log mentioned above plus something called "shadow pages." As each page in the database is updated, the page is written out in a new place on disk, and the original page is retained. A directory of the "old" and "new" locations of each page is maintained. Periodically during normal operation, a "checkpoint" occurs in which all updates are forced out to disk, the "old" pages are discarded, and the "new" pages become "old." In the event of a system crash, the "new" pages on disk may be in an inconsistent state because some updated pages may still be in the system buffers and not yet reflected on disk. To bring the database back to a consistent state, the system reverts to the "old" pages, and then uses the log to redo all committed transactions and to undo all updates made by incomplete transactions. This aspect of the System R recovery subsystem is described in more detail in [36].
    When a transaction failure occurs, all database changes which have been made by the failing transaction must be undone. To accomplish this, System R simply processes the change log backwards, removing all changes made by the transaction. Unlike media and system recovery, which both require that System R be reinitialized, transaction recovery takes place on-line.

The Locking Subsystem
    A great deal of thought was given to the design of a locking subsystem which would prevent interference among concurrent users of System R. The original design involved the concept of "predicate locks," in which the lockable unit was a database property such as "employees whose location is Evanston." Note that, in this scheme, a lock might be held on the predicate LOC = 'EVANSTON', even if no employees currently satisfy that predicate. By comparing the predicates being processed by different users, the locking subsystem could prevent interference. The "predicate lock" design was ultimately abandoned because: (1) determining whether two predicates are mutually satisfiable is difficult and time-consuming; (2) two predicates may appear to conflict when, in fact, the semantics of the data prevent any conflict, as in "PRODUCT = AIRCRAFT" and "MANUFACTURER = ACME STATIONERY CO."; and (3) we desired to contain the locking subsystem entirely within the RSS, and therefore to make it independent of any understanding of the predicates being processed by various users. The original predicate locking scheme is described in [29].
    The locking scheme eventually chosen for System R is described in [34]. This scheme involves a hierarchy of locks, with several different sizes of lockable units, ranging from individual records to several tables. The locking subsystem is transparent to end users, but acquires locks on physical objects in the database as they are processed by each user. When a user accumulates many small locks, they may be "traded" for a larger lockable unit (e.g., locks on many records in a table might be traded for a lock on the table). When locks are acquired on small objects, "intention" locks are simultaneously acquired on the larger objects which contain them. For example, user A and user B may both be updating employee records. Each user holds an "intention" lock on the employee table, and "exclusive" locks on the particular records being updated. If user A attempts to trade her individual record locks for an "exclusive" lock at the table level, she must wait until user B ends his transaction and releases his "intention" lock on the table.

4. Phase Two: Evaluation
    The evaluation phase of the System R project lasted approximately 2 1/2 years and consisted of two parts: (1) experiments performed on the system at the San Jose Research Laboratory, and (2) actual use of the system at a number of internal IBM sites and at three selected customer sites. At all user sites, System R was installed on an experimental basis for study purposes only, and not as a supported commercial product. The first installations of System R took place in June 1977.

General User Comments
    In general, user response to System R has been enthusiastic. The system was mostly used in applications for which ease of installation, a high-level user language, and an ability to rapidly reconfigure the database were important requirements. Several user sites reported that they were able to install the system, design and load a database, and put into use some application programs within a matter of days. User sites also reported that it was possible to tune the system performance after data was loaded by creating and dropping indexes without impacting end users or application programs. Even changes in the database tables could be made transparent to users if the tables were read-only, and also in some cases for updated tables.
    Users found the performance characteristics and resource consumption of System R to be generally satisfactory for their experimental applications, although no specific performance comparisons were drawn. In general, the experimental databases used with System R were smaller than one 3330 disk pack (200 Megabytes) and were typically accessed by fewer than ten concurrent users. As might be expected, interactive response slowed down during the execution of very complex SQL statements involving joins of several tables. This performance degradation must be traded off against the advantages of normalization [23, 30], in which large database tables are broken into smaller parts to avoid redundancy, and then joined back together by the view mechanism or user applications.

The SQL Language
    The SQL user interface of System R was generally felt to be successful in achieving its goals of simplicity, power, and data independence. The language was simple enough in its basic structure so that users without prior experience were able to learn a usable subset on their first sitting. At the same time, when taken as a whole, the language provided the query power of the first-order predicate calculus combined with operators for grouping, arithmetic, and built-in functions such as SUM and AVERAGE.
    Users consistently praised the uniformity of the SQL syntax across the environments of application programs, ad hoc query, and data definition (i.e., definition of views). Users who were formerly required to learn inconsistent languages for these purposes found it easier to deal with the single syntax (e.g., when debugging an application program by querying the database to observe its effects). The single syntax also enhanced communication among different functional organizations (e.g., between database administrators and application programmers).
    While developing applications using SQL, our experimental users made a number of suggestions for extensions and improvements to the language, most of which were implemented during the course of the project. Some of these suggestions are summarized below:
    (1) Users requested an easy-to-use syntax when testing for the existence or nonexistence of a data item, such as an employee record whose department number matches a given department record. This facility was implemented in the form of a special "EXISTS" predicate.
    (2) Users requested a means of searching for character strings whose contents are only partially known, such as "all license plates beginning with NVK." This facility was implemented in the form of a special "LIKE" predicate which searches for "patterns" that are allowed to contain "don't care" characters.
    (3) A requirement arose for an application program to compute an SQL statement dynamically, submit the statement to the System R optimizer for access path selection, and then execute the statement repeatedly for different data values without reinvoking the optimizer. This facility was implemented in the form of PREPARE and EXECUTE statements which were made available in the host-language version of SQL.
    (4) In some user applications the need arose for an operator which Codd has called an "outer join" [25]. Suppose that two tables (e.g., SUPPLIERS and PROJECTS) are related by a common data field (e.g., PARTNO). In a conventional join of these tables, supplier records which have no matching project record (and vice versa) would not appear. In an "outer join" of these tables, supplier records with no matching project record would appear together with a "synthetic" project record containing only null values (and similarly for projects with no matching supplier). An "outer-join" facility for SQL is currently under study.
    A more complete discussion of user experience with SQL and the resulting language improvements is presented in [19].

The Compilation Approach
    The approach of compiling SQL statements into machine code was one of the most successful parts of the System R project. We were able to generate a machine-language routine to execute any SQL statement of arbitrary complexity by selecting code fragments from a library of approximately 100 fragments. The result was a beneficial effect on transaction programs, ad hoc query, and system simplicity.
    In an environment of short, repetitive transactions, the benefits of compilation are obvious. All the overhead of parsing, validity checking, and access path selection is removed from the path of the running transaction, and the application program interacts with a small, specially tailored access module rather than with a larger and less efficient general-purpose interpreter program. Experiments [38] showed that for a typical short transaction, about 80 percent of the instructions were executed by the RSS, with the remaining 20 percent executed by the access module and application program. Thus, the user pays only a small cost for the power, flexibility, and data independence of the SQL language, compared with writing the same transaction directly on the lower level RSS interface.
    In an ad hoc query environment the advantages of compilation are less obvious since the compilation must take place on-line and the query is executed only once. In this environment, the cost of generating a machine-language routine for a given query must be balanced against the increased efficiency of this routine as compared with a more conventional query interpreter. Figure 5 shows some measurements of the cost of compiling two typical SQL statements (details of the experiments are given in [20]). From this data we may draw the following conclusions:
    (1) The code generation step adds a small amount of CPU time and no I/Os to the overhead of parsing and access path selection. Parsing and access path selection must be done in any query system, including interpretive ones. The additional instructions spent on code generation are not likely to be perceptible to an end user.
    (2) If code generation results in a routine which runs more efficiently than an interpreter, the cost of the code generation step is paid back after fetching only a few records. (In Example 1, if the CPU time per record of the compiled module is half that of an interpretive system, the cost of generating the access module is repaid after seven records have been fetched.)

Example 1:
    SELECT SUPPNO, PRICE FROM QUOTES
    WHERE PARTNO = '010002'
    AND MINQ <= 1000 AND MAXQ >= 1000;

    Operation                        CPU time (msec on 168)   Number of I/Os
    Parsing                          13.3                     0
    Access Path Selection            40.0                     9
    Code Generation                  10.1                     0
    Fetch answer set (per record)     1.5                     0.7

Example 2:
    SELECT ORDERNO, ORDERS.PARTNO, DESCRIP, DATE, QTY
    FROM ORDERS, PARTS
    WHERE ORDERS.PARTNO = PARTS.PARTNO
    AND DATE BETWEEN '750000' AND '751231'
    AND SUPPNO = '797';

    Operation                        CPU time (msec on 168)   Number of I/Os
    Parsing                          20.7                     0
    Access Path Selection            73.2                     9
    Code Generation                  19.3                     0
    Fetch answer set (per record)     8.7                     10.7

Fig. 5. Measurements of Cost of Compilation.

    A final advantage of compilation is its simplifying effect on the system architecture. With both ad hoc queries and precanned transactions being treated in the same way, most of the code in the system can be made to serve a dual purpose. This ties in very well with our objective of supporting a uniform syntax between query users and transaction programs.

Available Access Paths
    As described earlier, the principal access path used in System R for retrieving data associatively by its value is the B-tree index. A typical index is illustrated in Figure 6. If we assume a fan-out of approximately 200 at each level of the tree, we can index up to 40,000 records by a two-level index, and up to 8,000,000 records by a three-level index. If we wish to begin an associative scan through a large table, three I/Os will typically be required (assuming the root page is referenced frequently enough to remain in the system buffers, we need an I/O for the intermediate-level index page, the "leaf" index page, and the data page). If several records are to be fetched using the index scan, the three start-up I/Os are relatively insignificant. However, if only one record is to be fetched, other access techniques might have provided a quicker path to the stored data.

Fig. 6. A B-Tree Index. [Figure not reproduced: a root page pointing to intermediate pages, which point to leaf pages, which point to data pages.]

    Two common access techniques which were not utilized for user data in System R are hashing and direct links (physical pointers from one record to another). Hashing was not used because it does not have the convenient ordering property of a B-tree index (e.g., a B-tree index on SALARY enables a list of employees ordered by SALARY to be retrieved very easily). Direct links, although they were implemented at the RSS level, were not used as an access path for user data by the RDS for a twofold reason. Essential links (links whose semantics are not known to the system but which are connected directly by users) were rejected because they were inconsistent with the nonnavigational user interface of a relational system, since they could not be used as access paths by an automatic optimizer. Nonessential links (links which connect records to other records with matching data values) were not implemented because of the difficulties in automatically maintaining their connections. When a record is updated, its connections on many links may need to be updated as well, and this may involve many "subsidiary queries" to find the other records which are involved in these connections. Problems also arise relating to records which have no matching partner record on the link, and records whose link-controlling data value is null.
    In general, our experience showed that indexes could be used very efficiently in queries and transactions which access many records,
  • 11.
but that hashing and links would have enhanced the performance of "canned transactions" which access only a few records. As an illustration of this problem, consider an inventory application which has two tables: a PRODUCTS table, and a much larger PARTS table which contains data on the individual parts used for each product. Suppose a given transaction needs to find the price of the heating element in a particular toaster. To execute this transaction, System R might require two I/Os to traverse a two-level index to find the toaster record, and three more I/Os to traverse another three-level index to find the heating element record. If access paths based on hashing and direct links were available, it might be possible to find the toaster record in one I/O via hashing, and the heating element record in one more I/O via a link. (Additional I/Os would be required in the event of hash collisions or if the toaster parts records occupied more than one page.) Thus, for this very simple transaction hashing and links might reduce the number of I/Os from five to three, or even two. For transactions which retrieve a large set of records, the additional I/Os caused by indexes compared to hashing and links are less important.

The Optimizer

A series of experiments was conducted at the San Jose IBM Research Laboratory to evaluate the success of the System R optimizer in choosing among the available access paths for typical SQL statements. The results of these experiments are reported in [6]. For the purpose of the experiments, the optimizer was modified in order to observe its behavior. Ordinarily, the optimizer searches through a tree of path choices, computing estimated costs and pruning the tree until it arrives at a single preferred access path. The optimizer was modified in such a way that it could be made to generate the complete tree of access paths, without pruning, and to estimate the cost of each path (cost is defined as a weighted sum of page fetches and RSS calls). Mechanisms were also added to the system whereby it could be forced to execute an SQL statement by a particular access path and to measure the actual number of page fetches and RSS calls incurred. In this way, a comparison can be made between the optimizer's predicted cost and the actual measured cost for various alternative paths.

In [6], an experiment is described in which ten SQL statements, including some single-table queries and some joins, are run against a test database. The database is artificially generated to conform to the two basic assumptions of the System R optimizer: (1) the values in each column are uniformly distributed from some minimum to some maximum value; and (2) the distribution of values of the various columns are independent of each other. For each of the ten SQL statements, the ordering of the predicted costs of the various access paths was the same as the ordering of the actual measured costs (in a few cases the optimizer predicted two paths to have the same cost when their actual costs were unequal but adjacent in the ordering).

Although the optimizer was able to correctly order the access paths in the experiment we have just described, the magnitudes of the predicted costs differed from the measured costs in several cases. These discrepancies were due to a variety of causes, such as the optimizer's inability to predict how much data would remain in the system buffers during sorting.

The above experiment does not address the issue of whether or not a very good access path for a given SQL statement might be overlooked because it is not part of the optimizer's repertoire. One such example is known. Suppose that the database contains a table T in which each row has a unique value for the field SEQNO, and suppose that an index exists on SEQNO. Consider the following SQL query:

    SELECT * FROM T WHERE SEQNO IN
    (15, 17, 19, 21);

This query has an answer set of (at most) four rows, and an obvious method of processing it is to use the SEQNO index repeatedly: first to find the row with SEQNO = 15, then SEQNO = 17, etc. However, this access path would not be chosen by System R, because the optimizer is not presently structured to consider multiple uses of an index within a single query block. As we gain more experience with access path selection, the optimizer may grow to encompass this and other access paths which have so far been omitted from consideration.

Views and Authorization

Users generally found the System R mechanisms for defining views and controlling authorization to be powerful, flexible, and convenient. The following features were considered to be particularly beneficial:

(1) The full query power of SQL is made available for defining new views of data (i.e., any query may be defined as a view). This makes it possible to define a rich variety of views, containing joins, subqueries, aggregation, etc., without having to learn a separate "data definition language." However, the view mechanism is not completely transparent to the end user, because of the restrictions described earlier (e.g., views involving joins of more than one table are not updateable).

(2) The authorization subsystem allows each installation of System R to choose a "fully centralized policy" in which all tables are created and privileges controlled by a central administrator; or a "fully decentralized policy" in which each user may create tables and control access to them; or some intermediate policy.

During the two-year evaluation of System R, the following suggestions were made by users for improvement of the view and authorization subsystems:
(1) The authorization subsystem could be augmented by the concept of a "group" of users. Each group would have a "group administrator" who controls enrollment of new members in the group. Privileges could then be granted to the group as a whole rather than to each member of the group individually.

(2) A new command could be added to the SQL language to change the ownership of a table from one user to another. This suggestion is more difficult to implement than it seems at first glance, because the owner's name is part of the fully qualified name of a table (i.e., two tables owned by Smith and Jones could be named SMITH.PARTS and JONES.PARTS). References to the table SMITH.PARTS might exist in many places, such as view definitions and compiled programs. Finding and changing all these references would be difficult (perhaps impossible, as in the case of users' source programs which are not stored under System R control).

(3) Occasionally it is necessary to reload an existing table in the database (e.g., to change its physical clustering properties). In System R this is accomplished by dropping the old table definition, creating a new table with the same definition, and reloading the data into the new table. Unfortunately, views and authorizations defined on the table are lost from the system when the old definition is dropped, and therefore they both must be redefined on the new table. It has been suggested that views and authorizations defined on a dropped table might optionally be held "in abeyance" pending reactivation of the table.

The Recovery Subsystem

The combined "shadow page" and log mechanism used in System R proved to be quite successful in safeguarding the database against media, system, and transaction failures. The part of the recovery subsystem which was observed to have the greatest impact on system performance was the keeping of a shadow page for each updated page. This performance impact is due primarily to the following factors:

(1) Since each updated page is written out to a new location on disk, data tends to move about. This limits the ability of the system to cluster related pages in secondary storage to minimize disk arm movement for sequential applications.

(2) Since each page can potentially have both an "old" and "new" version, a directory must be maintained to locate both versions of each page. For large databases, the directory may be large enough to require a paging mechanism of its own.

(3) The periodic checkpoints which exchange the "old" and "new" page pointers generate I/O activity and consume a certain amount of CPU time.

A possible alternative technique for recovering from system failures would dispense with the concept of shadow pages, and simply keep a log of all database updates. This design would require that all updates be written out to the log before the updated page migrates to disk from the system buffers. Mechanisms could be developed to minimize I/Os by retaining updated pages in the buffers until several pages are written out at once, sharing an I/O to the log.

The Locking Subsystem

The locking subsystem of System R provides each user with a choice of three levels of isolation from other users. In order to explain the three levels, we define "uncommitted data" as those records which have been updated by a transaction that is still in progress (and therefore still subject to being backed out). Under no circumstances can a transaction, at any isolation level, perform updates on the uncommitted data of another transaction, since this might lead to lost updates in the event of transaction backout.

The three levels of isolation in System R are defined as follows:

Level 1: A transaction running at Level 1 may read (but not update) uncommitted data. Therefore, successive reads of the same record by a Level-1 transaction may not give consistent values. A Level-1 transaction does not attempt to acquire any locks on records while reading.

Level 2: A transaction running at Level 2 is protected against reading uncommitted data. However, successive reads at Level 2 may still yield inconsistent values if a second transaction updates a given record and then terminates between the first and second reads by the Level-2 transaction. A Level-2 transaction locks each record before reading it to make sure it is committed at the time of the read, but then releases the lock immediately after reading.

Level 3: A transaction running at Level 3 is guaranteed that successive reads of the same record will yield the same value. This guarantee is enforced by acquiring a lock on each record read by a Level-3 transaction and holding the lock until the end of the transaction. (The lock acquired by a Level-3 reader is a "share" lock which permits other users to read but not update the locked record.)

It was our intention that Isolation Level 1 provide a means for very quick scans through the database when approximate values were acceptable, since Level-1 readers acquire no locks and should never need to wait for other users. In practice, however, it was found that Level-1 readers did have to wait under certain circumstances while the physical consistency of the data was suspended (e.g., while indexes or pointers were being adjusted). Therefore, the potential of Level 1 for increasing system concurrency was not fully realized.

It was our expectation that a tradeoff would exist between Isolation Levels 2 and 3 in which Level 2 would be "cheaper" and Level 3 "safer." In practice, however, it was observed that Level 3 actually involved less CPU overhead than Level 2, since it was simpler to acquire locks and keep them than to acquire locks and immediately release them. It is true that Isolation Level 2 permits a greater degree of
access to the database by concurrent readers and updaters than does Level 3. However, this increase in concurrency was not observed to have an important effect in most practical applications.

As a result of the observations described above, most System R users ran their queries and application programs at Level 3, which was the system default.

The Convoy Phenomenon

Experiments with the locking subsystem of System R identified a problem which came to be known as the "convoy phenomenon" [9]. There are certain high-traffic locks in System R which every process requests frequently and holds for a short time. Examples of these are the locks which control access to the buffer pool and the system log. In a "convoy" condition, interaction between a high-traffic lock and the operating system dispatcher tends to serialize all processes in the system, allowing each process to acquire the lock only once each time it is dispatched.

In the VM/370 operating system, each process in the multiprogramming set receives a series of small "quanta" of CPU time. Each quantum terminates after a preset amount of CPU time, or when the process goes into page, I/O, or lock wait. At the end of the series of quanta, the process drops out of the multiprogramming set and must undergo a longer "time slice wait" before it once again becomes dispatchable. Most quanta end when a process waits for a page, an I/O operation, or a low-traffic lock. The System R design ensures that no process will ever hold a high-traffic lock during any of these types of wait. There is a slight probability, however, that a process might go into a long "time slice wait" while it is holding a high-traffic lock. In this event, all other dispatchable processes will soon request the same lock and become enqueued behind the sleeping process. This phenomenon is called a "convoy."

In the original System R design, convoys are stable because of the protocol for releasing locks. When a process P releases a lock, the locking subsystem grants the lock to the first waiting process in the queue (thereby making it unavailable to be reacquired by P). After a short time, P once again requests the lock, and is forced to go to the end of the convoy. If the mean time between requests for the high-traffic lock is 1,000 instructions, each process may execute only 1,000 instructions before it drops to the end of the convoy. Since more than 1,000 instructions are typically used to dispatch a process, the system goes into a "thrashing" condition in which most of the cycles are spent on dispatching overhead.

The solution to the convoy problem involved a change to the lock release protocol of System R. After the change, when a process P releases a lock, all processes which are enqueued for the lock are made dispatchable, but the lock is not granted to any particular process. Therefore, the lock may be regranted to process P if it makes a subsequent request. Process P may acquire and release the lock many times before its time slice is exhausted. It is highly probable that process P will not be holding the lock when it goes into a long wait. Therefore, if a convoy should ever form, it will most likely evaporate as soon as all the members of the convoy have been dispatched.

Additional Observations

Other observations were made during the evaluation of System R and are listed below:

(1) When running in a "canned transaction" environment, it would be helpful for the system to include a data communications front end to handle terminal interactions, priority scheduling, and logging and restart at the message level. This facility was not included in the System R design. Also, space would be saved and the working set reduced if several users executing the same "canned transaction" could share a common access module. This would require the System R code generator to produce reentrant code. Approximately half the space occupied by the multiple copies of the access module could be saved by this method, since the other half consists of working storage which must be duplicated for each user.

(2) When the recovery subsystem attempts to take an automatic checkpoint, it inhibits the processing of new RSS commands until all users have completed their current RSS command; then the checkpoint is taken and all users are allowed to proceed. However, certain RSS commands potentially involve long operations, such as sorting a file. If these "long" RSS operations were made interruptible, it would avoid any delay in performing checkpoints.

(3) The System R design of automatically maintaining a system catalog as part of the on-line database was very well liked by users, since it permitted them to access the information in the catalog with exactly the same query language they use for accessing other data.

5. Conclusions

We feel that our experience with System R has clearly demonstrated the feasibility of applying a relational database system to a real production environment in which many concurrent users are performing a mixture of ad hoc queries and repetitive transactions. We believe that the high-level user interface made possible by the relational data model can have a dramatic positive effect on user productivity in developing new applications, and on the data independence of queries and programs. System R has also demonstrated the ability to support a highly dynamic database environment in which application requirements are rapidly changing.

In particular, System R has illustrated the feasibility of compiling a very high-level data sublanguage, SQL, into machine-level code. The
result of this compilation technique is that most of the overhead cost for implementing the high-level language is pushed into a "precompilation" step, and performance for canned transactions is comparable to that of a much lower level system. The compilation approach has also proved to be applicable to the ad hoc query environment, with the result that a unified mechanism can be used to support both queries and transactions.

The evaluation of System R has led to a number of suggested improvements. Some of these improvements have already been implemented and others are still under study. Two major foci of our continuing research program at the San Jose laboratory are adaptation of System R to a distributed database environment, and extension of our optimizer algorithms to encompass a broader set of access paths.

Sometimes questions are asked about how the performance of a relational database system might compare to that of a "navigational" system in which a programmer carefully hand-codes an application to take advantage of explicit access paths. Our experiments with the System R optimizer and compiler suggest that the relational system will probably approach but not quite equal the performance of the navigational system for a particular, highly tuned application, but that the relational system is more likely to be able to adapt to a broad spectrum of unanticipated applications with adequate performance. We believe that the benefits of relational systems in the areas of user productivity, data independence, and adaptability to changing circumstances will take on increasing importance in the years ahead.

Acknowledgments

From the beginning, System R was a group effort. Credit for any success of the project properly belongs to the team as a whole rather than to specific individuals.

The inspiration for constructing a relational system came primarily from E. F. Codd, whose landmark paper [22] introduced the relational model of data. The manager of the project through most of its existence was W. F. King.

In addition to the authors of this paper, the following people were associated with System R and made important contributions to its development:

    M. Adiba          M. Mresse
    R.F. Boyce        J.F. Nilsson
    A. Chan           R.L. Obermarck
    D.M. Choy         D. Stott Parker
    K. Eswaran        D. Portal
    R. Fagin          N. Ramsperger
    P. Fehder         P. Reisner
    T. Haerder        P.R. Roever
    R.H. Katz         R. Selinger
    W. Kim            H.R. Strong
    H. Korth          P. Tiberio
    P. McJones        V. Watson
    D. McLeod         R. Williams

References

1. Adiba, M.E., and Lindsay, B.G. Database snapshots. IBM Res. Rep. RJ2772, San Jose, Calif., March 1980.
2. Astrahan, M.M., and Chamberlin, D.D. Implementation of a structured English query language. Comm. ACM 18, 10 (Oct. 1975), 580-588.
3. Astrahan, M.M., and Lorie, R.A. SEQUEL-XRM: A Relational System. Proc. ACM Pacific Regional Conf., San Francisco, Calif., April 1975, p. 34.
4. Astrahan, M.M., et al. System R: A relational approach to database management. ACM Trans. Database Syst. 1, 2 (June 1976), 97-137.
5. Astrahan, M.M., et al. System R: A relational data base management system. IEEE Comptr. 12, 5 (May 1979), 43-48.
6. Astrahan, M.M., Kim, W., and Schkolnick, M. Evaluation of the System R access path selection mechanism. Proc. IFIP Congress, Melbourne, Australia, Sept. 1980, pp. 487-491.
7. Blasgen, M.W., Eswaran, K.P. Storage and access in relational databases. IBM Syst. J. 16, 4 (1977), 363-377.
8. Blasgen, M.W., Casey, R.G., and Eswaran, K.P. An encoding method for multifield sorting and indexing. Comm. ACM 20, 11 (Nov. 1977), 874-878.
9. Blasgen, M., Gray, J., Mitoma, M., and Price, T. The convoy phenomenon. Operating Syst. Rev. 13, 2 (April 1979), 20-25.
10. Blasgen, M.W., et al. System R: An architectural overview. IBM Syst. J. 20, 1 (Feb. 1981), 41-62.
11. Bjorner, D., Codd, E.F., Deckert, K.L., and Traiger, I.L. The Gamma Zero N-ary relational data base interface. IBM Res. Rep. RJ1200, San Jose, Calif., April 1973.
12. Boyce, R.F., and Chamberlin, D.D. Using a structured English query language as a data definition facility. IBM Res. Rep. RJ1318, San Jose, Calif., Dec. 1973.
13. Boyce, R.F., Chamberlin, D.D., King, W.F., and Hammer, M.M. Specifying queries as relational expressions: The SQUARE data sublanguage. Comm. ACM 18, 11 (Nov. 1975), 621-628.
14. Chamberlin, D.D., and Boyce, R.F. SEQUEL: A structured English query language. Proc. ACM-SIGMOD Workshop on Data Description, Access, and Control, Ann Arbor, Mich., May 1974, pp. 249-264.
15. Chamberlin, D.D., Gray, J.N., and Traiger, I.L. Views, authorization, and locking in a relational database system. Proc. 1975 Nat. Comptr. Conf., Anaheim, Calif., pp. 425-430.
16. Chamberlin, D.D., et al. SEQUEL 2: A unified approach to data definition, manipulation, and control. IBM J. Res. and Develop. 20, 6 (Nov. 1976), 560-575 (also see errata in Jan. 1977 issue).
17. Chamberlin, D.D. Relational database management systems. Comptng. Surv. 8, 1 (March 1976), 43-66.
18. Chamberlin, D.D., et al. Data base system authorization. In Foundations of Secure Computation, R. Demillo, D. Dobkin, A. Jones, and R. Lipton, Eds., Academic Press, New York, 1978, pp. 39-56.
19. Chamberlin, D.D. A summary of user experience with the SQL data sublanguage. Proc. Internat. Conf. Data Bases, Aberdeen, Scotland, July 1980, pp. 181-203 (also IBM Res. Rep. RJ2767, San Jose, Calif., April 1980).
20. Chamberlin, D.D., et al. Support for repetitive transactions and ad-hoc queries in System R. ACM Trans. Database Syst. 6, 1 (March 1981), 70-94.
21. Chamberlin, D.D., Gilbert, A.M., and Yost, R.A. A history of System R and SQL/data system (presented at the Internat. Conf. Very Large Data Bases, Cannes, France, Sept. 1981).
22. Codd, E.F. A relational model of data for large shared data banks. Comm. ACM 13, 6 (June 1970), 377-387.
23. Codd, E.F. Further normalization of the data base relational model. In Courant Computer Science Symposia, Vol. 6: Data Base Systems, Prentice-Hall, Englewood Cliffs, N.J., 1971, pp. 33-64.
24. Codd, E.F. Recent investigations in relational data base systems. Proc. IFIP Congress, Stockholm, Sweden, Aug. 1974.
25. Codd, E.F. Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4, 4 (Dec. 1979), 397-434.
26. Comer, D. The ubiquitous B-Tree. Comptng. Surv. 11, 2 (June 1979), 121-137.
27. Date, C.J. An Introduction to Database Systems. 2nd Ed., Addison-Wesley, New York, 1977.
28. Eswaran, K.P., and Chamberlin, D.D. Functional specifications of a subsystem for database integrity. Proc. Conf. Very Large Data Bases, Framingham, Mass., Sept. 1975, pp. 48-68.
29. Eswaran, K.P., Gray, J.N., Lorie, R.A., and Traiger, I.L. On the notions of consistency and predicate locks in a database system. Comm. ACM 19, 11 (Nov. 1976), 624-633.
30. Fagin, R. Multivalued dependencies and a new normal form for relational databases. ACM Trans. Database Syst. 2, 3 (Sept. 1977), 262-278.
31. Fagin, R. On an authorization mechanism. ACM Trans. Database Syst. 3, 3 (Sept. 1978), 310-319.
32. Gray, J.N., and Watson, V. A shared segment and inter-process communication facility for VM/370. IBM Res. Rep. RJ1579, San Jose, Calif., Feb. 1975.
33. Gray, J.N., Lorie, R.A., and Putzolu, G.F. Granularity of locks in a large shared database. Proc. Conf. Very Large Data Bases, Framingham, Mass., Sept. 1975, pp. 428-451.
34. Gray, J.N., Lorie, R.A., Putzolu, G.R., and Traiger, I.L. Granularity of locks and degrees of consistency in a shared data base. Proc. IFIP Working Conf. Modelling of Database Management Systems, Freudenstadt, Germany, Jan. 1976, pp. 695-723 (also IBM Res. Rep. RJ1654, San Jose, Calif.).
35. Gray, J.N. Notes on database operating systems. In Operating Systems: An Advanced Course, Goos and Hartmanis, Eds., Springer-Verlag, New York, 1978, pp. 393-481 (also IBM Res. Rep. RJ2188, San Jose, Calif.).
36. Gray, J.N., et al. The recovery manager of a data management system. IBM Res. Rep. RJ2623, San Jose, Calif., June 1979.
37. Griffiths, P.P., and Wade, B.W. An authorization mechanism for a relational database system. ACM Trans. Database Syst. 1, 3 (Sept. 1976), 242-255.
38. Katz, R.H., and Selinger, R.D. Internal comm., IBM Res. Lab., San Jose, Calif., Sept. 1978.
39. Kwan, S.C., and Strong, H.R. Index path length evaluation for the research storage system of System R. IBM Res. Rep. RJ2736, San Jose, Calif., Jan. 1980.
40. Lorie, R.A. XRM - An extended (N-ary) relational memory. IBM Tech. Rep. G320-2096, Cambridge Scientific Ctr., Cambridge, Mass., Jan. 1974.
41. Lorie, R.A. Physical integrity in a large segmented database. ACM Trans. Database Syst. 2, 1 (March 1977), 91-104.
42. Lorie, R.A., and Wade, B.W. The compilation of a high level data language. IBM Res. Rep. RJ2598, San Jose, Calif., Aug. 1979.
43. Lorie, R.A., and Nilsson, J.F. An access specification language for a relational data base system. IBM J. Res. and Develop. 23, 3 (May 1979), 286-298.
44. Reisner, P., Boyce, R.F., and Chamberlin, D.D. Human factors evaluation of two data base query languages: SQUARE and SEQUEL. Proc. AFIPS Nat. Comptr. Conf., Anaheim, Calif., May 1975, pp. 447-452.
45. Reisner, P. Use of psychological experimentation as an aid to development of a query language. IEEE Trans. Software Eng. SE-3, 3 (May 1977), 218-229.
46. Schkolnick, M., and Tiberio, P. Considerations in developing a design tool for a relational DBMS. Proc. IEEE COMPSAC 79, Nov. 1979, pp. 228-235.
47. Selinger, P.G., et al. Access path selection in a relational database management system. Proc. ACM SIGMOD Conf., Boston, Mass., June 1979, pp. 23-34.
48. Stonebraker, M. Implementation of integrity constraints and views by query modification. Tech. Memo ERL-M514, College of Eng., Univ. of Calif. at Berkeley, March 1975.
49. Strong, H.R., Traiger, I.L., and Markowsky, G. Slide Search. IBM Res. Rep. RJ2274, San Jose, Calif., June 1978.
50. Traiger, I.L., Gray J.N., Galtieri, C.A., and Lindsay, B.G. Transactions and consistency in distributed database systems. IBM Res. Rep. RJ2555, San Jose, Calif., June 1979.
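
Note on the convoy arithmetic. The thrashing argument given in the discussion of the convoy phenomenon reduces to a simple ratio, sketched below in Python. This is an illustrative model only, not code from System R. The function name, the dispatch cost of 1,500 instructions, and the 50,000-instruction run length under the revised protocol are assumptions, chosen only to satisfy the conditions stated in the text (about 1,000 instructions between requests for the high-traffic lock, and more than 1,000 instructions to dispatch a process).

```python
def useful_fraction(run_instructions: int, dispatch_instructions: int) -> float:
    """Fraction of CPU cycles spent on useful work when each dispatch costs
    `dispatch_instructions` and buys `run_instructions` of progress."""
    return run_instructions / (run_instructions + dispatch_instructions)


# From the text: roughly 1,000 instructions between lock requests.
INSTRUCTIONS_BETWEEN_REQUESTS = 1_000
# Assumption: some value above the 1,000 instructions the text mentions.
DISPATCH_COST = 1_500
# Assumption: how long a process runs once it is able to reacquire the lock.
RUN_WITHOUT_CONVOY = 50_000

# Original protocol: the lock is handed to the first waiter, so a process in
# a convoy must be redispatched after every request for the lock.
in_convoy = useful_fraction(INSTRUCTIONS_BETWEEN_REQUESTS, DISPATCH_COST)

# Revised protocol: waiters are only made dispatchable, so the releasing
# process can reacquire the lock and keep running between dispatches.
after_change = useful_fraction(RUN_WITHOUT_CONVOY, DISPATCH_COST)

print(f"useful work inside a convoy:   {in_convoy:.0%}")
print(f"useful work after the change:  {after_change:.0%}")
```

Under this model, any dispatch cost above 1,000 instructions leaves less than half of the cycles for useful work inside a convoy, which is consistent with the observation that most cycles are spent on dispatching overhead.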