Domain vs. (Data) Type, Class vs. Relation

"Our terminology is broken beyond repair. [Let me] point out some problems with
Date's use of terminology, specifically in two cases.
1. "type" = "domain": I fully understand why one might equate
"type" and "domain", but ... in today's programming practice,
"type" and "domain" are quite different. The word "type" is
largely tied to system-level (or "physical"-level) definitions of
data, while a "domain" is thought of as an abstract set of
acceptable values.
2. "class" ≠ "relvar": In simple terms, the word "class" applies to a
collection of values allowed by a predicate, regardless of whether
such a collection could actually exist. Every set has a
corresponding class, although a class may have no corresponding
set ... in mathematical logic, a "relation" is a "class" (and trivially
also a "set"), which contributes to confusion.
In modern programming parlance "class" is generally distinguished from
"type" only in that "type" refers to "primitive" (system-defined) data
definitions while "class" refers to higher-level (user-defined) data
definitions. This distinction is almost arbitrary, and in some contexts,
"type" and "class" are actually synonymous."
With respect to 1, well, yes, they are distinct, but not for the stated reason. With
respect to 2, well, no, at least insofar as "programming parlance" goes. The
terminology introduced by Codd was explicitly intended to distinguish formal concepts
drawn from set theory and first order predicate logic from the terminology used in
programming practice.
1. Domain vs. (Data) Type
"The theory behind data types in most programming languages is based
on abstract data types, but programmers hardly ever use the term in this
way and languages are rarely strong in this regard. The need for a formal
theory (of abstract data) and the semantics of types was not addressed by
either Codd or the current RDM interpretation. Codd's treatment of types
was greatly simplified and its understanding in the current interpretation
of the RDM is at best simplistic. An adequate treatment of the subject is
beyond the scope of this discussion and will be addressed in Part III of
LOGIC FOR SERIOUS DATABASE FOLKS." --David McGoveran
For our purposes here, suffice it to say that type is used in two senses:
(a) Extensionally, i.e., type denotes a specific set of typed objects,
which define the type;
(b) Intensionally, i.e., type defines what is and is not permissible for a
typed object.
Both relational domains and programming data types are types in the (a) sense:
sets of values within a specified range to which certain operations are applicable. In
his book THE RELATIONAL MODEL VERSION 2, Codd lists several ways in which the
former (which he called "extended types") differ from the latter. Domains
• are types with database designer-constrained value ranges;
• represent real world entity properties;
• are under DBMS control;
while programming types are under programmer and application control and do not
necessarily represent real world properties.
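The two senses of type can be illustrated with a minimal Python sketch. The rating domain, its 1..5 range, and the names is_rating, RATING_VALUES and extension are hypothetical, chosen only for illustration; they are not from Codd or McGoveran.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical domain: an employee rating that the database designer
# has constrained to the integers 1..5.

def is_rating(value: object) -> bool:
    """Intensional sense (b): a rule stating what is permissible."""
    return (
        isinstance(value, int)
        and not isinstance(value, bool)  # bool is a subclass of int in Python
        and 1 <= value <= 5
    )

# Extensional sense (a): the type as the explicit set of its values.
RATING_VALUES: FrozenSet[int] = frozenset({1, 2, 3, 4, 5})

def extension(rule: Callable[[object], bool],
              universe: Iterable[object]) -> FrozenSet[object]:
    """Apply an intensional rule to a finite universe of candidate values."""
    return frozenset(v for v in universe if rule(v))

# Over a finite universe of candidates, the rule picks out exactly the set.
same = extension(is_rating, range(-10, 11)) == RATING_VALUES
```

In this sketch the rule lives in application code, which is the programming-type situation described above; a domain would place the same constraint under DBMS control instead.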
2. Relation vs. Class
"Whatever type and class are in "modern programming parlance", the
meanings of class in set theory (vs. any other usages) should not be
confused with how it is popularly used in programming or--for that
matter--in the database literature (class vs. type is another good example
of such confusion).
The distinctions between class and set vary with the specific version of
set theory. To avoid problems, we will use the most broadly applicable
definitions that will still apply to usages relevant to relational database
theory and will try to (1) be precise about how we use the terms and (2)
identify the subject areas to which the definitions do not apply." --David
McGoveran
In the real world
"...every property defines a class--namely, the set of [entities] possessing
that property--whereas every class is a class simply by virtue of the fact
that its members have common defining properties."--MEANING AND
ARGUMENT: ELEMENTS OF LOGIC
In other words, entities are members of a class by virtue of common properties and
when we say they are of the same type, we use type in the (b) sense.
"The definition of a class is intensional--it is a statement of the properties
that distinguish members of the class from non-members. When applied
to a particular universe of entities, a class definition selects out those that
are members of the class. If the universe is well defined--a collection of
entities in which each can, in principle though perhaps not in practical
terms, be examined--the result is a set. Mathematicians say that a class
over a universe "induces" a set. If one defines a class, one must then
"compute" the set that is induced when that class definition is applied to a
particular universe." --LOGIC FOR SERIOUS DATABASE FOLKS
At the class level, by properties we mean:
• Individual properties shared by entities that are class members;
• Properties arising from relationships between individual properties;
• Properties arising from relationships among all class members collectively.
There are also multi-class properties arising from relationships among two or more
classes.
Note that while this seems to contradict the claim quoted at the outset that a class
applies "regardless of whether such a collection could actually exist", it does not,
because of the caveat regarding a "well defined" universe. If the collection could not
actually exist, the universe is not well defined as required.
Conceptual modeling consists of specifying these relationships in natural language
as informal business rules. Those rules correspond to a formal predicate that
expresses the class i.e., they comprise the intensional definition of each class of
interest. When applied to a universe of entities, the class induces a set of class
members, facts about which are to be recorded in the database.
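As a rough illustration of a class inducing a set, the following Python sketch applies an intensional class definition to a small, well defined universe. The Entity fields, the business rule, and the names is_acme_employee and induce are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable

@dataclass(frozen=True)
class Entity:
    """A hypothetical real world entity with a few observable properties."""
    name: str
    salary: int
    employer: str

# Intensional class definition, standing in for an informal business rule:
# "an Acme employee works for Acme and earns a positive salary".
def is_acme_employee(e: Entity) -> bool:
    return e.employer == "Acme" and e.salary > 0

def induce(class_def: Callable[[Entity], bool],
           universe: Iterable[Entity]) -> FrozenSet[Entity]:
    """Compute the set that the class induces over a finite universe."""
    return frozenset(e for e in universe if class_def(e))

# A well defined universe: every entity can be examined.
universe = [
    Entity("Ann", 5000, "Acme"),
    Entity("Bob", 4000, "Initech"),
    Entity("Cy", 0, "Acme"),
]

members = induce(is_acme_employee, universe)
```

Here only the entity named Ann satisfies the definition, so the induced set has one member. The same definition applied to a different universe would induce a different set, which is the sense in which the class is intensional and the set is "computed".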
A relation is, thus, a set of tuples that represent in the database facts about the set
of entities induced by the class. Every relation is associated with a relation
predicate (RP)--the conjunction of integrity constraints that represent the business
rules in the database. The RP represents formally in the database the intensional
class definition (that was informally expressed by the business rules). When applied
to a universe of entities, that RP induces the relation and serves as its membership
function. The relation's tuples--its extension--satisfy that RP. This is another way
of saying that the tuples in a relation represent facts about a set of entities of the
same type, i.e., an RP is a relation type and tuple type specification statement.
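A relation predicate as a conjunction of constraints can be sketched in Python as follows. This is only an assumption-laden illustration: the EmpTuple attributes, the individual constraints, and the name satisfies_rp are hypothetical, and a real DBMS would declare such constraints in its data language rather than in application code.

```python
from typing import Callable, FrozenSet, List, NamedTuple

class EmpTuple(NamedTuple):
    """A hypothetical tuple type: one fact about one employee."""
    emp_id: int
    name: str
    salary: int

Relation = FrozenSet[EmpTuple]

# Constraints on individual tuples (individual properties).
def id_is_positive(t: EmpTuple) -> bool:
    return t.emp_id > 0

def salary_in_range(t: EmpTuple) -> bool:
    return 0 < t.salary <= 100_000

# Constraint on all tuples collectively: emp_id identifies a tuple uniquely.
def emp_id_is_key(r: Relation) -> bool:
    ids = [t.emp_id for t in r]
    return len(ids) == len(set(ids))

TUPLE_CONSTRAINTS: List[Callable[[EmpTuple], bool]] = [id_is_positive, salary_in_range]
RELATION_CONSTRAINTS: List[Callable[[Relation], bool]] = [emp_id_is_key]

def satisfies_rp(r: Relation) -> bool:
    """The RP as the conjunction of every constraint."""
    return (
        all(c(t) for t in r for c in TUPLE_CONSTRAINTS)
        and all(c(r) for c in RELATION_CONSTRAINTS)
    )

good = frozenset({EmpTuple(1, "Ann", 5000), EmpTuple(2, "Bob", 4000)})
duplicate_id = frozenset({EmpTuple(1, "Ann", 5000), EmpTuple(1, "Bob", 4000)})
bad_salary = frozenset({EmpTuple(3, "Cy", 0)})
```

In this sketch satisfies_rp(good) holds, while duplicate_id fails the collective constraint and bad_salary fails a tuple-level one. The key constraint is an example of a property that arises among all class members collectively and cannot be checked one tuple at a time.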
Note very carefully that:
"Translating business rules into a formal first order predicate (let alone
expressing it as integrity constraints in any DBMS-specific data
language) is a big step that casts the die. There is no way to know you've
done it incorrectly, except that you decide you are unhappy with the
results--that the formalism doesn't produce something you think it should
produce, or produces something you think it should not (usually detected
by translating the constraints backwards and comparing to reality). We
can minimize the likelihood of a bad modeling effort by following a
careful methodology, but we must not confuse the conceptual with its
formal representation, the former being the choice of subject matter and
latter being the result of a choice of formalism." --LOGIC FOR SERIOUS
DATABASE FOLKS
I shudder at comparing database practice to this recommendation.
Note also that, following Codd, we refer to relations rather than relvars.
"...set semantics do not have the concept of a computer variable to which
values can be destructively assigned (or "updated") ... [such] variables
can be expressed in certain systems of logic, but they cannot be expressed
in elementary set theory, or first order predicate logic. Other, more
expressively powerful systems are required. Unfortunately, such
powerful formal systems do violence to the relational data model and its
intended benefits." --LOGIC FOR SERIOUS DATABASE FOLKS
which is perhaps why Codd avoided relvars by using the term "time-varying
relations" instead. His choice seems to skirt the need for such powerful formal
systems, while relvars--which introduce the semantics of computationally complete
programming languages and the higher logic that they entail--embrace it.
