Towards a New Data Modelling
Architecture
By Athanassios I. Hatzis, PhD, R&D Software Engineer
(C) 17th of March 2015
This series of Wolfram Language notebooks progressively introduces a new, innovative data modelling
approach to software developers, architects, data model designers and anyone interested in learning
the advantages of the method and its main differences from the data models of the past. Part 1
starts with terms and constructs that most of us know from relational database management systems;
Part 2 continues with the terminology and constructs of the new model.
Part 1: Representations and Transformations
on the Constructs of the Relational Model in
Wolfram Language
Scope
The entity-relational data model (ERDM) is still the most popular data model in database management
systems. There are several reasons for that, but the main one is the simple and natural way of
managing data in tables with rows (records) and columns (attributes). On top of that, SQL is a
powerful, easy-to-learn language that covers the relational operators on data sets. This
Mathematica notebook demonstrates various methods of representing the basic constructs of the
relational model.
◆ Definitions
Product Type
ClearAll["Global`*"]
Needs["DatabaseLink`"]
conn = OpenSQLConnection[
JDBC["Microsoft Access(ODBC)", "C:\\Temp\\SuppliesCatalogDB.mdb"]];
Wikipedia Extract
“In programming languages and type theory, a product of types is another, compounded, type in a
structure. The “operands” of the product are types, and the structure of a product type is determined
by the fixed order of the operands in the product. An instance of a product type retains the fixed
order, but otherwise may contain all possible instances of its primitive data types. The expression of
an instance of a product type will be a tuple, and is called a “tuple type” of expression. A product of
types is a direct product of two or more types.”
Example
Integer × String × Colour. In Wolfram Language an instance of such a type can be represented with
a list. For example, we can write
partInstanceAsList = {991, "Left Handed Bacon Stretcher Cover", Red}
{991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
To check the type of each element we map the function Head over the list
Head /@ partInstanceAsList
{Integer, String, RGBColor}
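The same check can be phrased as a pattern test. The following is only a minimal sketch; the predicate name partTypeQ is hypothetical and does not appear in the original notebook.

```wolfram
(* Hypothetical predicate: does expr conform to Integer x String x Colour? *)
partTypeQ[expr_] := MatchQ[expr, {_Integer, _String, _RGBColor}]

partTypeQ[{991, "Left Handed Bacon Stretcher Cover", Red}]    (* True *)
partTypeQ[{"991", "Left Handed Bacon Stretcher Cover", Red}]  (* False: the first element is a String *)
```

The fixed order of the operands is enforced by the positions inside the list pattern.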
Tuple or record or row
Wikipedia Extract
“A tuple is an ordered list of elements. In mathematics, an n-tuple is a sequence (or ordered list) of
n elements, where n is a non-negative integer. In computer science, tuples are directly implemented
as product types in most functional programming languages. More commonly, they are implemented
as record types, where the components are labeled instead of being identified by position
alone. This approach is also used in relational algebra.”
“In database theory, the relational model uses a tuple definition similar to tuples as functions, but
each tuple element is identified by a distinct name, called an attribute, instead of a number; this
leads to a more user-friendly and practical notation. A tuple in the relational model is formally
defined as a finite function that maps attributes to values. In this notation, attribute–value pairs may
appear in any order.”
Example
( partID : 991, partName : “Left Handed Bacon Stretcher Cover”, partColor : Red )
In Wolfram Language the record abstract data structure is usually represented with an Association,
i.e. a symbolically indexed list of rules (key->value pairs).
2 Towards a New Data Modelling Architecture - Part 1.nb
partInstanceAsAssociation = Association[partID → 991,
partName -> "Left Handed Bacon Stretcher Cover", partColor → Red]
<|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → RGBColor[1, 0, 0]|>
We can take the list of values in the following way
Values[partInstanceAsAssociation]
{991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
or take the list of rules (key->value pairs)
partInstanceAsAssociation // Normal
{partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → RGBColor[1, 0, 0]}
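The relational definition quoted above says that attribute-value pairs may appear in any order. The sketch below illustrates how that relates to Association; it uses string keys and the names t1 and t2, which are assumptions made for the example only.

```wolfram
t1 = <|"partID" -> 991, "partName" -> "Left Handed Bacon Stretcher Cover", "partColor" -> Red|>;
t2 = <|"partColor" -> Red, "partID" -> 991, "partName" -> "Left Handed Bacon Stretcher Cover"|>;

t1["partName"] === t2["partName"]   (* True: lookup is by key, not by position *)
t1 === t2                           (* False: an Association remembers the order of its keys *)
KeySort[t1] === KeySort[t2]         (* True: the same set of attribute-value pairs *)
```

So an Association behaves like a relational tuple for lookup, but two tuples should be compared after normalising the key order.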
Attribute or field or column
Wikipedia Extract
“The basic relational building block is the domain or data type, usually abbreviated nowadays to
type. A tuple is an ordered set of attribute values. An attribute is an ordered pair of attribute name
and type name. An attribute value is a specific valid value for the type of the attribute. This can be
either a scalar value or a more complex type. A domain describes the set of possible values for a
given attribute, and can be considered a constraint on the value of the attribute. Mathematically,
attaching a domain to an attribute means that any value for the attribute must be an element of the
specified set. Constraints make it possible to further restrict the domain of an attribute.”
Example
In our example two of the attributes take scalar values: partID, of integer data type, and partName,
of string data type. The partColor attribute is of a complex type, RGBColor.
Apply[Rule,
Thread[{Keys[partInstanceAsAssociation], Head /@ partInstanceAsList}], {1}]
{partID → Integer, partName → String, partColor → RGBColor}
Important Notice
An attribute can be seen as a mapping function: it maps a tuple to a value, for example "what is
the color of the part?". We can define a function that takes a single argument, the association
representation of the tuple, and returns the value stored under a specific key.
Below we turn each key of the association into such a function, which returns the value of that
attribute for a given association instance
isIdentifierOf[assoc_] := assoc[partID]
isNameOf[assoc_] := assoc[partName]
isColorOf[assoc_] := assoc[partColor]
{isIdentifierOf[partInstanceAsAssociation],
isNameOf[partInstanceAsAssociation], isColorOf[partInstanceAsAssociation]}
{991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
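The three accessor functions can be generated from the keys themselves. This is a sketch under stated assumptions: the helper attributeOf and the string keys are hypothetical and are not part of the original notebook.

```wolfram
(* Hypothetical helper: turn any key into an attribute function *)
attributeOf[key_] := Function[tuple, tuple[key]]

part = <|"partID" -> 991, "partName" -> "Left Handed Bacon Stretcher Cover", "partColor" -> Red|>;
colorOf = attributeOf["partColor"];

colorOf[part]   (* RGBColor[1, 0, 0] *)
Through[{attributeOf["partID"], attributeOf["partName"]}[part]]
(* {991, "Left Handed Bacon Stretcher Cover"} *)
```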
Relation (Base relval)
Wikipedia Extract
“In the relational model, a relation is a (possibly empty) finite set of tuples all having the same finite
set of attributes. This set of attributes is more formally called the sort of the relation, or more
casually referred to as the set of column names. A tuple is usually implemented as a row in a
database table. The fundamental assumption of the relational model is that all data is represented as
mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In
the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning
there are two possible evaluations for each proposition: either true or false (and in particular no third
value such as unknown, or not applicable, either of which are often associated with the concept of
NULL). Data are operated upon by means of a relational calculus or relational algebra, these being
equivalent in expressive power.
A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a
set is an unordered collection of unique, non-duplicated items. A table is an accepted visual
representation of a relation; a tuple is similar to the concept of a row. It is a set of tuples sharing the
same attributes; a set of columns and rows. A relvar is a named variable of some specific relation
type, to which at all times some relation of that type is assigned, though the relation may contain
zero tuples.”
Predicates and the Closed World Assumption
“A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary
relation) is a set of n-tuples. The heading of the relation is also the heading of each of its tuples.
The body of a relation is sometimes called its extension. This is because it is to be interpreted as a
representation of the extension of some predicate, this being the set of true propositions that can be
formed by replacing each free variable in that predicate by a name (a term that designates
something). There is a one-to-one correspondence between the free variables of the predicate and the
attribute names of the relation heading. Each tuple of the relation body provides attribute values to
instantiate the predicate by substituting each of its free variables. The result is a proposition that is
deemed, on account of the appearance of the tuple in the relation body, to be true. Contrariwise,
every tuple whose heading conforms to that of the relation, but which does not appear in the body is
deemed to be false. This assumption is known as the closed world assumption: it is often violated in
practical databases, where the absence of a tuple might mean that the truth of the corresponding
proposition is unknown.”
Example
SQLSelect[conn, "Parts", "ShowColumnHeadings" → True] // TableForm
pid   pname                               pcolor
991   Left Handed Bacon Stretcher Cover   Red
992   Smoke Shifter End                   Black
993   Acme Widget Washer                  Red
994   Acme Widget Washer                  Silver
995   I Brake for Crop Circles Sticker    Translucent
996   Anti-Gravity Turbine Generator      Cyan
997   Anti-Gravity Turbine Generator      Magenta
998   Fire Hydrant Cap                    Red
999   7 Segment Display                   Green
SQLSelect[conn, "Parts", "ShowColumnHeadings" → True]
{{pid, pname, pcolor}, {991, Left Handed Bacon Stretcher Cover, Red},
{992, Smoke Shifter End, Black}, {993, Acme Widget Washer, Red},
{994, Acme Widget Washer, Silver},
{995, I Brake for Crop Circles Sticker, Translucent},
{996, Anti-Gravity Turbine Generator, Cyan},
{997, Anti-Gravity Turbine Generator, Magenta},
{998, Fire Hydrant Cap, Red}, {999, 7 Segment Display, Green}}
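The set semantics and the closed world assumption described above can be illustrated without a database connection. The sketch below uses a few rows copied from the Parts table; the variable names are assumptions made for the example.

```wolfram
(* A small relation: tuples sharing the same heading; the third tuple is a deliberate duplicate *)
parts = {
   <|"pid" -> 991, "pname" -> "Left Handed Bacon Stretcher Cover", "pcolor" -> "Red"|>,
   <|"pid" -> 992, "pname" -> "Smoke Shifter End", "pcolor" -> "Black"|>,
   <|"pid" -> 991, "pname" -> "Left Handed Bacon Stretcher Cover", "pcolor" -> "Red"|>};

relation = DeleteDuplicates[parts];   (* set semantics: duplicates collapse *)
Length[relation]                      (* 2 *)

SameQ @@ (Keys /@ relation)           (* True: every tuple has the heading of the relation *)

(* Closed world assumption: a tuple that is absent from the body is deemed false *)
MemberQ[relation, First[parts]]       (* True *)
MemberQ[relation, <|"pid" -> 993, "pname" -> "Acme Widget Washer", "pcolor" -> "Red"|>]   (* False *)
```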
View or Result Set (Derived relvar)
Wikipedia Extract
“In a relational database, all data are stored and accessed via relations. Relations that store data
are called “base relations”, and in implementations are called “tables”. Other relations do not store
data, but are computed by applying relational operations to other relations. These relations are
sometimes called “derived relations”. In implementations these are called “views” or “queries”.”
Example
queryString = "
SELECT Catalog.catsid, Suppliers.sname,
Catalog.catpid, Parts.pname, Parts.pcolor, Catalog.catcost
FROM Suppliers INNER JOIN (Parts INNER JOIN [Catalog] ON Parts.pid
= Catalog.[catpid]) ON Suppliers.sid = Catalog.[catsid]
WHERE (((Catalog.catpid)=998))
ORDER BY Catalog.catcost;";
SQLExecute[conn, queryString, "ShowColumnHeadings" → True] // TableForm
catsid   sname                   catpid   pname              pcolor   catcost
1082     Big Red Tool and Die    998      Fire Hydrant Cap   Red      7.95
1081     Acme Widget Suppliers   998      Fire Hydrant Cap   Red      11.7
1083     Perfunctory Parts       998      Fire Hydrant Cap   Red      12.5
1084     Alien Aircaft Inc.      998      Fire Hydrant Cap   Red      48.6
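A derived relation of this kind can also be computed inside Wolfram Language from lists of associations. The following is a sketch, not the author's method: it hard-codes a few rows from the three tables, and the helper joinQ and the variable names are hypothetical.

```wolfram
(* A few rows copied from the tables shown in this notebook *)
parts = {<|"pid" -> 998, "pname" -> "Fire Hydrant Cap", "pcolor" -> "Red"|>};
suppliers = {
   <|"sid" -> 1081, "sname" -> "Acme Widget Suppliers"|>,
   <|"sid" -> 1082, "sname" -> "Big Red Tool and Die"|>};
catalog = {
   <|"catsid" -> 1081, "catpid" -> 998, "catcost" -> 11.7|>,
   <|"catsid" -> 1082, "catpid" -> 998, "catcost" -> 7.95|>,
   <|"catsid" -> 1081, "catpid" -> 991, "catcost" -> 36.1|>};

(* Join condition plus the WHERE clause of the SQL query *)
joinQ[{c_, p_, s_}] :=
  c["catpid"] == p["pid"] && c["catsid"] == s["sid"] && c["catpid"] == 998

(* Derived relation: filter the Cartesian product, merge the tuples, order by cost *)
view = SortBy[
   Function[{c, p, s}, Join[c, KeyTake[s, {"sname"}], KeyTake[p, {"pname", "pcolor"}]]] @@@
    Select[Tuples[{catalog, parts, suppliers}], joinQ],
   #catcost &];

Lookup[view, "sname"]   (* {"Big Red Tool and Die", "Acme Widget Suppliers"} *)
```

Filtering the full Cartesian product is only practical for small relations; it is used here because it mirrors the definition of a relation as a subset of a Cartesian product.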
Database
Wikipedia Extract
“Each database is a collection of related tables; these are also called relations, hence the name
“relational database”. Each table is a physical representation of an entity or object that is in a tabular
format consisting of columns and rows.”
Example
SQLTableNames[conn]
{Catalog, Parts, Suppliers}
SQLTableInformation[conn, "ShowColumnHeadings" → True] // TableForm
TABLE_CAT                       TABLE_SCHEM   TABLE_NAME   TABLE_TYPE   REMARKS
C:\Temp\SuppliesCatalogDB.mdb   Null          Catalog      TABLE        Null
C:\Temp\SuppliesCatalogDB.mdb   Null          Parts        TABLE        Null
C:\Temp\SuppliesCatalogDB.mdb   Null          Suppliers    TABLE        Null
SQLTableNames[conn, "TableType" → SQLTableTypeNames[conn]]
{MSysAccessObjects, MSysAccessXML, MSysACEs, MSysIMEXColumns,
MSysIMEXSpecs, MSysNameMap, MSysNavPaneGroupCategories, MSysNavPaneGroups,
MSysNavPaneGroupToObjects, MSysNavPaneObjectIDs, MSysObjects, MSysQueries,
MSysRelationships, Catalog, Parts, Suppliers, View998Suppliers, ViewAll}
Entity-Relationship (ER) Modeling
Wikipedia Extract
“An entity–relationship model is a systematic way of describing and defining a business process.
The process is modeled as components (entities) that are linked with each other by relationships
that express the dependencies and requirements between them. Entities may have various properties
(attributes) that characterize them. Diagrams created to represent these entities, attributes, and
relationships graphically are called entity–relationship diagrams.”
ER / Relational Terms Equivalence
Entity Type - Relation (Table, Base relvar)
Entity - Tuple
Attribute - Attribute (column)
Relationship - View (Result set or Derived relvar)
EER (Enhanced Entity-Relationship) Model
“The EER model includes all of the concepts introduced by the ER model. Additionally it includes
the concepts of a subclass and superclass (Is-a), along with the concepts of specialization and
generalization. Furthermore, it introduces the concept of a union type or category, which is used to
represent a collection of objects that is the union of objects of different entity types.”
Constraints
Wikipedia Extract
“Constraints provide one method of implementing business rules in the database. SQL implements
constraint functionality in the form of check constraints. Constraints restrict the data that can be
stored in relations. These are usually defined using expressions that result in a boolean value,
indicating whether or not the data satisfies the constraint.
Constraints can apply to single attributes, to a tuple (restricting combinations of attributes) or to an
entire relation. Since every attribute has an associated domain, there are constraints (domain
constraints). The two principal rules for the relational model are known as entity integrity and
referential integrity.
In ER modeling we have two types of constraints that are placed on relationships, cardinality and
participation. A cardinality constraint places a limit on the number of relationships an entity may
participate in at any given time.”
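Such constraints can be written as boolean-valued expressions over lists of associations. The sketch below uses a few rows from the Parts and Catalog tables; the predicate domainQ and the variable names are hypothetical.

```wolfram
parts = {
   <|"pid" -> 991, "pname" -> "Left Handed Bacon Stretcher Cover", "pcolor" -> "Red"|>,
   <|"pid" -> 998, "pname" -> "Fire Hydrant Cap", "pcolor" -> "Red"|>};
catalog = {
   <|"catsid" -> 1081, "catpid" -> 991, "catcost" -> 36.1|>,
   <|"catsid" -> 1082, "catpid" -> 998, "catcost" -> 7.95|>};

(* Domain constraint: each attribute value belongs to its domain *)
domainQ[t_] := IntegerQ[t["pid"]] && StringQ[t["pname"]] && StringQ[t["pcolor"]]
AllTrue[parts, domainQ]                                    (* True *)

(* Entity integrity: the key pid is unique across the relation *)
DuplicateFreeQ[Lookup[parts, "pid"]]                       (* True *)

(* Referential integrity: every catpid refers to an existing pid *)
SubsetQ[Lookup[parts, "pid"], Lookup[catalog, "catpid"]]   (* True *)
```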
◆ Table and Record in Wolfram Language
The basic construct of the ER model is the record; a table can then be defined as a list of records.
Representation of a Record in Wolfram Language
List Representations
You need to maintain two ordered lists, one for the data values and another one for the semantics,
i.e. the attribute/column names.
partInstanceAsList
{991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
attributes = {"pid", "pname", "pcolor"}
{pid, pname, pcolor}
But you can combine the two lists in one list of rules with the following command
Thread[Rule[attributes, partInstanceAsList]]
{pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → RGBColor[1, 0, 0]}
A Rule is the equivalent of a key-value pair, but it is more powerful because in Wolfram Language
rules are the basic mechanism used in transformations. For lookup and update operations, however,
Wolfram Language provides a more efficient construct called Association; see below.
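A list of rules can already serve as a lookup table through replacement. The sketch below repeats the definitions so that it stands alone; the names values and rules are assumptions made for the example.

```wolfram
attributes = {"pid", "pname", "pcolor"};
values = {991, "Left Handed Bacon Stretcher Cover", Red};
rules = Thread[attributes -> values];

"pname" /. rules              (* "Left Handed Bacon Stretcher Cover" *)
{"pid", "pcolor"} /. rules    (* {991, RGBColor[1, 0, 0]} *)
Association[rules]["pname"]   (* the same lookup through an Association *)
```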
Graph Representation with triplets (RDF)
Let us call a specific part instance partXYZ. If we take it as the subject resource of each triplet,
the attributes as the predicates and the values as the objects, we obtain the following
triplets, with example namespaces added
subject = Table["http://example.org/resource/partXYZ", {3}];
predicate = StringJoin["http://example.org/attribute/", #] & /@ attributes;
object = partInstanceAsList;
Transpose[{subject, predicate, object}] // TableForm
http://example.org/resource/partXYZ   http://example.org/attribute/pid      991
http://example.org/resource/partXYZ   http://example.org/attribute/pname    Left Handed Bacon Stretcher Cover
http://example.org/resource/partXYZ   http://example.org/attribute/pcolor   RGBColor[1, 0, 0]
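The construction above can be wrapped in a function that produces the triplets of any record. This is a sketch only; the helper toTriples and the resource name part992 are hypothetical.

```wolfram
(* Hypothetical helper: one {subject, predicate, object} triplet per attribute of a record *)
toTriples[subject_String, record_Association] :=
  Function[key, {subject, "http://example.org/attribute/" <> key, record[key]}] /@ Keys[record]

triples = toTriples["http://example.org/resource/part992",
   <|"pid" -> 992, "pname" -> "Smoke Shifter End", "pcolor" -> "Black"|>];

triples // TableForm
```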
Directed Graph
graphData = DirectedEdge[#, partXYZ] & /@ Values[partInstanceAsAssociation]
991  partXYZ, Left Handed Bacon Stretcher Cover  partXYZ,  partXYZ
vstyle =
{# → Black & /@ Values[partInstanceAsAssociation], partXYZ → Red} // Flatten
991 → , Left Handed Bacon Stretcher Cover → , → , partXYZ → 
elabels = Apply[Rule, Thread[{graphData, Keys[partInstanceAsAssociation]}], {1}]
991  partXYZ → partID,
Left Handed Bacon Stretcher Cover  partXYZ → partName,  partXYZ → partColor
Graph[graphData,
VertexSize → Medium, VertexStyle → vstyle, VertexLabels → "Name",
EdgeStyle → Thick, EdgeLabels -> elabels,
EdgeShapeFunction → GraphElementData[{"CarvedArrow", "ArrowSize" → .06}],
GraphLayout → "SpringEmbedding"]
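[Figure: directed graph in which the black value nodes 991, Left Handed Bacon Stretcher Cover and
the colour swatch each point to the red node partXYZ; the edges are labelled partID, partName and
partColor.]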
We can observe two types of nodes in this kind of graph: a URI node (red) and literal nodes (black).
TreeForm Representation
From the discussion above we saw two abstract data structures that are suitable for tuple
representation: the list and the associative array, also known as a map or dictionary. The equivalent
constructs in Wolfram Language are List and Association. In fact, Wolfram Language expressions
are tree data structures that are created in memory as a contiguous array of pointers,
the first to the head and the rest to the successive elements. Take for example the list we defined,
partInstanceAsList; we can present it in tree form:
partInstanceAsList // TreeForm
List
  991
  Left Handed Bacon Stretcher Cover
  RGBColor
    1
    0
    0
This reveals that the symbol Red evaluates to the function RGBColor with arguments 1, 0, 0
RGBColor[1, 0, 0]
Red
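The tree can also be explored programmatically with Part, where part 0 of any expression is its head. The sketch below uses a local variable p, an assumption made to keep the example self-contained.

```wolfram
p = {991, "Left Handed Bacon Stretcher Cover", Red};

FullForm[p]   (* List[991, "Left Handed Bacon Stretcher Cover", RGBColor[1, 0, 0]] *)
p[[0]]        (* List: part 0 is the head of the expression *)
p[[3, 0]]     (* RGBColor: the head of the third element *)
p[[3, 1]]     (* 1: the red channel *)
Depth[p]      (* 3 *)
```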
Function Representation
We can also represent this triple, partInstanceAsList, as a function with three arguments that take
values from the Integer, String and Color domains
partFunction[991, "Left Handed Bacon Stretcher Cover", Red] // TreeForm
partFunction
  991
  Left Handed Bacon Stretcher Cover
  RGBColor
    1
    0
    0
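Since the list and the function differ only in their head, Apply converts one representation into the other. A minimal sketch, with the variable names p and asFunction chosen for the example:

```wolfram
p = {991, "Left Handed Bacon Stretcher Cover", Red};

asFunction = partFunction @@ p
(* partFunction[991, "Left Handed Bacon Stretcher Cover", RGBColor[1, 0, 0]] *)

List @@ asFunction === p   (* True: replacing the head recovers the list *)
```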
Association Representation
Associations in Wolfram Language are very similar to the Association Type construct of the Topic
Maps data model. Each defined association is an instance of an association type; for example,
partInstanceAsAssociation is an instance of the type of a supplies part. The keys of the association
(association role types in Topic Maps terminology) describe the role type of each value in the
association instance. The values of the association (association role players in Topic Maps
terminology) describe the particular instance of the association type. In the following example we
have three role players, the values 991, “Left Handed Bacon Stretcher Cover” and Red, that describe a
part from a supplies catalog database. Each player has a role type that is important for semantic
purposes. The integer 991 is used as an identifier for the part, so its role type is that of an
identifier. The string “Left Handed Bacon Stretcher Cover” is used as a name descriptor, and the
symbol Red is the value of the categorical variable color that describes the color of the part.
The following command associates the attributes with their values
AssociationThread[attributes → partInstanceAsList]
<|pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → RGBColor[1, 0, 0]|>
Association is a relatively new fundamental construct in Wolfram Language that acts like a
symbolically indexed list. The main reasons for using it are highly efficient lookup and updating
and the ability to build complex hierarchical structures and other datasets.
You can easily convert an Association to a list of rules
% // Normal
{pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → RGBColor[1, 0, 0]}
partInstanceAsAssociation
<|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → RGBColor[1, 0, 0]|>
Keys[partInstanceAsAssociation]
{partID, partName, partColor}
Values[partInstanceAsAssociation]
{991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
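Lookup, updating and nesting can be sketched as follows. The attribute "pweight", the missing key "psize" and the variables part and store are made up for the illustration and do not come from the database.

```wolfram
part = <|"pid" -> 991, "pname" -> "Left Handed Bacon Stretcher Cover", "pcolor" -> Red|>;

part["pcolor"] = Blue;    (* update the value of an existing key *)
part["pweight"] = 1.5;    (* add a new key; "pweight" is a made-up attribute *)
Keys[part]                (* {"pid", "pname", "pcolor", "pweight"} *)

Lookup[part, "psize", "unknown"]   (* "unknown": default value for an absent key *)

(* Associations nest, which gives hierarchical structures *)
store = <|"parts" -> <|991 -> part|>|>;
store["parts"][991]["pname"]       (* "Left Handed Bacon Stretcher Cover" *)
```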
HyperGraph Representation
Although it is possible to represent a hypergraph by a bipartite graph and follow the RDF (Subject-
Predicate-Object) approach, I would like to demonstrate a different perspective, one that eventually
leads to what I named the enhanced hypergraph representation of the relational model, which follows
in Part 2 of this series.
We will use two lists of rules, one for the attributes and another for the values. Essentially we follow
the same principle as with the lists above, we separate the semantics from the data values, i.e.
types from instances. This is a critical step in a completely new way of thinking about the data
modeling process.
List of Rules for Attributes
Take the list of attributes first and add at the beginning of the list another element, which we
call nexusPart. This is the hyperedge that connects all the other attributes together.
Insert[attributes, "nexusPart", 1]
{nexusPart, pid, pname, pcolor}
and using a list of rules we obtain
conceptsRules = Thread[Rule[
Table["nexusPart", {3}],
attributes]]
{nexusPart → pid, nexusPart → pname, nexusPart → pcolor}
vstyle = {# → Black & /@ attributes, "nexusPart" → Red} // Flatten
{pid → GrayLevel[0], pname → GrayLevel[0], pcolor → GrayLevel[0], nexusPart → RGBColor[1, 0, 0]}
A graphical representation of the list above is the following
Graph[conceptsRules, VertexSize → Medium,
VertexStyle → vstyle, VertexLabels → "Name"]
[Figure: graph in which the red node nexusPart is connected to the black nodes pid, pname and pcolor.]
So in this hypergraph nexusPart, shown in red, plays the role of the hyperedge that connects
three hypernodes, shown in black, that represent the attributes (pid, pname, pcolor).
List of Rules for Values
Now we can proceed with the data
partInstanceAsList
{991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
dataRules = Thread[Rule[
Table["nexus991", {3}],
partInstanceAsList]]
{nexus991 → 991, nexus991 → Left Handed Bacon Stretcher Cover, nexus991 → RGBColor[1, 0, 0]}
vstyle = {# → Blue & /@ partInstanceAsList, "nexus991" → Green} // Flatten
{991 → RGBColor[0, 0, 1], Left Handed Bacon Stretcher Cover → RGBColor[0, 0, 1],
RGBColor[1, 0, 0] → RGBColor[0, 0, 1], nexus991 → RGBColor[0, 1, 0]}
A graphical representation of the list above is the following
Graph[dataRules, VertexSize → Medium, VertexStyle → vstyle, VertexLabels → "Name"]
[Figure: graph in which the green node nexus991 is connected to the blue nodes 991,
Left Handed Bacon Stretcher Cover and the red colour swatch.]
And in this hypergraph nexus991, shown in green, plays the role of a hyperedge that connects
three hypernodes, shown in blue, that represent the values (991, “Left Handed Bacon Stretcher Cover”, Red).
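Both lists of rules follow the same pattern, so they can be produced by one function. The helper nexusRules below is hypothetical; it relies on Thread distributing the nexus over the members, which avoids building the repeated list with Table.

```wolfram
(* Hypothetical helper: connect a nexus (hyperedge) to each of its member nodes *)
nexusRules[nexus_, members_List] := Thread[nexus -> members]

conceptsRules = nexusRules["nexusPart", {"pid", "pname", "pcolor"}]
(* {"nexusPart" -> "pid", "nexusPart" -> "pname", "nexusPart" -> "pcolor"} *)

dataRules = nexusRules["nexus991", {991, "Left Handed Bacon Stretcher Cover", Red}];
Length[dataRules]   (* 3 *)
```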
Summary
We created two handles, which we call nexuses: one at the layer of concepts to represent the head of
the record and another at the data layer to represent the body of the record. Provided that we find a
way to connect the two layers, we are now in a position to create concept graphs (we will call them
maps from now on) that are similar to the ER diagrams that database designers build for an RDBMS.
Property Graph Representation
In the property graph representation, each record becomes an instance of a class that is represented
graphically with a node. Attributes of the record are usually embedded in the structure of the
node as properties of the class. If we follow our previous example we can take the association
representation
partInstanceAsAssociation
<|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → RGBColor[1, 0, 0]|>
and add an extra handle that refers to that association and serves as a key for looking up
associations. For example:
"recKey991" → partInstanceAsAssociation
recKey991 →
<|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → RGBColor[1, 0, 0]|>
In this way the same key can be used to maintain an association of bidirectional links to other
records through a specific edge. For example, part 991 has two suppliers, 1081 and 1082
"recKey991" →
<|"recKey1082" -> "isSupplierOf991", "recKey1081" → "isSupplierOf991"|>
recKey991 → <|recKey1082 → isSupplierOf991, recKey1081 → isSupplierOf991|>
Representation of a Table in Wolfram Language
As a List of Lists
Wolfram Language provides expressions that represent many SQL constructs, such as tables
(SQLTable, SQLTables, SQLTableNames, SQLTableInformation) and columns (SQLColumn,
SQLColumns, SQLColumnNames, SQLColumnInformation). These commands return information
about the structure of these constructs.
There are also two styles of commands for working with data: the Wolfram Language SQL commands
(SQLSelect, SQLUpdate, SQLInsert, etc.) for those who are familiar with the Wolfram Language, and
SQL-style query strings executed with SQLExecute.
partsList = SQLSelect[conn, "Parts", "ShowColumnHeadings" → True]
{{pid, pname, pcolor}, {991, Left Handed Bacon Stretcher Cover, Red},
{992, Smoke Shifter End, Black}, {993, Acme Widget Washer, Red},
{994, Acme Widget Washer, Silver},
{995, I Brake for Crop Circles Sticker, Translucent},
{996, Anti-Gravity Turbine Generator, Cyan},
{997, Anti-Gravity Turbine Generator, Magenta},
{998, Fire Hydrant Cap, Red}, {999, 7 Segment Display, Green}}
partsList // TableForm
pid   pname                               pcolor
991   Left Handed Bacon Stretcher Cover   Red
992   Smoke Shifter End                   Black
993   Acme Widget Washer                  Red
994   Acme Widget Washer                  Silver
995   I Brake for Crop Circles Sticker    Translucent
996   Anti-Gravity Turbine Generator      Cyan
997   Anti-Gravity Turbine Generator      Magenta
998   Fire Hydrant Cap                    Red
999   7 Segment Display                   Green
suppliersList = SQLSelect[conn, "Suppliers", "ShowColumnHeadings" → True];
catalogList = SQLSelect[conn, "Catalog", "ShowColumnHeadings" → True];
As a List of Associations or Dataset Construct
List of Associations
Here we demonstrate how to arrive at a list of associations from the list of lists of the
previous section. First we split the header (the column attributes) from the body (the records of data).
attributes = partsList[[1]]
{pid, pname, pcolor}
data = partsList[[2 ;;]]
{{991, Left Handed Bacon Stretcher Cover, Red}, {992, Smoke Shifter End, Black},
{993, Acme Widget Washer, Red}, {994, Acme Widget Washer, Silver},
{995, I Brake for Crop Circles Sticker, Translucent},
{996, Anti-Gravity Turbine Generator, Cyan},
{997, Anti-Gravity Turbine Generator, Magenta},
{998, Fire Hydrant Cap, Red}, {999, 7 Segment Display, Green}}
Now we are in a position to create the first association
AssociationThread[attributes → data[[1]]]
<|pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → Red|>
Then we generalize the method to transform the list of records into a list of associations
AssociationThread[attributes -> #] & /@ data // TableForm
<|pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → Red|>
<|pid → 992, pname → Smoke Shifter End, pcolor → Black|>
<|pid → 993, pname → Acme Widget Washer, pcolor → Red|>
<|pid → 994, pname → Acme Widget Washer, pcolor → Silver|>
<|pid → 995, pname → I Brake for Crop Circles Sticker, pcolor → Translucent|>
<|pid → 996, pname → Anti-Gravity Turbine Generator, pcolor → Cyan|>
<|pid → 997, pname → Anti-Gravity Turbine Generator, pcolor → Magenta|>
<|pid → 998, pname → Fire Hydrant Cap, pcolor → Red|>
<|pid → 999, pname → 7 Segment Display, pcolor → Green|>
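The transformation is reversible: the header and the rows can be recovered from the list of associations. The sketch below uses two rows of the Parts table; the names records and roundTrip are assumptions made for the example.

```wolfram
attributes = {"pid", "pname", "pcolor"};
data = {{991, "Left Handed Bacon Stretcher Cover", "Red"}, {992, "Smoke Shifter End", "Black"}};

records = AssociationThread[attributes -> #] & /@ data;

(* Back to a list of lists, with the header row first *)
roundTrip = Prepend[Values /@ records, Keys[First[records]]];
roundTrip === Prepend[data, attributes]   (* True *)
```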
The Structured Dataset Construct of Wolfram Language
Finally, the Wolfram Language provides a special Dataset construct. Dataset has the interesting
property that it can represent not only multidimensional arrays of data but also data with arbitrary
hierarchical structure. A second interesting property is that we can apply various operators, such
as part extraction, filtering, aggregation, subqueries and arbitrary functions, directly on the
dataset. A few examples demonstrate these:
partsDataset = Dataset[AssociationThread[attributes -> #] & /@ data]
pid   pname                               pcolor
991   Left Handed Bacon Stretcher Cover   Red
992   Smoke Shifter End                   Black
993   Acme Widget Washer                  Red
994   Acme Widget Washer                  Silver
995   I Brake for Crop Circles Sticker    Translucent
996   Anti-Gravity Turbine Generator      Cyan
997   Anti-Gravity Turbine Generator      Magenta
998   Fire Hydrant Cap                    Red
999   7 Segment Display                   Green
2 levels 27 elements
suppliersDataset =
Dataset[AssociationThread[suppliersList[[1]] → #] & /@ suppliersList[[2 ;;]]]
sid    sname                   saddress
1081   Acme Widget Suppliers   1 Grub St., Potemkin Village, IL 61801
1082   Big Red Tool and Die    4 My Way, Bermuda Shorts, OR 90305
1083   Perfunctory Parts       99999 Short Pier, Terra Del Fuego, TX 41299
1084   Alien Aircaft Inc.      2 Groom Lake, Rachel, NV 51902
2 levels 12 elements
catalogDataset =
Dataset[AssociationThread[catalogList[[1]] → #] & /@ catalogList[[2 ;;]]]
catsid catpid catcost
1081 991 36.1
1081 992 42.3
1081 993 15.3
1081 994 20.5
1081 995 20.5
1081 996 124.2
1081 997 124.2
1081 998 11.7
1081 999 75.2
1082 991 16.5
1082 997 0.55
1082 998 7.95
1083 998 12.5
1083 999 1.
1084 994 57.3
1084 995 22.2
⋮
2 levels 51 elements
Select[catalogDataset, #catpid ⩵ 998 &];
Or, equivalently, as a query applied to the dataset
catalogDataset[Select[#catpid ⩵ 998 &]]
catsid catpid catcost
1081 998 11.7
1082 998 7.95
1083 998 12.5
1084 998 48.6
2 levels 12 elements
catalogDataset[GroupBy[Key["catpid"]]];
Or, with the grouping key wrapped in a list
catalogDataset[GroupBy[{#catpid} &]]
{991} {catsid → 1081, catpid → 991, catcost → 36.1, catsid → 1082, catpid → 991, catcost → 16.5
{992} {catsid → 1081, catpid → 992, catcost → 42.3}
{993} {catsid → 1081, catpid → 993, catcost → 15.3}
{994} {catsid → 1081, catpid → 994, catcost → 20.5, catsid → 1084, catpid → 994, catcost → 57.3
{995} {catsid → 1081, catpid → 995, catcost → 20.5, catsid → 1084, catpid → 995, catcost → 22.2
{996} {catsid → 1081, catpid → 996, catcost → 124.2}
{997} {catsid → 1081, catpid → 997, catcost → 124.2, catsid → 1082, catpid → 997, catcost → 0.55
{998} {catsid → 1081, catpid → 998, catcost → 11.7, catsid → 1082, catpid → 998, catcost → 7.95
{999} {catsid → 1081, catpid → 999, catcost → 75.2, catsid → 1083, catpid → 999, catcost → 1.
3 levels 51 elements
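Grouping is usually followed by an aggregation. The sketch below computes the cheapest offer per part on a few rows copied from the Catalog table, first on the plain list of associations and then as a Dataset query; the variable catalog is an assumption made to keep the example independent of the database connection.

```wolfram
(* A few rows copied from the Catalog table *)
catalog = {
   <|"catsid" -> 1081, "catpid" -> 991, "catcost" -> 36.1|>,
   <|"catsid" -> 1082, "catpid" -> 991, "catcost" -> 16.5|>,
   <|"catsid" -> 1081, "catpid" -> 998, "catcost" -> 11.7|>,
   <|"catsid" -> 1082, "catpid" -> 998, "catcost" -> 7.95|>};

(* Cheapest offer per part on the plain list of associations *)
cheapest = GroupBy[catalog, #catpid &, Min[Lookup[#, "catcost"]] &]
(* <|991 -> 16.5, 998 -> 7.95|> *)

(* The same aggregation written as a Dataset query *)
Dataset[catalog][GroupBy["catpid"], Min, "catcost"]
```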
Dbms relational modelDbms relational model
Dbms relational model
 
New open document text (2)
New open document text (2)New open document text (2)
New open document text (2)
 
Object oriented data model
Object oriented data modelObject oriented data model
Object oriented data model
 
Data structures and algorithms short note (version 14).pd
Data structures and algorithms short note (version 14).pdData structures and algorithms short note (version 14).pd
Data structures and algorithms short note (version 14).pd
 
Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R Notes
 
Papers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON Datatype
Papers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON DatatypePapers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON Datatype
Papers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON Datatype
 
Dbms important questions and answers
Dbms important questions and answersDbms important questions and answers
Dbms important questions and answers
 
Bca examination 2017 dbms
Bca examination 2017 dbmsBca examination 2017 dbms
Bca examination 2017 dbms
 
Data structures - Introduction
Data structures - IntroductionData structures - Introduction
Data structures - Introduction
 
Database model BY ME
Database model BY MEDatabase model BY ME
Database model BY ME
 
DBMS_Ch1
 DBMS_Ch1 DBMS_Ch1
DBMS_Ch1
 
Lesson 1 overview
Lesson 1   overviewLesson 1   overview
Lesson 1 overview
 
Programming in C
Programming in CProgramming in C
Programming in C
 

Similar to Towards a New Data Modelling Architecture - Part 1

Relational Model
Relational ModelRelational Model
Relational Model
A. S. M. Shafi
 
Ch 1 intriductions
Ch 1 intriductionsCh 1 intriductions
Ch 1 intriductionsirshad17
 
Statistics lab 1
Statistics lab 1Statistics lab 1
Statistics lab 1
University of Salerno
 
Lecture 09.pptx
Lecture 09.pptxLecture 09.pptx
Lecture 09.pptx
Mohammad Hassan
 
Programming in Scala - Lecture Three
Programming in Scala - Lecture ThreeProgramming in Scala - Lecture Three
Programming in Scala - Lecture Three
Angelo Corsaro
 
Linked List Static and Dynamic Memory Allocation
Linked List Static and Dynamic Memory AllocationLinked List Static and Dynamic Memory Allocation
Linked List Static and Dynamic Memory Allocation
Prof Ansari
 
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...
irjes
 
CORCON2014: Does programming really need data structures?
CORCON2014: Does programming really need data structures?CORCON2014: Does programming really need data structures?
CORCON2014: Does programming really need data structures?
Marco Benini
 
Chapter 2.2 data structures
Chapter 2.2 data structuresChapter 2.2 data structures
Chapter 2.2 data structuressshhzap
 
Unit 3
Unit 3Unit 3
Sharbani bhattacharya VB Structures
Sharbani bhattacharya VB StructuresSharbani bhattacharya VB Structures
Sharbani bhattacharya VB Structures
Sharbani Bhattacharya
 
SIT221 Data Structures and Algorithms     Trimester 2, 2019 .docx
SIT221 Data Structures and Algorithms     Trimester 2, 2019 .docxSIT221 Data Structures and Algorithms     Trimester 2, 2019 .docx
SIT221 Data Structures and Algorithms     Trimester 2, 2019 .docx
edgar6wallace88877
 
Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06
Barry DeCicco
 
Improve Your Edge on Machine Learning - Day 1.pptx
Improve Your Edge on Machine Learning - Day 1.pptxImprove Your Edge on Machine Learning - Day 1.pptx
Improve Your Edge on Machine Learning - Day 1.pptx
CatherineVania1
 
Part2- The Atomic Information Resource
Part2- The Atomic Information ResourcePart2- The Atomic Information Resource
Part2- The Atomic Information ResourceJEAN-MICHEL LETENNIER
 
Data Mining Exploring DataLecture Notes for Chapter 3
Data Mining Exploring DataLecture Notes for Chapter 3Data Mining Exploring DataLecture Notes for Chapter 3
Data Mining Exploring DataLecture Notes for Chapter 3
OllieShoresna
 
User_42751212015Module1and2pagestocompetework.pdf.docx
User_42751212015Module1and2pagestocompetework.pdf.docxUser_42751212015Module1and2pagestocompetework.pdf.docx
User_42751212015Module1and2pagestocompetework.pdf.docx
dickonsondorris
 

Similar to Towards a New Data Modelling Architecture - Part 1 (20)

Relational Model
Relational ModelRelational Model
Relational Model
 
Ch 1 intriductions
Ch 1 intriductionsCh 1 intriductions
Ch 1 intriductions
 
Statistics lab 1
Statistics lab 1Statistics lab 1
Statistics lab 1
 
Lecture 09.pptx
Lecture 09.pptxLecture 09.pptx
Lecture 09.pptx
 
Programming in Scala - Lecture Three
Programming in Scala - Lecture ThreeProgramming in Scala - Lecture Three
Programming in Scala - Lecture Three
 
Linked List Static and Dynamic Memory Allocation
Linked List Static and Dynamic Memory AllocationLinked List Static and Dynamic Memory Allocation
Linked List Static and Dynamic Memory Allocation
 
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...
 
CORCON2014: Does programming really need data structures?
CORCON2014: Does programming really need data structures?CORCON2014: Does programming really need data structures?
CORCON2014: Does programming really need data structures?
 
Chapter 2.2 data structures
Chapter 2.2 data structuresChapter 2.2 data structures
Chapter 2.2 data structures
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Unit 3
Unit 3Unit 3
Unit 3
 
Sharbani bhattacharya VB Structures
Sharbani bhattacharya VB StructuresSharbani bhattacharya VB Structures
Sharbani bhattacharya VB Structures
 
SIT221 Data Structures and Algorithms     Trimester 2, 2019 .docx
SIT221 Data Structures and Algorithms     Trimester 2, 2019 .docxSIT221 Data Structures and Algorithms     Trimester 2, 2019 .docx
SIT221 Data Structures and Algorithms     Trimester 2, 2019 .docx
 
Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06
 
Improve Your Edge on Machine Learning - Day 1.pptx
Improve Your Edge on Machine Learning - Day 1.pptxImprove Your Edge on Machine Learning - Day 1.pptx
Improve Your Edge on Machine Learning - Day 1.pptx
 
DS_PPT.pptx
DS_PPT.pptxDS_PPT.pptx
DS_PPT.pptx
 
Part2- The Atomic Information Resource
Part2- The Atomic Information ResourcePart2- The Atomic Information Resource
Part2- The Atomic Information Resource
 
Data Mining Exploring DataLecture Notes for Chapter 3
Data Mining Exploring DataLecture Notes for Chapter 3Data Mining Exploring DataLecture Notes for Chapter 3
Data Mining Exploring DataLecture Notes for Chapter 3
 
FinalReport
FinalReportFinalReport
FinalReport
 
User_42751212015Module1and2pagestocompetework.pdf.docx
User_42751212015Module1and2pagestocompetework.pdf.docxUser_42751212015Module1and2pagestocompetework.pdf.docx
User_42751212015Module1and2pagestocompetework.pdf.docx
 

Towards a New Data Modelling Architecture - Part 1

  • 1. Towards a New Data Modelling Architecture
    By Athanassios I. Hatzis, PhD, R&D Software Engineer
    (C) 17th of March 2015
    This is a series of Wolfram Language notebooks that progressively introduce the art of a new, innovative, exhilarating data modelling approach to software developers, architects, data model designers and everyone interested in learning the advantages of applying this method and its main differences from the data models of the past. In Part 1 we start with terms and constructs that most of us are familiar with from relational database management systems, and we continue with the terminology and constructs of the new model in Part 2.
    Part 1: Representations and Transformations on the Constructs of the Relational Model in Wolfram Language
    Scope
    The entity-relational data model (ERDM) is still the most popular data model in database management systems. There are several reasons for that, but the main one is the simple and natural way of managing data in tables with rows (records) and columns (attributes). On top of that, SQL is a very powerful and easy to learn programming language that completely covers the relational operators on data sets. In this Mathematica notebook various methods of representing the basic constructs of the relational model are demonstrated.
    ◆ Definitions
    Product Type
    ClearAll["Global`*"]
    Needs["DatabaseLink`"]
  • 2. conn = OpenSQLConnection[
      JDBC["Microsoft Access(ODBC)", "C:TempSuppliesCatalogDB.mdb"]];
    Wikipedia Extract
    “In programming languages and type theory, a product of types is another, compounded, type in a structure. The “operands” of the product are types, and the structure of a product type is determined by the fixed order of the operands in the product. An instance of a product type retains the fixed order, but otherwise may contain all possible instances of its primitive data types. The expression of an instance of a product type will be a tuple, and is called a “tuple type” of expression. A product of types is a direct product of two or more types.”
    Example
    Integer x String x Colour. In Wolfram Language an instance of such a type is represented with a list abstract data type. We can write
    partInstanceAsList = {991, "Left Handed Bacon Stretcher Cover", Red}
    {991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
    To verify the type of each element of the list we map the function Head over partInstanceAsList
    Head /@ partInstanceAsList
    {Integer, String, RGBColor}
    Tuple or record or row
    Wikipedia Extract
    “A tuple is an ordered list of elements. In mathematics, an n-tuple is a sequence (or ordered list) of n elements, where n is a non-negative integer. In computer science, tuples are directly implemented as product types in most functional programming languages. More commonly, they are implemented as record types, where the components are labeled instead of being identified by position alone. This approach is also used in relational algebra.”
    “In database theory, the relational model uses a tuple definition similar to tuples as functions, but each tuple element is identified by a distinct name, called an attribute, instead of a number; this leads to a more user-friendly and practical notation. A tuple in the relational model is formally defined as a finite function that maps attributes to values. In this notation, attribute–value pairs may appear in any order.”
    Example
    ( partID : 991, partName : “Left Handed Bacon Stretcher Cover”, partColor : Red )
    In Wolfram Language the record abstract data structure is usually represented with the Association function, i.e. a symbolically indexed list of rules (key->value pairs).
    2 Towards a New Data Modelling Architecture - Part 1.nb
  • 3. partInstanceAsAssociation = Association[partID → 991,
      partName -> "Left Handed Bacon Stretcher Cover", partColor → Red]
    <|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → RGBColor[1, 0, 0]|>
    We can take the list of values in the following way
    Values[partInstanceAsAssociation]
    {991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
    or take the list of rules (key->value pairs)
    partInstanceAsAssociation // Normal
    {partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → RGBColor[1, 0, 0]}
    Attribute or field or column
    Wikipedia Extract
    “The basic relational building block is the domain or data type, usually abbreviated nowadays to type. A tuple is an ordered set of attribute values. An attribute is an ordered pair of attribute name and type name. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type. A domain describes the set of possible values for a given attribute, and can be considered a constraint on the value of the attribute. Mathematically, attaching a domain to an attribute means that any value for the attribute must be an element of the specified set. Constraints make it possible to further restrict the domain of an attribute.”
    Example
    In our example, two of the attributes, partID (integer data type) and partName (string data type), take scalar values. The partColor attribute is of a complex type, RGBColor.
    Apply[Rule, Thread[{Keys[partInstanceAsAssociation], Head /@ partInstanceAsList}], {1}]
    {partID → Integer, partName → String, partColor → RGBColor}
    Important Notice
    An attribute can be seen as a mapping function: it maps a tuple to a value, for example the color of a part. We can define a function that takes a single argument, the association representation of the tuple, and returns the value stored under a specific key.
    Turn the keys of the association into functions, each returning the value of its attribute for a given association instance
    isIdentifierOf[assoc_] := assoc[partID]
    isNameOf[assoc_] := assoc[partName]
    isColorOf[assoc_] := assoc[partColor]
  • 4. {isIdentifierOf[partInstanceAsAssociation],
      isNameOf[partInstanceAsAssociation], isColorOf[partInstanceAsAssociation]}
    {991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
    Relation (Base relvar)
    Wikipedia Extract
    “In the relational model, a relation is a (possibly empty) finite set of tuples all having the same finite set of attributes. This set of attributes is more formally called the sort of the relation, or more casually referred to as the set of column names. A tuple is usually implemented as a row in a database table. The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning there are two possible evaluations for each proposition: either true or false (and in particular no third value such as unknown, or not applicable, either of which are often associated with the concept of NULL). Data are operated upon by means of a relational calculus or relational algebra, these being equivalent in expressive power. A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of unique, non-duplicated items. A table is an accepted visual representation of a relation; a tuple is similar to the concept of a row. It is a set of tuples sharing the same attributes; a set of columns and rows. A relvar is a named variable of some specific relation type, to which at all times some relation of that type is assigned, though the relation may contain zero tuples.”
    Predicates and the Closed World Assumption
    “A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary relation) is a set of n-tuples. The heading of the relation is also the heading of each of its tuples. The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of the extension of some predicate, this being the set of true propositions that can be formed by replacing each free variable in that predicate by a name (a term that designates something). There is a one-to-one correspondence between the free variables of the predicate and the attribute names of the relation heading. Each tuple of the relation body provides attribute values to instantiate the predicate by substituting each of its free variables. The result is a proposition that is deemed, on account of the appearance of the tuple in the relation body, to be true. Contrariwise, every tuple whose heading conforms to that of the relation, but which does not appear in the body is deemed to be false. This assumption is known as the closed world assumption: it is often violated in practical databases, where the absence of a tuple might mean that the truth of the corresponding proposition is unknown.”
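The closed world assumption quoted above can be illustrated with a minimal sketch. It assumes a small in-memory relation, here called parts, holding three rows of the Parts table with colors stored as strings; the helper holdsQ is a hypothetical name introduced only for this illustration and is not part of the notebook.

```wolfram
(* A relation body as a duplicate-free list of tuples (associations) *)
parts = DeleteDuplicates[{
    <|"pid" -> 991, "pname" -> "Left Handed Bacon Stretcher Cover", "pcolor" -> "Red"|>,
    <|"pid" -> 992, "pname" -> "Smoke Shifter End", "pcolor" -> "Black"|>,
    <|"pid" -> 993, "pname" -> "Acme Widget Washer", "pcolor" -> "Red"|>}];

(* Hypothetical helper: a proposition holds exactly when its tuple is in the body. *)
(* KeySort makes the comparison independent of attribute order. *)
holdsQ[relation_List, tuple_Association] :=
  AnyTrue[relation, KeySort[#] === KeySort[tuple] &]

(* This tuple appears in the body, so the proposition is deemed true *)
holdsQ[parts, <|"pid" -> 992, "pname" -> "Smoke Shifter End", "pcolor" -> "Black"|>]

(* This tuple conforms to the heading but is absent, so it is deemed false *)
holdsQ[parts, <|"pid" -> 992, "pname" -> "Smoke Shifter End", "pcolor" -> "Red"|>]
```

The second call returns False rather than an "unknown" value, which is exactly the two-valued reading that practical databases with NULLs often depart from.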
  • 5. Example
    SQLSelect[conn, "Parts", "ShowColumnHeadings" → True] // TableForm
    pid  pname                              pcolor
    991  Left Handed Bacon Stretcher Cover  Red
    992  Smoke Shifter End                  Black
    993  Acme Widget Washer                 Red
    994  Acme Widget Washer                 Silver
    995  I Brake for Crop Circles Sticker   Translucent
    996  Anti-Gravity Turbine Generator     Cyan
    997  Anti-Gravity Turbine Generator     Magenta
    998  Fire Hydrant Cap                   Red
    999  7 Segment Display                  Green
    SQLSelect[conn, "Parts", "ShowColumnHeadings" → True]
    {{pid, pname, pcolor}, {991, Left Handed Bacon Stretcher Cover, Red},
     {992, Smoke Shifter End, Black}, {993, Acme Widget Washer, Red},
     {994, Acme Widget Washer, Silver}, {995, I Brake for Crop Circles Sticker, Translucent},
     {996, Anti-Gravity Turbine Generator, Cyan}, {997, Anti-Gravity Turbine Generator, Magenta},
     {998, Fire Hydrant Cap, Red}, {999, 7 Segment Display, Green}}
    View or Result Set (Derived relvar)
    Wikipedia Extract
    “In a relational database, all data are stored and accessed via relations. Relations that store data are called “base relations”, and in implementations are called “tables”. Other relations do not store data, but are computed by applying relational operations to other relations. These relations are sometimes called “derived relations”. In implementations these are called “views” or “queries”.”
    Example
    queryString = "
      SELECT Catalog.catsid, Suppliers.sname, Catalog.catpid, Parts.pname, Parts.pcolor, Catalog.catcost
      FROM Suppliers INNER JOIN (Parts INNER JOIN [Catalog] ON Parts.pid = Catalog.[catpid])
        ON Suppliers.sid = Catalog.[catsid]
      WHERE (((Catalog.catpid)=998))
      ORDER BY Catalog.catcost;";
    SQLExecute[conn, queryString, "ShowColumnHeadings" → True] // TableForm
    catsid  sname                  catpid  pname             pcolor  catcost
    1082    Big Red Tool and Die   998     Fire Hydrant Cap  Red     7.95
    1081    Acme Widget Suppliers  998     Fire Hydrant Cap  Red     11.7
    1083    Perfunctory Parts      998     Fire Hydrant Cap  Red     12.5
    1084    Alien Aircaft Inc.     998     Fire Hydrant Cap  Red     48.6
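A derived relation can also be computed without a database connection. The sketch below is only an illustration under stated assumptions: the two in-memory lists are hypothetical stand-ins for the Suppliers and Catalog tables, the key names are simplified to a shared "sid" (the database itself uses sid and catsid), and only two suppliers are included.

```wolfram
(* Hypothetical in-memory stand-ins for two base relations *)
suppliers = {
    <|"sid" -> 1081, "sname" -> "Acme Widget Suppliers"|>,
    <|"sid" -> 1082, "sname" -> "Big Red Tool and Die"|>};
catalog = {
    <|"sid" -> 1081, "pid" -> 998, "cost" -> 11.7|>,
    <|"sid" -> 1082, "pid" -> 998, "cost" -> 7.95|>};

(* Inner join on the shared key, the counterpart of INNER JOIN ... ON *)
joined = JoinAcross[suppliers, catalog, Key["sid"]];

(* Restriction (WHERE) followed by ordering (ORDER BY) gives the derived relation *)
view998 = SortBy[Select[joined, #["pid"] == 998 &], #["cost"] &]
```

Nothing is stored for view998 beyond the expression that computes it, which mirrors the distinction between base relations and views drawn in the extract.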
  • 6. Database
    Wikipedia Extract
    “Each database is a collection of related tables; these are also called relations, hence the name “relational database”. Each table is a physical representation of an entity or object that is in a tabular format consisting of columns and rows.”
    Example
    SQLTableNames[conn]
    {Catalog, Parts, Suppliers}
    SQLTableInformation[conn, "ShowColumnHeadings" → True] // TableForm
    TABLE_CAT                    TABLE_SCHEM  TABLE_NAME  TABLE_TYPE  REMARKS
    C:TempSuppliesCatalogDB.mdb  Null         Catalog     TABLE       Null
    C:TempSuppliesCatalogDB.mdb  Null         Parts       TABLE       Null
    C:TempSuppliesCatalogDB.mdb  Null         Suppliers   TABLE       Null
    SQLTableNames[conn, "TableType" → SQLTableTypeNames[conn]]
    {MSysAccessObjects, MSysAccessXML, MSysACEs, MSysIMEXColumns, MSysIMEXSpecs,
     MSysNameMap, MSysNavPaneGroupCategories, MSysNavPaneGroups,
     MSysNavPaneGroupToObjects, MSysNavPaneObjectIDs, MSysObjects, MSysQueries,
     MSysRelationships, Catalog, Parts, Suppliers, View998Suppliers, ViewAll}
    Entity-Relationship (ER) Modeling
    Wikipedia Extract
    “An entity–relationship model is a systematic way of describing and defining a business process. The process is modeled as components (entities) that are linked with each other by relationships that express the dependencies and requirements between them. Entities may have various properties (attributes) that characterize them. Diagrams created to represent these entities, attributes, and relationships graphically are called entity–relationship diagrams.”
    ER / Relational Terms Equivalence
    Entity Type - Relation (Table, Base relvar)
    Entity - Tuple
    Attribute - Attribute (column)
    Relationship - View (Result set or Derived relvar)
    EER (Enhanced Entity-Relationship) Model
    “The EER model includes all of the concepts introduced by the ER model. Additionally it includes the concepts of a subclass and superclass (Is-a), along with the concepts of specialization and generalization. Furthermore, it introduces the concept of a union type or category, which is used to represent a collection of objects that is the union of objects of different entity types.”
  • 7. Constraints
    Wikipedia Extract
    “Constraints provide one method of implementing business rules in the database. SQL implements constraint functionality in the form of check constraints. Constraints restrict the data that can be stored in relations. These are usually defined using expressions that result in a boolean value, indicating whether or not the data satisfies the constraint. Constraints can apply to single attributes, to a tuple (restricting combinations of attributes) or to an entire relation. Since every attribute has an associated domain, there are constraints (domain constraints). The two principal rules for the relational model are known as entity integrity and referential integrity. In ER modeling we have two types of constraints that are placed on relationships, cardinality and participation. A cardinality constraint places a limit on the number of relationships an entity may participate in at any given time.”
    ◆ Table and Record in Wolfram Language
    The basic construct of the ER model is the record; a table can be defined as a list of records.
    Representation of a Record in Wolfram Language
    List Representations
    You need to maintain two ordered lists, one for the data values and another one for the semantics, i.e. the attribute/column names.
    partInstanceAsList
    {991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
    attributes = {"pid", "pname", "pcolor"}
    {pid, pname, pcolor}
    You can combine the two lists into one list of rules with the following command
    Thread[Rule[attributes, partInstanceAsList]]
    {pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → RGBColor[1, 0, 0]}
    A Rule is the equivalent of a key-value pair, but it is more powerful because in Wolfram Language it is the basic mechanism used in transformations. Nevertheless, for lookup and update operations Wolfram researchers added a more powerful construct called Association, see below.
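To make the remark about rules and transformations concrete, here is a minimal sketch that uses a list of rules for lookup by replacement. The variable names values and rules are introduced for this illustration; the data repeat the part instance used above.

```wolfram
attributes = {"pid", "pname", "pcolor"};
values = {991, "Left Handed Bacon Stretcher Cover", Red};

(* Pair each attribute name with its value *)
rules = Thread[attributes -> values];

(* Lookup by replacement: the rule list rewrites a key into its value *)
"pname" /. rules

(* The same rules transform any expression that mentions the keys *)
{"pid", "pcolor"} /. rules
```

Replacement walks the rule list until a rule matches, so it is a general rewriting mechanism rather than a dedicated lookup structure.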
  • 8. Graph Representation with triplets (RDF)
    Let us call a specific part instance partXYZ. If we represent it as the subject resource of a triplet, the list of attributes as the predicates and the list of values as the objects, we obtain the following triplets, adding example namespaces
    subject = Table["http://example.org/resource/partXYZ", {3}];
    predicate = StringJoin["http://example.org/attribute/", #] & /@ attributes;
    object = partInstanceAsList;
    Transpose[{subject, predicate, object}] // TableForm
    http://example.org/resource/partXYZ  http://example.org/attribute/pid     991
    http://example.org/resource/partXYZ  http://example.org/attribute/pname   Left Handed Bacon Stretcher Cover
    http://example.org/resource/partXYZ  http://example.org/attribute/pcolor  RGBColor[1, 0, 0]
    Directed Graph
    graphData = DirectedEdge[#, partXYZ] & /@ Values[partInstanceAsAssociation]
    {DirectedEdge[991, partXYZ], DirectedEdge[Left Handed Bacon Stretcher Cover, partXYZ],
     DirectedEdge[RGBColor[1, 0, 0], partXYZ]}
    vstyle = {# → Black & /@ Values[partInstanceAsAssociation], partXYZ → Red} // Flatten
    {991 → GrayLevel[0], Left Handed Bacon Stretcher Cover → GrayLevel[0],
     RGBColor[1, 0, 0] → GrayLevel[0], partXYZ → RGBColor[1, 0, 0]}
    elabels = Apply[Rule, Thread[{graphData, Keys[partInstanceAsAssociation]}], {1}]
    {DirectedEdge[991, partXYZ] → partID,
     DirectedEdge[Left Handed Bacon Stretcher Cover, partXYZ] → partName,
     DirectedEdge[RGBColor[1, 0, 0], partXYZ] → partColor}
  • 9. Graph[graphData, VertexSize → Medium, VertexStyle → vstyle,
      VertexLabels → "Name", EdgeStyle → Thick, EdgeLabels -> elabels,
      EdgeShapeFunction → GraphElementData[{"CarvedArrow", "ArrowSize" → .06}],
      GraphLayout → "SpringEmbedding"]
    [Figure: directed graph in which the value nodes 991, Left Handed Bacon Stretcher Cover and the red color point to the node partXYZ through edges labelled partID, partName and partColor]
    We can observe two types of nodes in this kind of graph: a URI node (red) and literal nodes (black).
    TreeForm Representation
    From the discussion above we saw two abstract data structures that are suitable for tuple representation: the List and the Associative array, also known as Map or Dictionary. The equivalent representations in Wolfram Language are the List and the Association functions. In fact, Wolfram Language expressions are tree data structures that are created in memory as a contiguous array of pointers, the first to the head and the rest to its successive elements. Take for example the list we defined, partInstanceAsList; we can present it in tree form:
    partInstanceAsList // TreeForm
    [Figure: tree with root List and children 991, Left Handed Bacon Stretcher Cover and RGBColor; RGBColor has children 1, 0, 0]
    This reveals that the symbol Red is set to the result of the function RGBColor with parameters 1, 0, 0
    RGBColor[1, 0, 0]
    [Output: red color swatch, i.e. Red]
  • 10. Function Representation
    We can represent this triple, partInstanceAsList, as a function with three arguments that take values from the Integer, String and Color domains
    partFunction[991, "Left Handed Bacon Stretcher Cover", Red] // TreeForm
    [Figure: tree with root partFunction and children 991, Left Handed Bacon Stretcher Cover and RGBColor; RGBColor has children 1, 0, 0]
    Association Representation
    Associations in Wolfram Language are very similar to the Association Type construct of the Topic Map data model. Each defined association is an instance of an association type, e.g. partInstanceAsAssociation is an instance of the type of a supplies part. The keys of the association (association role types in Topic Maps terminology) describe the role type of the values in the association instance. The values of the association (association role players in Topic Maps terminology) describe the particular instance of the association type. In the following example we have three role players, the values 991, “Left Handed Bacon Stretcher Cover” and Red, that describe a part from a supplies catalog database. Each player has a role type that is important for semantic purposes. The integer 991 is used as an identifier for the part, so its role type is that of an identifier. The string “Left Handed Bacon Stretcher Cover” is used as a name descriptor, and the symbol Red is used as the value of the categorical variable color to describe the color of the part. The command that associates the attributes with their values is the following
    AssociationThread[attributes → partInstanceAsList]
    <|pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → RGBColor[1, 0, 0]|>
    Association is a relatively new fundamental construct in Wolfram Language; it acts like a symbolically indexed list. The main reasons for using it are to allow highly efficient lookup and updating, and to build complex hierarchical structures and other datasets. You can easily convert an Association to a list of rules
    % // Normal
    {pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → RGBColor[1, 0, 0]}
    partInstanceAsAssociation
    <|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → RGBColor[1, 0, 0]|>
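The lookup, update and nesting capabilities mentioned above can be sketched as follows. The names part and catalog, and the extra "weight" attribute, are hypothetical and only serve the illustration; string keys are used here instead of the symbolic keys of partInstanceAsAssociation.

```wolfram
part = <|"pid" -> 991, "pname" -> "Left Handed Bacon Stretcher Cover", "pcolor" -> Red|>;

(* Lookup by key; Lookup supplies a default when the key is absent *)
part["pname"]
Lookup[part, "weight", Missing["NotAvailable"]]

(* In-place update of an existing key and addition of a new one *)
part["pcolor"] = Blue;
part["weight"] = 12;  (* hypothetical attribute, not in the Parts table *)

(* Associations nest, which gives hierarchical structures *)
catalog = <|"recKey991" -> part|>;
catalog["recKey991"]["pcolor"]
```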
  • 11. Keys[partInstanceAsAssociation]
    {partID, partName, partColor}
    Values[partInstanceAsAssociation]
    {991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
    HyperGraph Representation
    Although it is possible to represent a hypergraph by a bipartite graph and follow the RDF (Subject-Predicate-Object) approach, I would like to demonstrate a different perspective that eventually leads to what I named the enhanced hypergraph representation of the relational model, which follows in Part 2 of this series. We will use two lists of rules, one for the attributes and another for the values. Essentially we follow the same principle as with the lists above: we separate the semantics from the data values, i.e. types from instances. This is a critical step in a completely new way of thinking about the data modeling process.
    List of Rules for Attributes
    Take the list of attributes first and add at the beginning of the list another element, which we call nexusPart. This is the hyperedge that connects all the other attributes together.
    Insert[attributes, "nexusPart", 1]
    {nexusPart, pid, pname, pcolor}
    and using a list of rules we obtain
    conceptsRules = Thread[Rule[Table["nexusPart", {3}], attributes]]
    {nexusPart → pid, nexusPart → pname, nexusPart → pcolor}
    vstyle = {# → Black & /@ attributes, "nexusPart" → Red} // Flatten
    {pid → GrayLevel[0], pname → GrayLevel[0], pcolor → GrayLevel[0], nexusPart → RGBColor[1, 0, 0]}
    A graphical representation of the list above is the following
    Graph[conceptsRules, VertexSize → Medium, VertexStyle → vstyle, VertexLabels → "Name"]
    [Figure: graph with the red node nexusPart connected to the black nodes pid, pname and pcolor]
  • 12. So, in this hypergraph nexusPart plays the role of the hyperedge, drawn in red, that connects three hypernodes, drawn in black, that represent the attributes (pid, pname, pcolor).
    List of Rules for Values
    Now we can proceed with the data
    partInstanceAsList
    {991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
    dataRules = Thread[Rule[Table["nexus991", {3}], partInstanceAsList]]
    {nexus991 → 991, nexus991 → Left Handed Bacon Stretcher Cover, nexus991 → RGBColor[1, 0, 0]}
    vstyle = {# → Blue & /@ partInstanceAsList, "nexus991" → Green} // Flatten
    {991 → RGBColor[0, 0, 1], Left Handed Bacon Stretcher Cover → RGBColor[0, 0, 1],
     RGBColor[1, 0, 0] → RGBColor[0, 0, 1], nexus991 → RGBColor[0, 1, 0]}
    A graphical representation of the list above is the following
    Graph[dataRules, VertexSize → Medium, VertexStyle → vstyle, VertexLabels → "Name"]
    [Figure: graph with the green node nexus991 connected to the blue nodes 991, Left Handed Bacon Stretcher Cover and the red color]
    In this hypergraph nexus991 plays the role of a hyperedge, drawn in green, that connects three hypernodes, drawn in blue, that represent the values (991, “Left Handed Bacon Stretcher Cover”, Red).
    Summary
    We created two handles, which we call nexuses: one at the layer of concepts to represent the head of the record and another at the data layer to represent the body of the record. Provided that we find a way to connect the two layers, we are now in a position to create concept graphs (we will call them maps from now on) that are similar to the ER diagrams that database designers build in an RDBMS.
    Property Graph Representation
    In the property graph representation, each record becomes an instance of a class that is represented graphically with a node. Attributes of the record are usually embedded in the structure of the node as properties of the class. If we follow our previous example we can take the association representation
partInstanceAsAssociation
<|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → [red swatch]|>
and add an extra handle that refers to that association and serves as a key for looking it up. For example:
"recKey991" → partInstanceAsAssociation
recKey991 → <|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → [red swatch]|>
This way the same key can be used to maintain an association of bidirectional links to other records through a specific edge. For example, part 991 has two suppliers, 1081 and 1082:
"recKey991" → <|"recKey1082" → "isSupplierOf991", "recKey1081" → "isSupplierOf991"|>
recKey991 → <|recKey1082 → isSupplierOf991, recKey1081 → isSupplierOf991|>
Representation of a Table in Wolfram Language
As a List of Lists
Wolfram Language provides expressions that represent many SQL constructs, such as tables (SQLTable, SQLTables, SQLTableNames, SQLTableInformation) and columns (SQLColumn, SQLColumns, SQLColumnNames, SQLColumnInformation). These commands return information about the structure of these constructs. There are also two styles of commands for working with data: Wolfram Language SQL commands (SQLSelect, SQLUpdate, SQLInsert, etc.) for those who are familiar with the Wolfram Language, and SQL-style query strings executed with SQLExecute.
partsList = SQLSelect[conn, "Parts", "ShowColumnHeadings" → True]
{{pid, pname, pcolor}, {991, Left Handed Bacon Stretcher Cover, Red}, {992, Smoke Shifter End, Black}, {993, Acme Widget Washer, Red}, {994, Acme Widget Washer, Silver}, {995, I Brake for Crop Circles Sticker, Translucent}, {996, Anti-Gravity Turbine Generator, Cyan}, {997, Anti-Gravity Turbine Generator, Magenta}, {998, Fire Hydrant Cap, Red}, {999, 7 Segment Display, Green}}
partsList // TableForm
pid  pname                              pcolor
991  Left Handed Bacon Stretcher Cover  Red
992  Smoke Shifter End                  Black
993  Acme Widget Washer                 Red
994  Acme Widget Washer                 Silver
995  I Brake for Crop Circles Sticker   Translucent
996  Anti-Gravity Turbine Generator     Cyan
997  Anti-Gravity Turbine Generator     Magenta
998  Fire Hydrant Cap                   Red
999  7 Segment Display                  Green
suppliersList = SQLSelect[conn, "Suppliers", "ShowColumnHeadings" → True];
catalogList = SQLSelect[conn, "Catalog", "ShowColumnHeadings" → True];
As a List of Associations or Dataset Construct
List of Associations
Here we demonstrate how to arrive at a list of associations from the list of lists of data in the previous section. First we split the header (the column attributes) from the body (the records of data).
attributes = partsList[[1]]
{pid, pname, pcolor}
data = partsList[[2 ;;]]
{{991, Left Handed Bacon Stretcher Cover, Red}, {992, Smoke Shifter End, Black}, {993, Acme Widget Washer, Red}, {994, Acme Widget Washer, Silver}, {995, I Brake for Crop Circles Sticker, Translucent}, {996, Anti-Gravity Turbine Generator, Cyan}, {997, Anti-Gravity Turbine Generator, Magenta}, {998, Fire Hydrant Cap, Red}, {999, 7 Segment Display, Green}}
Now we are in a position to create the first association
AssociationThread[attributes → data[[1]]]
<|pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → Red|>
Then we generalize the method to transform the list of records into a list of associations
AssociationThread[attributes -> #] & /@ data // TableForm
<|pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → Red|>
<|pid → 992, pname → Smoke Shifter End, pcolor → Black|>
<|pid → 993, pname → Acme Widget Washer, pcolor → Red|>
<|pid → 994, pname → Acme Widget Washer, pcolor → Silver|>
<|pid → 995, pname → I Brake for Crop Circles Sticker, pcolor → Translucent|>
<|pid → 996, pname → Anti-Gravity Turbine Generator, pcolor → Cyan|>
<|pid → 997, pname → Anti-Gravity Turbine Generator, pcolor → Magenta|>
<|pid → 998, pname → Fire Hydrant Cap, pcolor → Red|>
<|pid → 999, pname → 7 Segment Display, pcolor → Green|>
The Structured Dataset Construct of Wolfram Language
Finally, with the special Dataset construct that the Wolfram Language provides, we can obtain a dataset. Dataset has the interesting property that it can represent not only multidimensional arrays of data, but also data with arbitrary hierarchical structure.
The second interesting property is that we can apply various operators, such as part extraction, filtering, aggregation, subqueries, and arbitrary functions, directly on the dataset. A few examples to demonstrate these:
partsDataset = Dataset[AssociationThread[attributes -> #] & /@ data]
pid  pname                              pcolor
991  Left Handed Bacon Stretcher Cover  Red
992  Smoke Shifter End                  Black
993  Acme Widget Washer                 Red
994  Acme Widget Washer                 Silver
995  I Brake for Crop Circles Sticker   Translucent
996  Anti-Gravity Turbine Generator     Cyan
997  Anti-Gravity Turbine Generator     Magenta
998  Fire Hydrant Cap                   Red
999  7 Segment Display                  Green
2 levels, 27 elements
suppliersDataset = Dataset[AssociationThread[suppliersList[[1]] → #] & /@ suppliersList[[2 ;;]]]
sid   sname                  saddress
1081  Acme Widget Suppliers  1 Grub St., Potemkin Village, IL 61801
1082  Big Red Tool and Die   4 My Way, Bermuda Shorts, OR 90305
1083  Perfunctory Parts      99999 Short Pier, Terra Del Fuego, TX 41299
1084  Alien Aircaft Inc.     2 Groom Lake, Rachel, NV 51902
2 levels, 12 elements
catalogDataset = Dataset[AssociationThread[catalogList[[1]] → #] & /@ catalogList[[2 ;;]]]
catsid  catpid  catcost
1081    991     36.1
1081    992     42.3
1081    993     15.3
1081    994     20.5
1081    995     20.5
1081    996     124.2
1081    997     124.2
1081    998     11.7
1081    999     75.2
1082    991     16.5
1082    997     0.55
1082    998     7.95
1083    998     12.5
1083    999     1.
1084    994     57.3
1084    995     22.2
⋮
2 levels, 51 elements
Select[catalogDataset, #catpid == 998 &];
Or equally
catalogDataset[Select[#catpid == 998 &]]
catsid  catpid  catcost
1081    998     11.7
1082    998     7.95
1083    998     12.5
1084    998     48.6
2 levels, 12 elements
catalogDataset[GroupBy[Key["catpid"]]];
Or almost equivalently, with each group key wrapped in a one-element list
catalogDataset[GroupBy[{#catpid} &]]
{991}  {<|catsid → 1081, catpid → 991, catcost → 36.1|>, <|catsid → 1082, catpid → 991, catcost → 16.5|>}
{992}  {<|catsid → 1081, catpid → 992, catcost → 42.3|>}
{993}  {<|catsid → 1081, catpid → 993, catcost → 15.3|>}
{994}  {<|catsid → 1081, catpid → 994, catcost → 20.5|>, <|catsid → 1084, catpid → 994, catcost → 57.3|>}
{995}  {<|catsid → 1081, catpid → 995, catcost → 20.5|>, <|catsid → 1084, catpid → 995, catcost → 22.2|>}
{996}  {<|catsid → 1081, catpid → 996, catcost → 124.2|>}
{997}  {<|catsid → 1081, catpid → 997, catcost → 124.2|>, <|catsid → 1082, catpid → 997, catcost → 0.55|>}
{998}  {<|catsid → 1081, catpid → 998, catcost → 11.7|>, <|catsid → 1082, catpid → 998, catcost → 7.95|>, …}
{999}  {<|catsid → 1081, catpid → 999, catcost → 75.2|>, <|catsid → 1083, catpid → 999, catcost → 1.|>}
3 levels, 51 elements
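The grouping above stops short of the aggregation mentioned earlier. As a minimal sketch of that next step, the following computes the cheapest catalog cost per part without needing the database connection. The names catalogRows, header, records, and minCostPerPart are introduced here for illustration only and are not part of the original notebook; the rows are copied from the catalog output shown above, restricted to parts 991, 997, and 998.

```wolfram
(* Hypothetical sample: catalog rows for parts 991, 997 and 998, with the header as first row *)
catalogRows = {
   {"catsid", "catpid", "catcost"},
   {1081, 991, 36.1}, {1082, 991, 16.5},
   {1081, 997, 124.2}, {1082, 997, 0.55},
   {1081, 998, 11.7}, {1082, 998, 7.95}, {1083, 998, 12.5}, {1084, 998, 48.6}
   };

(* Split the header from the body and thread the header over each record *)
header = First[catalogRows];
records = AssociationThread[header -> #] & /@ Rest[catalogRows];

(* Group by part id, keep only the cost of each record, and reduce each group with Min *)
minCostPerPart = GroupBy[records, Key["catpid"] -> Key["catcost"], Min]
(* <|991 -> 16.5, 997 -> 0.55, 998 -> 7.95|> *)

(* The same aggregation written as a Dataset query *)
Dataset[records][GroupBy["catpid"], Min, "catcost"]
```

The three-argument form of GroupBy keeps the grouping key, the extracted value, and the reducer separate, which mirrors the GROUP BY and MIN split of the corresponding SQL query.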