This document introduces a new data modelling approach and compares it to traditional relational database models. It provides definitions and examples of key concepts in relational modelling like entities, attributes, relations, and constraints. It also demonstrates various ways to represent these constructs in Wolfram Language, including as lists, associations, graphs, and RDF triplets. The goal is to help software developers and data modellers learn the advantages of applying this new method.
Towards a New Data Modelling Architecture - Part 1
By Athanassios I. Hatzis, PhD, R&D Software Engineer
(C) 17th of March 2015
This is a series of Wolfram Language notebooks that progressively introduce a new data
modelling approach to software developers, architects, data model designers and everyone
interested in learning the advantages of this method and its main differences from the data
models of the past. In Part 1 we start with terms and constructs that most of us know from
relational database management systems; Part 2 continues with the terminology and
constructs of the new model.
Part 1: Representations and Transformations
on the Constructs of the Relational Model in
Wolfram Language
Scope
The entity-relational data model (ERDM) is still the most popular data model in database
management systems. There are several reasons for that, but the main one is the simple and
natural way of managing data in tables with rows (records) and columns (attributes). On top
of that, SQL is a powerful and easy-to-learn language that completely covers the relational
operators on data sets. In this Mathematica notebook we demonstrate various methods of
representing the basic constructs of the relational model.
◆ Definitions
Product Type
ClearAll["Global`*"]
Needs["DatabaseLink`"]
conn = OpenSQLConnection[
JDBC["Microsoft Access(ODBC)", "C:TempSuppliesCatalogDB.mdb"]];
Wikipedia Extract
“In programming languages and type theory, a product of types is another, compounded, type in a
structure. The “operands” of the product are types, and the structure of a product type is determined
by the fixed order of the operands in the product. An instance of a product type retains the fixed
order, but otherwise may contain all possible instances of its primitive data types. The expression of
an instance of a product type will be a tuple, and is called a “tuple type” of expression. A product of
types is a direct product of two or more types.”
Example
Integer × String × Colour. In Wolfram Language an instance of such a type is represented with
a list:
partInstanceAsList = {991, "Left Handed Bacon Stretcher Cover", Red}
{991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
To verify the type of each element of the list, we map the function Head over it:
Head /@ partInstanceAsList
{Integer, String, RGBColor}
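The element-by-element Head check above can also be folded into a single structural test. The following is a minimal sketch, assuming a hypothetical helper named partTypeQ that is not part of the original notebook; it uses pattern matching to test whether a list is an instance of Integer × String × Colour, with the fixed order of the operands enforced by the pattern.

```wolfram
(* Hypothetical helper: True when a list has exactly the shape Integer x String x Colour. *)
(* Red evaluates to RGBColor[1, 0, 0], so its head is RGBColor. *)
partTypeQ[tuple_List] := MatchQ[tuple, {_Integer, _String, _RGBColor}]

(* Conforms: heads are Integer, String, RGBColor in that order. *)
partTypeQ[{991, "Left Handed Bacon Stretcher Cover", Red}]

(* Does not conform: the first element is a String. *)
partTypeQ[{"991", "Left Handed Bacon Stretcher Cover", Red}]
```

Because the pattern lists the heads in order, swapping two elements of the list also makes the test fail, which mirrors the "fixed order of the operands" in the definition of a product type.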
Tuple or record or row
Wikipedia Extract
“A tuple is an ordered list of elements. In mathematics, an n-tuple is a sequence (or ordered list) of
n elements, where n is a non-negative integer. In computer science, tuples are directly implemented
as product types in most functional programming languages. More commonly, they are
implemented as record types, where the components are labeled instead of being identified by position
alone. This approach is also used in relational algebra.”
“In database theory, the relational model uses a tuple definition similar to tuples as functions, but
each tuple element is identified by a distinct name, called an attribute, instead of a number; this
leads to a more user-friendly and practical notation. A tuple in the relational model is formally
defined as a finite function that maps attributes to values. In this notation, attribute–value pairs may
appear in any order.”
Example
( partID : 991, partName : “Left Handed Bacon Stretcher Cover”, partColor : Red )
In Wolfram Language a record is usually represented with an Association,
i.e. a symbolically indexed list of rules (key -> value pairs).
partInstanceAsAssociation = Association[partID → 991,
partName → "Left Handed Bacon Stretcher Cover", partColor → Red]
<|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → RGBColor[1, 0, 0]|>
We can take the list of values in the following way
Values[partInstanceAsAssociation]
{991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
or take the list of rules (key -> value pairs)
partInstanceAsAssociation // Normal
{partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → RGBColor[1, 0, 0]}
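The relational definition quoted above says that attribute-value pairs may appear in any order, whereas an Association keeps the order in which its keys were entered. A minimal sketch of an order-insensitive comparison follows; the helper name sameTupleQ is hypothetical, and the approach (sorting keys before comparing) is one possible choice, not something taken from the original notebook.

```wolfram
(* The same attribute-value pairs, entered in two different orders. *)
tupleA = <|partID -> 991, partName -> "Left Handed Bacon Stretcher Cover", partColor -> Red|>;
tupleB = <|partColor -> Red, partID -> 991, partName -> "Left Handed Bacon Stretcher Cover"|>;

(* Hypothetical helper: compare two tuples as attribute -> value mappings. *)
(* KeySort puts the keys of both associations in the same canonical order first. *)
sameTupleQ[a_Association, b_Association] := KeySort[a] === KeySort[b]

sameTupleQ[tupleA, tupleB]
```

Sorting the keys first means the comparison depends only on which values are attached to which attribute names, which is the relational notion of tuple equality.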
Attribute or field or column
Wikipedia Extract
“The basic relational building block is the domain or data type, usually abbreviated nowadays to
type. A tuple is an ordered set of attribute values. An attribute is an ordered pair of attribute name
and type name. An attribute value is a specific valid value for the type of the attribute. This can be
either a scalar value or a more complex type. A domain describes the set of possible values for a
given attribute, and can be considered a constraint on the value of the attribute. Mathematically,
attaching a domain to an attribute means that any value for the attribute must be an element of the
specified set. Constraints make it possible to further restrict the domain of an attribute.”
Example
In our example, two of the attributes take scalar values: partID, of integer data type, and
partName, of string data type. The partColor attribute is of a complex type, RGBColor.
Apply[Rule,
Thread[{Keys[partInstanceAsAssociation], Head /@ partInstanceAsList}], {1}]
{partID → Integer, partName → String, partColor → RGBColor}
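Since a domain acts as a constraint on the value of an attribute, the attribute-to-type mapping computed above can be used to validate a tuple. The sketch below assumes two hypothetical names, partHeading and conformsQ, and checks only the head of each value; a fuller domain check (for example a restricted set of colours) would need an additional predicate per attribute.

```wolfram
(* Heading of the relation: attribute name -> type name, as computed above. *)
partHeading = <|partID -> Integer, partName -> String, partColor -> RGBColor|>;

(* Hypothetical helper: a tuple conforms when it has exactly the attributes of the
   heading and every value has the head that the heading prescribes. *)
conformsQ[tuple_Association, heading_Association] :=
  Sort[Keys[tuple]] === Sort[Keys[heading]] &&
   AllTrue[Keys[heading], Head[tuple[#]] === heading[#] &]

(* A valid tuple. *)
conformsQ[<|partID -> 991, partName -> "Left Handed Bacon Stretcher Cover",
   partColor -> Red|>, partHeading]

(* An invalid tuple: partID is a String instead of an Integer. *)
conformsQ[<|partID -> "991", partName -> "Left Handed Bacon Stretcher Cover",
   partColor -> Red|>, partHeading]
```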
Important Notice
An attribute can be seen as a mapping function: it maps a tuple to a value, for example the
color of a part. We can define a function that takes a single argument, the association
representation of the tuple, and returns the value stored under a specific key.
Turning each key of the association into a function gives one accessor per attribute:
isIdentifierOf[assoc_] := assoc[partID]
isNameOf[assoc_] := assoc[partName]
isColorOf[assoc_] := assoc[partColor]
{isIdentifierOf[partInstanceAsAssociation],
isNameOf[partInstanceAsAssociation], isColorOf[partInstanceAsAssociation]}
{991, Left Handed Bacon Stretcher Cover, RGBColor[1, 0, 0]}
Relation (Base relvar)
Wikipedia Extract
“In the relational model, a relation is a (possibly empty) finite set of tuples all having the same finite
set of attributes. This set of attributes is more formally called the sort of the relation, or more casually
referred to as the set of column names. A tuple is usually implemented as a row in a database
table. The fundamental assumption of the relational model is that all data is represented as
mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In
the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning
there are two possible evaluations for each proposition: either true or false (and in particular no third
value such as unknown, or not applicable, either of which are often associated with the concept of
NULL). Data are operated upon by means of a relational calculus or relational algebra, these being
equivalent in expressive power.
A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a
set is an unordered collection of unique, non-duplicated items. A table is an accepted visual
representation of a relation; a tuple is similar to the concept of a row. It is a set of tuples sharing the
same attributes; a set of columns and rows. A relvar is a named variable of some specific relation
type, to which at all times some relation of that type is assigned, though the relation may contain
zero tuples.”
Predicates and the Closed World Assumption
“A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary
relation) is a set of n-tuples. The heading of the relation is also the heading of each of its tuples.
The body of a relation is sometimes called its extension. This is because it is to be interpreted as a
representation of the extension of some predicate, this being the set of true propositions that can be
formed by replacing each free variable in that predicate by a name (a term that designates
something). There is a one-to-one correspondence between the free variables of the predicate and the
attribute names of the relation heading. Each tuple of the relation body provides attribute values to
instantiate the predicate by substituting each of its free variables. The result is a proposition that is
deemed, on account of the appearance of the tuple in the relation body, to be true. Contrariwise,
every tuple whose heading conforms to that of the relation, but which does not appear in the body is
deemed to be false. This assumption is known as the closed world assumption: it is often violated in
practical databases, where the absence of a tuple might mean that the truth of the corresponding
proposition is unknown.”
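The closed world assumption described above can be sketched as a membership test on the body of a relation. The names partsBody and holdsQ are hypothetical, and the two-attribute body is an abridged illustration rather than the full Parts table.

```wolfram
(* Body of a small relation with heading {pid, pcolor}, one association per tuple. *)
partsBody = {
   <|"pid" -> 991, "pcolor" -> "Red"|>,
   <|"pid" -> 992, "pcolor" -> "Black"|>};

(* Hypothetical helper: under the closed world assumption a proposition is deemed
   true exactly when its tuple appears in the body, and false otherwise.
   Keys are sorted first so that the order of attributes does not matter. *)
holdsQ[body_List, tuple_Association] :=
  AnyTrue[body, KeySort[#] === KeySort[tuple] &]

(* "Part 991 is Red" appears in the body. *)
holdsQ[partsBody, <|"pid" -> 991, "pcolor" -> "Red"|>]

(* "Part 991 is Black" does not appear, so it is deemed false. *)
holdsQ[partsBody, <|"pid" -> 991, "pcolor" -> "Black"|>]
```

A database that treats a missing tuple as "unknown" rather than "false" would need a three-valued result here, which is exactly the violation of the assumption that the extract mentions.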
Example
SQLSelect[conn, "Parts", "ShowColumnHeadings" → True] // TableForm
pid  pname                              pcolor
991  Left Handed Bacon Stretcher Cover  Red
992  Smoke Shifter End                  Black
993  Acme Widget Washer                 Red
994  Acme Widget Washer                 Silver
995  I Brake for Crop Circles Sticker   Translucent
996  Anti-Gravity Turbine Generator     Cyan
997  Anti-Gravity Turbine Generator     Magenta
998  Fire Hydrant Cap                   Red
999  7 Segment Display                  Green
SQLSelect[conn, "Parts", "ShowColumnHeadings" → True]
{{pid, pname, pcolor}, {991, Left Handed Bacon Stretcher Cover, Red},
{992, Smoke Shifter End, Black}, {993, Acme Widget Washer, Red},
{994, Acme Widget Washer, Silver},
{995, I Brake for Crop Circles Sticker, Translucent},
{996, Anti-Gravity Turbine Generator, Cyan},
{997, Anti-Gravity Turbine Generator, Magenta},
{998, Fire Hydrant Cap, Red}, {999, 7 Segment Display, Green}}
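The nested list returned above, with the column headings in the first row, can be turned into a list of tuples in the association form used earlier. The sketch below works on a literal copy of the first three rows so that it runs without a database connection; the names result and partsRelation are illustrative, and the colour column is assumed to hold text.

```wolfram
(* A literal copy of part of the result above: headings first, then data rows. *)
result = {{"pid", "pname", "pcolor"},
   {991, "Left Handed Bacon Stretcher Cover", "Red"},
   {992, "Smoke Shifter End", "Black"},
   {993, "Acme Widget Washer", "Red"}};

(* Pair the heading with every data row to get one association per tuple. *)
partsRelation = AssociationThread[First[result] -> #] & /@ Rest[result];

(* Attribute access is now by name instead of by position. *)
partsRelation[[2]]["pname"]
```

In this form every tuple carries its own heading, which matches the definition of a relation as a set of tuples that all share the same attributes.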
View or Result Set (Derived relvar)
Wikipedia Extract
“In a relational database, all data are stored and accessed via relations. Relations that store data
are called “base relations”, and in implementations are called “tables”. Other relations do not store
data, but are computed by applying relational operations to other relations. These relations are
sometimes called “derived relations”. In implementations these are called “views” or “queries”.”
Example
queryString = "
SELECT Catalog.catsid, Suppliers.sname,
Catalog.catpid, Parts.pname, Parts.pcolor, Catalog.catcost
FROM Suppliers INNER JOIN (Parts INNER JOIN [Catalog] ON Parts.pid
= Catalog.[catpid]) ON Suppliers.sid = Catalog.[catsid]
WHERE (((Catalog.catpid)=998))
ORDER BY Catalog.catcost;";
SQLExecute[conn, queryString, "ShowColumnHeadings" → True] // TableForm
catsid  sname                  catpid  pname             pcolor  catcost
1082    Big Red Tool and Die   998     Fire Hydrant Cap  Red     7.95
1081    Acme Widget Suppliers  998     Fire Hydrant Cap  Red     11.7
1083    Perfunctory Parts      998     Fire Hydrant Cap  Red     12.5
1084    Alien Aircaft Inc.     998     Fire Hydrant Cap  Red     48.6
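A derived relation can also be computed in memory by applying relational operations to tuples held as associations. The following sketch uses an abridged literal copy of the result set above; the variable names and the cost threshold of 12 are illustrative assumptions, not part of the original notebook.

```wolfram
(* An abridged literal copy of the result set above, one association per tuple. *)
suppliersOf998 = {
   <|"catsid" -> 1082, "sname" -> "Big Red Tool and Die", "catcost" -> 7.95|>,
   <|"catsid" -> 1081, "sname" -> "Acme Widget Suppliers", "catcost" -> 11.7|>,
   <|"catsid" -> 1083, "sname" -> "Perfunctory Parts", "catcost" -> 12.5|>,
   <|"catsid" -> 1084, "sname" -> "Alien Aircaft Inc.", "catcost" -> 48.6|>};

(* Restriction, the counterpart of WHERE: keep tuples that cost less than 12. *)
cheap = Select[suppliersOf998, #["catcost"] < 12 &];

(* Projection, the counterpart of the SELECT column list: keep two attributes. *)
projected = KeyTake[#, {"sname", "catcost"}] & /@ cheap;

(* Ordering, the counterpart of ORDER BY. *)
derived = SortBy[projected, #["catcost"] &]
```

The base data is left untouched; derived is recomputed from it each time, which is the sense in which a view does not store data of its own.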
Database
Wikipedia Extract
“Each database is a collection of related tables; these are also called relations, hence the name
“relational database”. Each table is a physical representation of an entity or object that is in a tabular
format consisting of columns and rows.”
Example
SQLTableNames[conn]
{Catalog, Parts, Suppliers}
SQLTableInformation[conn, "ShowColumnHeadings" → True] // TableForm
TABLE_CAT                    TABLE_SCHEM  TABLE_NAME  TABLE_TYPE  REMARKS
C:TempSuppliesCatalogDB.mdb  Null         Catalog     TABLE       Null
C:TempSuppliesCatalogDB.mdb  Null         Parts       TABLE       Null
C:TempSuppliesCatalogDB.mdb  Null         Suppliers   TABLE       Null
SQLTableNames[conn, "TableType" → SQLTableTypeNames[conn]]
{MSysAccessObjects, MSysAccessXML, MSysACEs, MSysIMEXColumns,
MSysIMEXSpecs, MSysNameMap, MSysNavPaneGroupCategories, MSysNavPaneGroups,
MSysNavPaneGroupToObjects, MSysNavPaneObjectIDs, MSysObjects, MSysQueries,
MSysRelationships, Catalog, Parts, Suppliers, View998Suppliers, ViewAll}
Entity-Relationship (ER) Modeling
Wikipedia Extract
“An entity–relationship model is a systematic way of describing and defining a business process.
The process is modeled as components (entities) that are linked with each other by relationships
that express the dependencies and requirements between them. Entities may have various properties
(attributes) that characterize them. Diagrams created to represent these entities, attributes, and
relationships graphically are called entity–relationship diagrams.”
ER / Relational Terms Equivalence
Entity Type - Relation (Table, Base relvar)
Entity - Tuple
Attribute - Attribute (column)
Relationship - View (Result set or Derived relvar)
EER (Enhanced Entity-Relationship) Model
“The EER model includes all of the concepts introduced by the ER model. Additionally it includes
the concepts of a subclass and superclass (Is-a), along with the concepts of specialization and
generalization. Furthermore, it introduces the concept of a union type or category, which is used to
represent a collection of objects that is the union of objects of different entity types.”
Constraints
Wikipedia Extract
“Constraints provide one method of implementing business rules in the database. SQL implements
constraint functionality in the form of check constraints. Constraints restrict the data that can be
stored in relations. These are usually defined using expressions that result in a boolean value,
indicating whether or not the data satisfies the constraint.
Constraints can apply to single attributes, to a tuple (restricting combinations of attributes) or to an
entire relation. Since every attribute has an associated domain, there are constraints (domain
constraints). The two principal rules for the relational model are known as entity integrity and
referential integrity.
In ER modeling we have two types of constraints that are placed on relationships: cardinality and
participation. A cardinality constraint places a limit on the number of relationships an entity may
participate in at any given time.”
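Because constraints are expressions that result in a boolean value, they can be sketched as predicates over a list of tuples. The example below is only an illustration: `parts` holds three rows of the Parts relation, `catalogPids` is a hypothetical excerpt of the catpid column, and the variable names are assumptions introduced here.

```wolfram
(* Three rows of the Parts relation from the example database *)
parts = {
   {991, "Left Handed Bacon Stretcher Cover", "Red"},
   {992, "Smoke Shifter End", "Black"},
   {993, "Acme Widget Washer", "Red"}};

(* Domain constraint: pid is an integer, pname and pcolor are strings *)
domainOK = AllTrue[parts, MatchQ[{_Integer, _String, _String}]];

(* Entity integrity: the key column pid contains no duplicates *)
entityOK = DuplicateFreeQ[parts[[All, 1]]];

(* Referential integrity: every catpid (hypothetical excerpt) refers to an existing pid *)
catalogPids = {991, 993, 993};
referentialOK = SubsetQ[parts[[All, 1]], catalogPids];

{domainOK, entityOK, referentialOK}
```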
◆ Table and Record in Wolfram Language
The basic construct of the ER model is the record; a table can be defined as a list of records.
Representation of a Record in Wolfram Language
List Representations
You need to maintain two ordered lists, one for the data values and another one for the semantics,
i.e. the attribute/column names.
partInstanceAsList
{991, Left Handed Bacon Stretcher Cover, Red}
attributes = {"pid", "pname", "pcolor"}
{pid, pname, pcolor}
But you can combine the two lists into one list of rules with the following command
Thread[Rule[attributes, partInstanceAsList]]
{pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → Red}
A Rule is the equivalent of a key-value pair, but it is more powerful because in Wolfram Language it
is the basic mechanism used in transformations. Nevertheless, for lookup and update operations
Wolfram Research added a more powerful construct called Association (see below).
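As a small illustration of rules acting as a transformation mechanism, the sketch below applies the list of rules with ReplaceAll (`/.`). The names `partRules`, `name` and `idAndName` are introduced here for illustration only.

```wolfram
attributes = {"pid", "pname", "pcolor"};
partInstanceAsList = {991, "Left Handed Bacon Stretcher Cover", Red};
partRules = Thread[Rule[attributes, partInstanceAsList]];

(* Lookup of a single attribute by applying the rules as a replacement *)
name = "pname" /. partRules;

(* The same rules transform every matching part of a larger expression *)
idAndName = {"pid", "pname"} /. partRules
```

The second call shows the difference from a plain key-value lookup: the rules rewrite a whole expression, not just one key.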
Graph Representation with triplets (RDF)
Let us call a specific part instance partXYZ. If we represent it as the subject resource of a triplet,
the list of attributes as the predicates and the list of values as the objects, we obtain the following
triplets, adding example namespaces
subject = Table["http://example.org/resource/partXYZ", {3}];
predicate = StringJoin["http://example.org/attribute/", #] & /@ attributes;
object = partInstanceAsList;
Transpose[{subject, predicate, object}] // TableForm
http://example.org/resource/partXYZ  http://example.org/attribute/pid     991
http://example.org/resource/partXYZ  http://example.org/attribute/pname   Left Handed Bacon Stretcher Cover
http://example.org/resource/partXYZ  http://example.org/attribute/pcolor  Red
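The triplets carry the same information as the record, so the record can be rebuilt from them. The following is a minimal sketch under that assumption; `triples`, `aboutPart` and `record` are names introduced here, and the namespace is stripped by simply taking the last path segment of each predicate.

```wolfram
attributes = {"pid", "pname", "pcolor"};
partInstanceAsList = {991, "Left Handed Bacon Stretcher Cover", Red};
subject = Table["http://example.org/resource/partXYZ", {3}];
predicate = StringJoin["http://example.org/attribute/", #] & /@ attributes;
triples = Transpose[{subject, predicate, partInstanceAsList}];

(* Keep the triplets about one subject *)
aboutPart = Select[triples,
   First[#] === "http://example.org/resource/partXYZ" &];

(* Strip the namespace from each predicate and pair it with its object *)
record = Association[
   Last[StringSplit[#[[2]], "/"]] -> #[[3]] & /@ aboutPart]
```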
Directed Graph
graphData = DirectedEdge[#, partXYZ] & /@ Values[partInstanceAsAssociation]
{991 ⟶ partXYZ, Left Handed Bacon Stretcher Cover ⟶ partXYZ, Red ⟶ partXYZ}
vstyle =
{# → Black & /@ Values[partInstanceAsAssociation], partXYZ → Red} // Flatten
{991 → Black, Left Handed Bacon Stretcher Cover → Black, Red → Black, partXYZ → Red}
elabels = Apply[Rule, Thread[{graphData, Keys[partInstanceAsAssociation]}], {1}]
{991 ⟶ partXYZ → partID,
Left Handed Bacon Stretcher Cover ⟶ partXYZ → partName, Red ⟶ partXYZ → partColor}
Graph[graphData,
VertexSize → Medium, VertexStyle → vstyle, VertexLabels → "Name",
EdgeStyle → Thick, EdgeLabels -> elabels,
EdgeShapeFunction → GraphElementData[{"CarvedArrow", "ArrowSize" → .06}],
GraphLayout → "SpringEmbedding"]
[Graph: the black value nodes 991, Left Handed Bacon Stretcher Cover and Red each point to the red
node partXYZ along edges labeled partID, partName and partColor]
We can observe two types of nodes in this kind of graph: a URI node (red) and literal nodes (black).
TreeForm Representation
From the discussion above we saw two abstract data structures that are suitable for tuple
representation: the list and the associative array, also known as a map or dictionary. The equivalent
constructs in Wolfram Language are List and Association. In fact, Wolfram Language expressions
are tree data structures that are stored in memory as a contiguous array of pointers,
the first to the head and the rest to the successive elements. Take for example the list we defined,
partInstanceAsList; we can present it in tree form:
partInstanceAsList // TreeForm
List
  991
  Left Handed Bacon Stretcher Cover
  RGBColor
    1
    0
    0
This reveals that the symbol Red evaluates to the function RGBColor with arguments 1, 0, 0
RGBColor[1, 0, 0]
Red
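The head-plus-elements structure described above can also be inspected directly, without drawing the tree. A short sketch using only built-in functions:

```wolfram
partInstanceAsList = {991, "Left Handed Bacon Stretcher Cover", Red};

Head[partInstanceAsList]        (* List *)
partInstanceAsList[[0]]         (* part 0 is the head, again List *)
Length[partInstanceAsList]      (* 3 *)
Head[partInstanceAsList[[3]]]   (* RGBColor, because Red evaluates to RGBColor[1, 0, 0] *)

(* FullForm prints the whole tree in functional notation *)
FullForm[partInstanceAsList]
```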
Function Representation
We can represent this 3-tuple, partInstanceAsList, as a function with three arguments that take
values from the Integer, String and Color domains
partFunction[991, "Left Handed Bacon Stretcher Cover", Red] // TreeForm
partFunction
  991
  Left Handed Bacon Stretcher Cover
  RGBColor
    1
    0
    0
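One way to make the three domains explicit is to test the function expression against a pattern. This is a sketch rather than part of the original notebook: `validPartQ` is a hypothetical helper, and the Color domain is approximated with the built-in test ColorQ.

```wolfram
partInstanceAsList = {991, "Left Handed Bacon Stretcher Cover", Red};

(* Replace the head List with partFunction *)
partExpr = partFunction @@ partInstanceAsList;

(* Hypothetical helper: True only if the arguments come from the expected domains *)
validPartQ[expr_] :=
  MatchQ[expr, partFunction[_Integer, _String, _?ColorQ]];

validPartQ[partExpr]
validPartQ[partFunction["991", "Left Handed Bacon Stretcher Cover", Red]]
```

The second call fails the check because its first argument is a string instead of an integer.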
Association Representation
Associations in Wolfram Language are very similar to the Association Type construct of the Topic
Map data model. Each defined association is an instance of an association type, e.g.
partInstanceAsAssociation is an instance of the type of a supplies part. The keys of the association,
association role types in Topic Maps terminology, describe the role type of each value in the
association instance. The values of the association, association role players in Topic Maps
terminology, describe the particular instance of the association type. In the following example we have
three role players, the values 991, “Left Handed Bacon Stretcher Cover” and Red, that describe a
part from a supplies catalog database. Each player has a role type that is important for semantic
purposes. The integer 991 is used as an identifier for the part; its role type is that of an
identifier. The string “Left Handed Bacon Stretcher Cover” is used as a name descriptor, and the
symbol Red is used as the value of the categorical variable color to describe the color of the part.
The command that associates the attributes with their values is the following
AssociationThread[attributes → partInstanceAsList]
<|pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → Red|>
Association is a relatively new fundamental construct in Wolfram Language; it acts like a
symbolically indexed list. The main reasons for using it are to allow highly efficient lookup and
updating and to build complex hierarchical structures and other datasets.
You can easily convert an Association to a list of rules
% // Normal
{pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → Red}
partInstanceAsAssociation
<|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → Red|>
Keys[partInstanceAsAssociation]
{partID, partName, partColor}
Values[partInstanceAsAssociation]
{991, Left Handed Bacon Stretcher Cover, Red}
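The lookup and updating operations mentioned above can be sketched as follows. The association is written out literally here so that the example is self-contained; the key "partWeight" and the names `repainted` and `missingWeight` are hypothetical and serve only to show the behavior for an absent key and for a non-destructive update.

```wolfram
partInstanceAsAssociation = <|"partID" -> 991,
   "partName" -> "Left Handed Bacon Stretcher Cover", "partColor" -> Red|>;

(* Lookup by key *)
partInstanceAsAssociation["partName"]

(* Lookup with a default value for a key that is not present *)
missingWeight = Lookup[partInstanceAsAssociation, "partWeight", "unknown"];

(* Non-destructive update: returns a new association, the original is unchanged *)
repainted = Append[partInstanceAsAssociation, "partColor" -> Blue];

(* In-place update of the value stored under an existing key *)
partInstanceAsAssociation["partColor"] = Green;
```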
HyperGraph Representation
Although it is possible to represent a hypergraph by a bipartite graph and follow the RDF
(Subject-Predicate-Object) approach, I would like to demonstrate a different perspective that eventually
leads to what I named the enhanced hypergraph representation of the relational model, which follows in
Part 2 of this series.
We will use two lists of rules, one for the attributes and another for the values. Essentially we follow
the same principle as with the lists above: we separate the semantics from the data values, i.e.
types from instances. This is a critical step in a completely new way of thinking about the data
modeling process.
List of Rules for Attributes
Take the list of attributes first and add at the beginning of the list another element, which we call
nexusPart. This is the hyperedge that connects all the other attributes together.
Insert[attributes, "nexusPart", 1]
{nexusPart, pid, pname, pcolor}
Using a list of rules we obtain
conceptsRules = Thread[Rule[
Table["nexusPart", {3}],
attributes]]
{nexusPart → pid, nexusPart → pname, nexusPart → pcolor}
vstyle = {# → Black & /@ attributes, "nexusPart" → Red} // Flatten
{pid → Black, pname → Black, pcolor → Black, nexusPart → Red}
A graphical representation of the list above is the following
Graph[conceptsRules, VertexSize → Medium,
VertexStyle → vstyle, VertexLabels → "Name"]
[Graph: the red node nexusPart points to the black attribute nodes pid, pname and pcolor]
So, in this hypergraph nexusPart plays the role of the hyperedge, shown in red, that connects
three hypernodes, shown in black, that represent the attributes (pid, pname, pcolor).
List of Rules for Values
Now we can proceed with the data
partInstanceAsList
{991, Left Handed Bacon Stretcher Cover, Red}
dataRules = Thread[Rule[
Table["nexus991", {3}],
partInstanceAsList]]
{nexus991 → 991, nexus991 → Left Handed Bacon Stretcher Cover, nexus991 → Red}
vstyle = {# → Blue & /@ partInstanceAsList, "nexus991" → Green} // Flatten
{991 → Blue, Left Handed Bacon Stretcher Cover → Blue, Red → Blue, nexus991 → Green}
A graphical representation of the list above is the following
Graph[dataRules, VertexSize → Medium, VertexStyle → vstyle, VertexLabels → "Name"]
[Graph: the green node nexus991 points to the blue value nodes 991, Left Handed Bacon Stretcher
Cover and Red]
And in this hypergraph nexus991 plays the role of the hyperedge, shown in green, that connects
three hypernodes, shown in blue, that represent the values (991, “Left Handed Bacon Stretcher Cover”, Red).
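Each star of rules around a nexus encodes a single hyperedge, and this can be made explicit by grouping the rules by their left-hand side. The sketch below is an illustration only: `conceptHyperedge`, `dataHyperedge` and `pairedRecord` are names introduced here, and pairing the two member lists by position is just one naive way to relate the two layers, not necessarily the approach taken in Part 2.

```wolfram
attributes = {"pid", "pname", "pcolor"};
partInstanceAsList = {991, "Left Handed Bacon Stretcher Cover", Red};
conceptsRules = Thread[Rule[Table["nexusPart", {3}], attributes]];
dataRules = Thread[Rule[Table["nexus991", {3}], partInstanceAsList]];

(* Collapse each star of rules into nexus -> list of connected hypernodes *)
conceptHyperedge = GroupBy[conceptsRules, First -> Last];
dataHyperedge = GroupBy[dataRules, First -> Last];

(* Assumption: members of the two hyperedges correspond by position *)
pairedRecord = AssociationThread[
   conceptHyperedge["nexusPart"] -> dataHyperedge["nexus991"]]
```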
Summary
We created two handles, which we call nexuses: one at the layer of concepts to represent the head of
the record and another at the data layer to represent the body of the record. Provided that we find a
way to connect the two layers, we are now in a position to create concept graphs (we will call them
maps from now on) that are similar to the ER diagrams that database designers build for an RDBMS.
Property Graph Representation
In the property graph representation, each record becomes an instance of a class that is
represented graphically with a node. Attributes of the record are usually embedded in the structure of the
node as properties of the class. If we follow our previous example we can take the association
representation
partInstanceAsAssociation
<|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → Red|>
and add an extra handle that refers to that association and serves as a key for looking up
associations. For example:
"recKey991" → partInstanceAsAssociation
recKey991 →
<|partID → 991, partName → Left Handed Bacon Stretcher Cover, partColor → Red|>
This way the same key can be used to maintain an association of bidirectional links to other
records through a specific edge. For example, part 991 has two suppliers, 1081 and 1082:
"recKey991" →
<|"recKey1082" → "isSupplierOf991", "recKey1081" → "isSupplierOf991"|>
recKey991 → <|recKey1082 → isSupplierOf991, recKey1081 → isSupplierOf991|>
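To show how such record keys might be used for navigation, the following sketch keeps the records and the links in two associations and follows the links of part 991 to the names of its suppliers. The containers `records` and `links` and the variables `supplierKeys` and `supplierNames` are hypothetical, and the supplier records are reduced to the sid and sname columns.

```wolfram
(* Hypothetical record store: record key -> record *)
records = <|
   "recKey991" -> <|"partID" -> 991,
     "partName" -> "Left Handed Bacon Stretcher Cover", "partColor" -> Red|>,
   "recKey1081" -> <|"sid" -> 1081, "sname" -> "Acme Widget Suppliers"|>,
   "recKey1082" -> <|"sid" -> 1082, "sname" -> "Big Red Tool and Die"|>|>;

(* Hypothetical link store: record key -> (linked record key -> edge label) *)
links = <|"recKey991" ->
    <|"recKey1082" -> "isSupplierOf991", "recKey1081" -> "isSupplierOf991"|>|>;

(* Follow the links of part 991 and read a property of each linked record *)
supplierKeys = Keys[links["recKey991"]];
supplierNames = records[#]["sname"] & /@ supplierKeys
```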
Representation of a Table in Wolfram Language
As a List of Lists
Wolfram Language provides expressions that represent many SQL constructs such as Table
(SQLTable, SQLTables, SQLTableNames, SQLTableInformation) and Column (SQLColumn,
SQLColumns, SQLColumnNames, SQLColumnInformation). These commands return information
about the structure of these constructs.
There are also two styles of commands for working with data: Wolfram Language SQL commands
(SQLSelect, SQLUpdate, SQLInsert, etc.) for those who are familiar with the language, and execution
of SQL-style query strings using SQLExecute.
partsList = SQLSelect[conn, "Parts", "ShowColumnHeadings" → True]
{{pid, pname, pcolor}, {991, Left Handed Bacon Stretcher Cover, Red},
{992, Smoke Shifter End, Black}, {993, Acme Widget Washer, Red},
{994, Acme Widget Washer, Silver},
{995, I Brake for Crop Circles Sticker, Translucent},
{996, Anti-Gravity Turbine Generator, Cyan},
{997, Anti-Gravity Turbine Generator, Magenta},
{998, Fire Hydrant Cap, Red}, {999, 7 Segment Display, Green}}
partsList // TableForm
pid  pname                              pcolor
991  Left Handed Bacon Stretcher Cover  Red
992  Smoke Shifter End                  Black
993  Acme Widget Washer                 Red
994  Acme Widget Washer                 Silver
995  I Brake for Crop Circles Sticker   Translucent
996  Anti-Gravity Turbine Generator     Cyan
997  Anti-Gravity Turbine Generator     Magenta
998  Fire Hydrant Cap                   Red
999  7 Segment Display                  Green
suppliersList = SQLSelect[conn, "Suppliers", "ShowColumnHeadings" → True];
catalogList = SQLSelect[conn, "Catalog", "ShowColumnHeadings" → True];
As a List of Associations or Dataset Construct
List of Associations
Here we demonstrate how to arrive at a list of associations from the list of lists in the
previous section. First we split the header (the column attributes) from the body (the records of data).
attributes = partsList[[1]]
{pid, pname, pcolor}
data = partsList[[2 ;;]]
{{991, Left Handed Bacon Stretcher Cover, Red}, {992, Smoke Shifter End, Black},
{993, Acme Widget Washer, Red}, {994, Acme Widget Washer, Silver},
{995, I Brake for Crop Circles Sticker, Translucent},
{996, Anti-Gravity Turbine Generator, Cyan},
{997, Anti-Gravity Turbine Generator, Magenta},
{998, Fire Hydrant Cap, Red}, {999, 7 Segment Display, Green}}
Now we are in a position to create the first association
AssociationThread[attributes → data[[1]]]
<|pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → Red|>
Then generalize the method to transform the list of records into a list of associations
AssociationThread[attributes -> #] & /@ data // TableForm
<|pid → 991, pname → Left Handed Bacon Stretcher Cover, pcolor → Red|>
<|pid → 992, pname → Smoke Shifter End, pcolor → Black|>
<|pid → 993, pname → Acme Widget Washer, pcolor → Red|>
<|pid → 994, pname → Acme Widget Washer, pcolor → Silver|>
<|pid → 995, pname → I Brake for Crop Circles Sticker, pcolor → Translucent|>
<|pid → 996, pname → Anti-Gravity Turbine Generator, pcolor → Cyan|>
<|pid → 997, pname → Anti-Gravity Turbine Generator, pcolor → Magenta|>
<|pid → 998, pname → Fire Hydrant Cap, pcolor → Red|>
<|pid → 999, pname → 7 Segment Display, pcolor → Green|>
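Once the table is a list of associations, records can be filtered and read by attribute name instead of by position. The sketch below uses four of the rows above; `partsAssoc`, `redPartIds` and `name998` are names introduced here for illustration.

```wolfram
attributes = {"pid", "pname", "pcolor"};
data = {
   {991, "Left Handed Bacon Stretcher Cover", "Red"},
   {992, "Smoke Shifter End", "Black"},
   {993, "Acme Widget Washer", "Red"},
   {998, "Fire Hydrant Cap", "Red"}};
partsAssoc = AssociationThread[attributes -> #] & /@ data;

(* Restriction: keep the records whose pcolor is "Red", then project onto pid *)
redPartIds = #["pid"] & /@ Select[partsAssoc, #["pcolor"] == "Red" &];

(* Lookup of one record by its identifier *)
name998 = SelectFirst[partsAssoc, #["pid"] == 998 &]["pname"]
```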
The Structured Dataset Construct of Wolfram Language
Finally, with the special Dataset construct that the Wolfram Language provides we can build a
dataset. Dataset has the interesting property that it can represent not only multidimensional arrays of
data but also data with arbitrary hierarchical structure. A second interesting property is that we
can apply various operators such as part, filtering, aggregation, subquery and arbitrary functions
directly on the dataset. A few examples to demonstrate these:
partsDataset = Dataset[AssociationThread[attributes -> #] & /@ data]
pid  pname                              pcolor
991  Left Handed Bacon Stretcher Cover  Red
992  Smoke Shifter End                  Black
993  Acme Widget Washer                 Red
994  Acme Widget Washer                 Silver
995  I Brake for Crop Circles Sticker   Translucent
996  Anti-Gravity Turbine Generator     Cyan
997  Anti-Gravity Turbine Generator     Magenta
998  Fire Hydrant Cap                   Red
999  7 Segment Display                  Green
2 levels 27 elements
suppliersDataset =
Dataset[AssociationThread[suppliersList[[1]] → #] & /@ suppliersList[[2 ;;]]]
sid   sname                  saddress
1081  Acme Widget Suppliers  1 Grub St., Potemkin Village, IL 61801
1082  Big Red Tool and Die   4 My Way, Bermuda Shorts, OR 90305
1083  Perfunctory Parts      99999 Short Pier, Terra Del Fuego, TX 41299
1084  Alien Aircaft Inc.     2 Groom Lake, Rachel, NV 51902
2 levels 12 elements
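The part, filtering, aggregation and function operators mentioned above can be sketched on a reduced copy of the parts dataset. The four rows are taken from the table above, the dataset is rebuilt here only so that the example is self-contained, and the result variable names are introduced for illustration.

```wolfram
attributes = {"pid", "pname", "pcolor"};
data = {
   {991, "Left Handed Bacon Stretcher Cover", "Red"},
   {992, "Smoke Shifter End", "Black"},
   {993, "Acme Widget Washer", "Red"},
   {998, "Fire Hydrant Cap", "Red"}};
partsDataset = Dataset[AssociationThread[attributes -> #] & /@ data];

(* Part: the name of the first record *)
firstName = partsDataset[1, "pname"];

(* Filtering followed by projection: names of the red parts *)
redNames = partsDataset[Select[#pcolor == "Red" &], "pname"];

(* Aggregation: number of parts per color *)
countByColor = partsDataset[GroupBy["pcolor"], Length];

(* Arbitrary function applied to one column *)
nameLengths = partsDataset[All, "pname", StringLength];
```

Applying Normal to any of these results converts it back to ordinary lists and associations.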