Towards a New Data Modelling Architecture
By Athanassios I. Hatzis, PhD, R&D Software Engineer
(C) 17th of March 2015
In Part 1 of this series we discussed the fundamental constructs of the Entity-Relationship data
model, i.e. the record and the table. Here we continue by introducing the Atomic Information
Resource (AIR) data model of the AtomicDB database management system. In Part 3 we describe the
principles of R3DM, a conceptual framework based on semiosis that can offer a solid theoretical
basis for this approach.
Part 2: The Atomic Information Resource Data Model - AIR(DM)
◆ Background Information
Origins of the AIR Data Model
The data model we describe has a long history behind it. A Google search for patents titled
“Data Base and Knowledge Operating System” or “Data management architecture associating generic
data items using reference” returns several documents filed in 2003 by the inventor Ron Everett.
In the same year a “proof of concept” for this invention was successfully demonstrated with a use
case that managed more than 50 million records of spare and repair part requirements for US Navy
ships. Today AtomicDB is a fully fledged database management system based on Ron Everett’s
patented associative data items architecture. Several pilots are currently running at large
corporations and university research establishments.
AtomicDB API Evaluation
The author of this article is an evaluator of AtomicDB and has been given full access to test the
GUIs, APIs and web services of the system. In particular he is exploring the functionality of the
C# API for developers. To make tests interactive and efficient, most functions of the API have been
ported to the Wolfram Language using the NETLink package for C#/.NET connectivity. The function
calls and other expressions that you see in this part are based on this AtomicDB Mathematica API,
which is not included with the publication of this article.
◆ Introduction to the Problem
The Table ‘Silo’ Structure
Needs["DatabaseLink`"]
conn = OpenSQLConnection[
  JDBC["Microsoft Access(ODBC)", "C:\\Temp\\SuppliesCatalogDB.mdb"]];
Extracts from the Inventor of Technology Ron Everett
The following extracts are from a recent article titled “Introduction to Associative Information
Systems”:
“When we attempt to use Tables as a storage paradigm for Information we discover that Tables are
a namespace bound, non-dynamic, 2-D, structured storage paradigm that has a different structure
for every Table in every Database. Each application is developed with unique and special queries
written to each specific database design, table layout and named tables, columns and keys.”
This is the main reason that NoSQL databases have appeared on the scene recently. Key-value stores,
hierarchical semi-structured documents, column-based stores and property graph data structures are
all attempts to provide a solution to this problem.
Table Example
This is the Parts table from the Microsoft Access relational database that we used in Part 1:
(partsList = SQLExecute[conn, "select pid,pname from Parts where pid<994",
"ShowColumnHeadings" → True]) // TableForm
pid pname
991 Left Handed Bacon Stretcher Cover
992 Smoke Shifter End
993 Acme Widget Washer
"Some have hailed XML (RDF and triple stores) as the means to solve the n-dimensional relationship
problem, because with it, meta-information can be captured, but XML is plagued with other
problems, not the least of which are namespace binding requiring semantic accord, massively
replicated tags and data, the heavy overhead of text based processing, the necessity of searching
and indexing all the text in every possible XML document for each and every key/value-tag/data
match sought and the distribution of the tagged datasets across innumerable XML documents,
stored in 2-D table-referenced 2-D file structures. Add to that list the overhead imposed by using
Semantic Web languages and ontologies and the PhD level specialists required to develop and
maintain these 'knowledge' oriented systems and you get even more namespace entrenchment and
hence specialization of the applications developed with it all."
Table Example in XML format
These are the same records as above, now serialized in XML with the pcolor attribute included:
2 Towards a New Data Modelling Architecture - Part 2.nb
partsXML = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>
<dataroot>
<Parts>
<pid>991</pid>
<pname>Left Handed Bacon Stretcher Cover</pname>
<pcolor>Red</pcolor>
</Parts>
<Parts>
<pid>992</pid>
<pname>Smoke Shifter End</pname>
<pcolor>Black</pcolor>
</Parts>
<Parts>
<pid>993</pid>
<pname>Acme Widget Washer</pname>
<pcolor>Red</pcolor>
</Parts>
</dataroot>";
Table Example in JSON format
The same records serialized in JSON format:
partsJSON = "{
  \"dataroot\": {
    \"Parts\": [
      {
        \"pid\": \"991\",
        \"pname\": \"Left Handed Bacon Stretcher Cover\",
        \"pcolor\": \"Red\"
      },
      {
        \"pid\": \"992\",
        \"pname\": \"Smoke Shifter End\",
        \"pcolor\": \"Black\"
      },
      {
        \"pid\": \"993\",
        \"pname\": \"Acme Widget Washer\",
        \"pcolor\": \"Red\"
      }
    ]
  }
}";
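To make the namespace-binding point concrete, here is a minimal sketch in Python, which we use only for illustration because it is not part of the AtomicDB tooling. It parses the same JSON records and shows that a consumer has to know the names `dataroot`, `Parts` and `pcolor` in advance, and that every record repeats the same key names.

```python
import json

# The same three records as in the JSON serialization above.
parts_json = """
{
  "dataroot": {
    "Parts": [
      {"pid": "991", "pname": "Left Handed Bacon Stretcher Cover", "pcolor": "Red"},
      {"pid": "992", "pname": "Smoke Shifter End", "pcolor": "Black"},
      {"pid": "993", "pname": "Acme Widget Washer", "pcolor": "Red"}
    ]
  }
}
"""

# The path to the records is hard-coded: it is bound to this document's names.
records = json.loads(parts_json)["dataroot"]["Parts"]

# The query is bound to the key name "pcolor" as well.
red_parts = [r["pname"] for r in records if r["pcolor"] == "Red"]

# Each record carries its own copy of every key name.
key_repetitions = sum(len(r) for r in records)

print(red_parts)
print(key_repetitions)
```

Renaming any of these keys in the document breaks the query, which is the kind of coupling the quoted extract describes.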
The Alternative Solution - From Data Items in Table to Information Atoms with NO Table
“Every table is a silo. Every cell is an atom of data with no awareness of its contexts, or how it fits
in to anything beyond its cell. It can be located by external intelligence but on its own it's a "dumb"
participant in the system - the ultimate disconnected micro-fragment accessible only by knowing
the column and the record it exists in. The alternative is to replace the data elements with
information at the atomic level of the system. Instead of a data atom in a table, we have an
information atom with no table. Information atoms exist in a multi-D vector space unbounded by
data structures and know their context, such as a “customer” or a “product”, just like atoms in the
physical world “know” they are nitrogen or hydrogen items and behave accordingly. Information
atoms also know when they were created, when they were last modified, and what other information
atoms of other types are associated with them. They know their parents, their siblings, and their
workplace associates. They are powerful little entities and most certainly NOT fragments. Nor are
they triple statements requiring endless extraneous indexing.”
Therefore at its core AtomicDB is datatype and namespace agnostic, always fully contextualized,
and structure free.
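The properties listed in the extract above can be pictured with a small Python sketch. The class `InformationAtom` and its fields are hypothetical names chosen for illustration only. They are not AtomicDB's internal structure, which this article does not expose. The sketch assumes simply that an atom records its context, its timestamps and its associated atoms.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc)


@dataclass(eq=False)  # identity-based equality, so cyclic associations are safe
class InformationAtom:
    """Hypothetical stand-in for an information atom (illustration only)."""
    context: str                      # what the atom is, e.g. "PartColor"
    value: str
    created: datetime = field(default_factory=_now)
    modified: datetime = field(default_factory=_now)
    associates: list = field(default_factory=list, repr=False)

    def associate(self, other):
        """Link two atoms in both directions and update their timestamps."""
        stamp = _now()
        for a, b in ((self, other), (other, self)):
            if b not in a.associates:
                a.associates.append(b)
                a.modified = stamp

    def associated(self, context):
        """Values of the associated atoms that live in the given context."""
        return [a.value for a in self.associates if a.context == context]


name = InformationAtom("PartName", "Acme Widget Washer")
color = InformationAtom("PartColor", "Red")
name.associate(color)

print(color.associated("PartName"))   # ['Acme Widget Washer']
```

No table is involved: the atom for "Red" can be asked directly which part names it is associated with.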
All of this sounds perfect in theory, but in practice everyone is accustomed to using tables. The
table is already the favourite manageable structure and the most convenient medium for
exchanging datasets. The challenge, therefore, is that any alternative data architecture should
provide the means to view data in tables with minimum effort, no matter what the underlying
structure is.
The Conceptual Model Perspective
The Entity-Relationship model is widely accepted as the means to extend the relational
model in order to give meaning to the relationships among the relations. For our database
example we have the following ER diagram.
Import["C:DataIOaccdbExamplesSuppliesCatalogDB-ER.jpg", "jpg"]
The diagram above depicts the kind of relationships among the three entities of our database,
Parts, Suppliers and Catalog, and shows the datatypes of their attributes. This is a classic
many-to-many relationship between Parts and Suppliers where Catalog is the associative entity, also
known as the junction table, bridge table, join table, etc. The attributes that play the role of
primary and foreign keys are also marked.
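For readers who want to see the junction table at work, the following Python sketch builds the three tables in an in-memory SQLite database. The column names pid, pname, pcolor and catpid come from this article. The names sid, sname, catsid and catcost, as well as the supplier rows and costs, are assumptions made up for the illustration and are not taken from the actual SuppliesCatalogDB database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Parts     (pid INTEGER PRIMARY KEY, pname TEXT, pcolor TEXT);
CREATE TABLE Suppliers (sid INTEGER PRIMARY KEY, sname TEXT);   -- hypothetical columns
CREATE TABLE Catalog (                                          -- associative entity
    catsid  INTEGER REFERENCES Suppliers(sid),
    catpid  INTEGER REFERENCES Parts(pid),
    catcost REAL,
    PRIMARY KEY (catsid, catpid)
);
INSERT INTO Parts VALUES (991, 'Left Handed Bacon Stretcher Cover', 'Red');
INSERT INTO Parts VALUES (992, 'Smoke Shifter End', 'Black');
-- Made-up suppliers and costs, for illustration only.
INSERT INTO Suppliers VALUES (1, 'Supplier A');
INSERT INTO Suppliers VALUES (2, 'Supplier B');
INSERT INTO Catalog VALUES (1, 991, 10.0);
INSERT INTO Catalog VALUES (2, 991, 12.5);
INSERT INTO Catalog VALUES (1, 992, 3.0);
""")

# Resolving the many-to-many relationship takes two joins through Catalog.
rows = con.execute("""
    SELECT s.sname, p.pname, c.catcost
    FROM Catalog c
    JOIN Suppliers s ON s.sid = c.catsid
    JOIN Parts p     ON p.pid = c.catpid
    ORDER BY s.sname, p.pid
""").fetchall()

for row in rows:
    print(row)
```

Note that the part and supplier identifiers are stored twice, once in their own table and once in Catalog.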
The Entity-Attributes ‘Silo’ Structure
The problem here is that, from a semantic point of view, users need diagrams like this one to
express business processes, but when we reach the implementation stage software engineers have to
marry business requirements with the technical constraints of the database system, hence the ER
diagram you see. Generally speaking this is known as "The Model", a conceptual view of the user on
the data. The ER version of the model has several limitations due to the architecture of the RDBMS.
The main one is that each attribute remains enclosed in its table structure, so when the same
attribute appears in another table the data set it represents has to be repeated. In our example
above, the primary key (pid) of "Parts" is repeated as a foreign key (catpid) in Catalog. The
difficulties that arise in data aggregation due to this limitation are substantial.
How to Break Free from the Entity-Attribute-Value Paradigm
The relational and the entity-relationship models have had a huge impact on the IT world for nearly
half a century. But this long period of standardization also meant one thing: everyone had to
comply with the rules and requirements of the model. Everyone had to think in terms of
Entity-Attribute-Value, or Subject-Predicate-Object as it is known in the RDF semantic model.
Programming languages have been affected too by this monolithic way of thinking. Although
programming with classes and objects proved to be advantageous, it created an artificial problem of
how to map these onto persistent data structures on disk, known as the object-relational impedance
mismatch. Knowledge representation frameworks did not escape this either. Ontologies expressed in
OWL followed the same paradigm with classes, attributes, and values. Serialization formats such as
JSON (object-name-value) and XML (element-attribute-value) also followed the same rationale.
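As a reminder of what the Entity-Attribute-Value view looks like in practice, the following Python sketch flattens one Parts record into triples. The function name `to_triples` and the "Parts/991" subject notation are illustrative assumptions, not a standard.

```python
# One record of the Parts table used throughout this article.
record = {"pid": 991, "pname": "Left Handed Bacon Stretcher Cover", "pcolor": "Red"}


def to_triples(entity, row, key):
    """Turn a record into (entity, attribute, value) triples keyed by its identifier."""
    subject = f"{entity}/{row[key]}"
    return [(subject, attr, val) for attr, val in row.items() if attr != key]


triples = to_triples("Parts", record, "pid")
for t in triples:
    print(t)
```

Each triple still binds the value to an attribute name and to an entity, so the coupling discussed above is preserved, only in a different shape.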
The Signified - Sign - Signifier Alternative Paradigm
The aforementioned Entity-Attribute bond and distinction plays its role here too. More importantly,
another concept, ‘value’, is added, which makes this triplet even more difficult to handle in our
digital world. This is mainly because three perspectives, the conceptual, the representational and
the physical-layer encoding, are mixed in such a way that it is very hard to separate them and work
with them at distinct levels of abstraction. The R3DM, or S3DM, conceptual framework that we discuss
in Part 3 is based on the natural process of semiosis, where the signified (i.e. concept, entity,
attribute) and the signifier (i.e. value) are referenced through symbols (i.e. signs) at discrete
layers. The same philosophy is shared by the architecture of this database management system, and
we demonstrate this with the following example.
◆ AtomicDB Model and Concepts
There are many ways to start working with AtomicDB, and this flexibility itself demonstrates
how vigorous the whole data modelling process is. To continue our discussion on the signified,
sign, and signifier principle, we will first design a simple concept map that corresponds to the
ER diagram above, using CmapTools from IHMC.
Design a Concept Model with CMAP Tools
Import["C:DataIOaccdbExamplesPartsSuppliersCatalogCMAP.jpg", "jpg"]
This fully demonstrates the point we have already made about Entities and Attributes. Entities in
this diagram are formed simply by grouping Attributes together, and one or more Attributes are
shared between two or more Entities. In AtomicDB terminology, shared attributes are called
bridge concepts. They are the equivalent of a relationship implemented with primary and
foreign keys on two tables, but here we are completely free to mix and match them. For
example, PartColor could be merged with another attribute from a different table in another
relational database.
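The idea that bridge concepts are simply the attributes shared between groups can be sketched with plain set intersections in Python. The attribute names are the concept names used later in this article. The dictionary layout is an assumption made for the illustration.

```python
from itertools import combinations

# Each entity is nothing more than a group of attributes.
groups = {
    "Catalog":  {"SupplierIdentifier", "PartIdentifier", "PartCost"},
    "Part":     {"PartIdentifier", "PartName", "PartColor"},
    "Supplier": {"SupplierIdentifier", "SupplierName", "SupplierAddress"},
}

# A bridge concept is an attribute that belongs to more than one group.
bridges = {
    (a, b): sorted(groups[a] & groups[b])
    for a, b in combinations(groups, 2)
    if groups[a] & groups[b]
}

print(bridges)
```

Part and Supplier share no attribute directly, so they are related only through Catalog, exactly as in the ER diagram.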
Model and Concepts
Import the Concept Map
AtomicDB can read a CMAP cxl file and extract the structure of the concept map from it.
Add The Model and Concepts Programmatically
Alternatively we can add the model and the concepts by using commands from our API. The
functions we use have been defined on top of those we ported from the C# API of AtomicDB. We
assume here that we have already established a connection with the AtomicDB database server. Let
us first create our model and give it a name.
In[210]:= modelName = "Parts-Suppliers Catalog Model";
addModel[modelName]
« NETObject[System.Collections.Generic.List`1[System.Collections.Generic.List`1[IAMCore_SharpClient.Core_KeyValuePair]]] »
We can get back the model we have just added to the system and at the same time print a
user-friendly output:
In[211]:= (model = getModelByName[modelName]) // printKVP
Out[211]= {0, 3, 13, 256} → Parts-Suppliers Catalog Model
The key is a 4D reference vector that we use to access the model item (13 = models, 256 = our
model), and the value is the string we assigned as the name of the model. Everything that is stored
in AtomicDB has a key and a value. This makes AtomicDB fully symmetrical in terms of values,
structures and relationships. It is a unique property that allows everything to be built on top of
it in a symmetric way, e.g. the commands of the API that we will discuss in detail in another part
of this series. Now let us add the concept names in groups. The first name in each list signifies a
Nexus concept, i.e. a concept that is used to associate the rest of the concepts in the list.
In[214]:= catalogConceptNames =
{"NX_Catalog", "SupplierIdentifier", "PartIdentifier", "PartCost"};
partConceptNames = {"NX_Part", "PartIdentifier", "PartName", "PartColor"};
supplierConceptNames =
{"NX_Supplier", "SupplierIdentifier", "SupplierName", "SupplierAddress"};
(catalogConcepts = addConceptsByName[modelName, catalogConceptNames]) //
printKVPL2
{2, 1025, 256, 1} → NX_Catalog
{2, 1025, 256, 2} → SupplierIdentifier
{2, 1025, 256, 3} → PartIdentifier
{2, 1025, 256, 4} → PartCost
As expected, four concepts were added to our model. Notice the last two dimensions of the
keys, i.e. (context, item). The context dimension (256) has the same value as the item dimension of
the model's key in the previous reference, which denotes that these concepts belong to that model.
In fact all the reference vectors in AtomicDB are inter-related in this way, and there is never a
chance of a collision when adding items. Another observation is that these items have been created
in a different Environment and Repository (the first two dimensions) than the model item. We will
say more about the reference vector system of AtomicDB in another part. We continue by adding the
rest of our concepts to the model.
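The relation between the model key and the concept keys can be pictured with a short Python sketch. The dimension names follow the text (Environment, Repository, context, item) and the numbers are copied from the outputs above. The `Key` type, the dictionary store and the helper `items_in_context` are hypothetical and say nothing about how AtomicDB implements its reference vectors.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical 4D reference vector, named after the dimensions in the text."""
    environment: int
    repository: int
    context: int
    item: int


store = {}  # every stored thing is a key -> value pair

model_key = Key(0, 3, 13, 256)
store[model_key] = "Parts-Suppliers Catalog Model"

# The concepts use the model's item number (256) as their context.
for item, name in enumerate(
        ["NX_Catalog", "SupplierIdentifier", "PartIdentifier", "PartCost"], start=1):
    store[Key(2, 1025, model_key.item, item)] = name


def items_in_context(store, context):
    """All values whose key lives in the given context."""
    return [v for k, v in store.items() if k.context == context]


print(items_in_context(store, 13))              # the models
print(items_in_context(store, model_key.item))  # the concepts of our model
```

In this sketch, finding the concepts of a model is a matter of matching one dimension of the key, without any schema lookup.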
In[217]:= (partConcepts = addConceptsByName[modelName, partConceptNames]) // printKVPL2
Out[217]//TableForm=
{2, 1025, 256, 5} → NX_Part
{2, 1025, 256, 3} → PartIdentifier
{2, 1025, 256, 6} → PartName
{2, 1025, 256, 7} → PartColor
(supplierConcepts = addConceptsByName[modelName, supplierConceptNames]) //
printKVPL2
{2, 1025, 256, 8} → NX_Supplier
{2, 1025, 256, 2} → SupplierIdentifier
{2, 1025, 256, 9} → SupplierName
{2, 1025, 256, 10} → SupplierAddress
In these concept groups you may notice that two concepts, PartIdentifier and SupplierIdentifier, are
the same as in the first group and keep the keys they were given there. These concepts are called
bridge concepts and play the same role as the primary and foreign keys in relational data sets. The
main difference, and a great advantage of this approach, is that this time data sets are not
duplicated. The same collection of items, i.e. the data set of the attribute, can be referenced by
many concepts.
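The behaviour seen in the three outputs above, where a concept name that already exists gets its existing key back instead of a new one, can be imitated with a toy registry in Python. The class `ConceptRegistry` is a hypothetical illustration. It is not the AtomicDB API and makes no claim about how `addConceptsByName` works internally.

```python
class ConceptRegistry:
    """Toy registry that hands out one key per distinct concept name."""

    def __init__(self, environment=2, repository=1025, context=256):
        self._prefix = (environment, repository, context)
        self._keys = {}  # concept name -> 4D key

    def add_concepts(self, names):
        """Return (key, name) pairs, reusing the key of any known name."""
        result = []
        for name in names:
            if name not in self._keys:
                self._keys[name] = self._prefix + (len(self._keys) + 1,)
            result.append((self._keys[name], name))
        return result

    def __len__(self):
        return len(self._keys)


registry = ConceptRegistry()
catalog = registry.add_concepts(
    ["NX_Catalog", "SupplierIdentifier", "PartIdentifier", "PartCost"])
part = registry.add_concepts(
    ["NX_Part", "PartIdentifier", "PartName", "PartColor"])
supplier = registry.add_concepts(
    ["NX_Supplier", "SupplierIdentifier", "SupplierName", "SupplierAddress"])

for key, name in part:
    print(key, "->", name)
print(len(registry))
```

Twelve names were submitted but only ten concepts exist, because the two bridge concepts are registered once and referenced from two groups each.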
Get The Concepts from a Model
Time to verify that all the concepts have been added to our model:
getConceptsFromModelName[modelName] // printKVPL2
{2, 1025, 256, 1} → NX_Catalog
{2, 1025, 256, 2} → SupplierIdentifier
{2, 1025, 256, 3} → PartIdentifier
{2, 1025, 256, 4} → PartCost
{2, 1025, 256, 5} → NX_Part
{2, 1025, 256, 6} → PartName
{2, 1025, 256, 7} → PartColor
{2, 1025, 256, 8} → NX_Supplier
{2, 1025, 256, 9} → SupplierName
{2, 1025, 256, 10} → SupplierAddress
◆ AtomicDB Collections and Records
Terminology
◼ Data Item is a particular type of item that holds an atomic piece of data (an atomic value).
◼ Collection (data set) is a generic container for data items with no duplicates. A collection is
similar to the notion of attribute (column) data set in the relational model.
◼ Nexus item is a special type of data item whose role is to keep associations with the other data
items in a record. A nexus item plays a role similar to that of a record in the relational model.
◼ Nexus Collection is a special type of collection which holds nexus items only. A nexus collection
acts similarly to the primary key column in the relational data model.
◼ Record is a set of data items from different collections, each associated with the same nexus
item (exactly one per record).
◼ Group refers to several collections and associates them. The group is not a container for
collections. Each group has one and only one nexus collection.
◼ Bridge Collection is a certain type of collection that can be associated with more than one
group. A bridge collection acts similarly to the foreign key column in the relational data model.
◼ Concept is a special type of item that uniquely represents one collection of items. A collection
can have one or more representative concepts. A concept can be thought of as a reference to a
collection.
◼ Model is a generic container for unique concepts that are associated to form higher constructs
and relations. A model is similar to a database schema or view.
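The terms above can be tied together with a minimal Python sketch. The classes `Collection` and `Group` and their methods are hypothetical names that mirror the terminology. They are not the AtomicDB implementation, and the sketch leaves out concepts, models and the reference vectors.

```python
class Collection:
    """A named container of unique data items (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.items = []              # list position serves as the item id

    def add(self, value):
        """Store the value once and return its item id."""
        if value not in self.items:
            self.items.append(value)
        return self.items.index(value)


class Group:
    """Associates collections through nexus items; it references them, it does not own them."""

    def __init__(self, nexus_name, collections):
        self.nexus_name = nexus_name
        self.collections = {c.name: c for c in collections}
        self.nexus_items = []        # one dict per record: collection name -> item id

    def add_record(self, **values):
        nexus_item = {n: self.collections[n].add(v) for n, v in values.items()}
        self.nexus_items.append(nexus_item)

    def as_table(self):
        """Rebuild a tabular view by resolving item ids back to values."""
        return [
            {n: self.collections[n].items[i] for n, i in nexus_item.items()}
            for nexus_item in self.nexus_items
        ]


part_id = Collection("PartIdentifier")   # a candidate bridge collection
part_name = Collection("PartName")
part_color = Collection("PartColor")

parts = Group("NX_Part", [part_id, part_name, part_color])
parts.add_record(PartIdentifier=991, PartName="Left Handed Bacon Stretcher Cover", PartColor="Red")
parts.add_record(PartIdentifier=992, PartName="Smoke Shifter End", PartColor="Black")
parts.add_record(PartIdentifier=993, PartName="Acme Widget Washer", PartColor="Red")

table = parts.as_table()
print(part_color.items)   # ['Red', 'Black']
```

Three records reference the colour collection, yet "Red" is stored only once, and the table view is still recoverable on demand from the nexus items.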
Collections and Records
Add Collections with Automap and AutoGroup options
With the following addCollectionsAutoMapGroupByName command Collections are automatically
associated with the concepts and a group is added.