AtomicDBCoreTech_White Papaer

Atomic Database Corp
CoreTechnology Overview
by Ron Everett, Chief Scientist AtomicDB
AtomicDB is software based, proprietary core technology, which can be applied to nu-
merous IT problem areas at significantly lower set-up and lifetime costs than table-
based data management systems and applications whilst maintaining an ex-
tensive performance advantage that does not require additional data warehousing.
The basis of the technology is an ‘n’-dimensional associative memory sys-
tem and solution development platform which can be utilized as a high-performance
information store for any data set aggregation, without any design phase, to target com-
plex information system analytics and be customized at run time to meet evolving re-
quirements. The associative memory system allows the computer or information proces-
sor hosting it to store and represent human knowledge directly, in a way analogous to
how humans do. This enables the rapid embodiment of knowledge systems repre-
sentative of the information needed for actualizing any business as an information system.
The ability to rapidly build and modify any information system or data model (as a knowledge
representation) and have it able to be evolved and adapted over time, with minimal or
no programming, is key to successful implementation of any modern information system.
Unfortunately most attempts at building modern information systems involve ex-
tensive programming due to the use of technologies that are no longer appropriate for the job.
The Associative Technology does not require or make use of any Database Manage-
ment System (RDBMS) to achieve its capabilities. It is a new methodology for
storing and managing all levels and complexities of relationships. The Technolo-
gy is scalable and yet is light enough to run on even the most minimal of smartphones.
There are five Software Components in the AtomicDB Core Technology:
IAMCore is where information for the application is stored and managed using propri-
etary,patented associative technology. IAMCore is the core “engine” of the development
platform. It stores information in a non-table based structure (essentially in ‘n’th normal
form), which eliminates the need for database-level namespace management, structural
dependencies such as joins,and provides an excellent foundation for consolidated views of the
organization. It also acts as a multi-dimensional “super index”, providing very fast response to
complexqueries.Theengineishighlyscalableforlargevolumesofdata,andaccommodates awide
range of data types,including documents of any sort.The storage strategy employed in IAMCore
utilizes an advanced 4-D vector-space model and provides an unusually high level of security.
IAMCore Explorer is a software component for developers to explore the con-
texts of IAMCore, to extend the relationships of the information elements, and en-
sure to that the contexts are populated accurately by imported information.
1

Concept Mapper is a software component that allows a developer or a business analyst
to capture the requirements for information that will be stored and managed in IAMCore.
The Concept Mapper efficiently captures as a model, the concepts of concern and interest
to users and the users’ views about the relationships between those concepts. Each mod-
el becomes a new view into any or all of the data sets that are ingested into IAMCore.
ManageIT is a software component designed for business analysts to map, import
and correlate data sets from any source into IAMCore. It is a production-software com-
ponent that accepts data from existing sources (RDBMS, structured and unstructured
text), and ingests it, correctly mapping it into the existing connected information net-
work of correlated elements within IAMCore. It is designed to be run in a 24/7 pro-
duction environment requiring no downtime or interruption of the production system.
Developer API Framework presents a messaging interface to the set of instructions or
calls that developers use to connect their applications to the IAMCore engine.There are only
6 of these instructions or calls. They support all the necessary requirements for adding,
associating, finding and retrieving information. Special privileged instructions are available
(through customer security and compliance policies) for modifying and deleting existing data.
Atomic Database Corp
White Paper
AtomicDB Associative Information Management
–The Missing Link for IT Systems
An AtomicDB Corp.Technical White Paper
Scope of Paper
The applicability of AtomicDB Associative Information Management Systems Technolo-
gies to the problem space of deriving and coordinating complex correlations amongst large
volumes of mixed and multi-source data from various application and operation domains,
specifically inclusive of tabular and textual (incl. XML, RDF, docs, email…) data sets, and con-
sequently providing a platform for disparate Data Systems Integration and Fusion, Intelli-
gent Analytics and Cross Domain Librarian (Contextual Search) Services. The challeng-
es with applying existing data management systems to accomplish this end is reviewed.
A Brief Review of Data Management Systems
Data Management systems (table based) have been around for 30+ years,and have been evolved
at great expense to be good at managing uniform sets of data values,but they are terribly ineffi-
cient at managing complex information where the relationships between and amongst the data
elements have dynamics, significance or meaning.This complex information represents most of
the information in businesses today.‘Knowledge Management’ efforts (primarily centered around
XML and RDF) have been made over the last 10+ years to try to address these data management
2

system inefficiencies,but they have fallen seriously short of both vendor promises and customer
expectationsandrequirements.BigDataisjustanotherwaveoftheseknowledgemanagementef-
forts using‘NoSQL’ technologies focussed on scaling out storage and access across‘n’ processors.
As the complexity and amount of recorded information increases really fast, we have been
trying to manage this dynamic multi-dimensional information world with namespace bound,
‘tuple- based’ structured storage that is highly resistant to change.There are serious ineffi-
ciencies intrinsic to the technologies in use that have led to an enormous number of prob-
lems globally.
The Global Problems faced today with IT Systems
Large organizations globally fact the following problems with their IT systems:
• Cost - $600+ Billion spent annually in IT.
• Success Rates - more than 80%* of IT Projects either fail or overrun schedules and budget
significantly (>2x). Projects that do get delivered typically have less than 50% of their original
functional specification.
• Inflexibility – Changes to existing systems and integration with other systems is extremely
difficult.
As a result, the IT systems that support the business wind up slowing its evolution. Up-to-
date, consolidated or composite views of the business across systems are still largely unavail-
able, even the largest, best-financed corporations.
• Difficulty with Providing Information - Any new questions that were not foreseen when the
system was originally designed become backlogged change orders with turn-around times of
days, weeks, or months. Only highly technical individuals with SQL skills and knowledge of the
underlying database structures can get the answers.
Difficulty with Upgrades:Typically there are many components in the system, supplied by dif-
ferent vendors.The customer becomes the system integrator. Preparing all components for a
coordinated update to the system is a major IT project in itself, for every system.
• Maintenance and Support: It’s like an added tax that the customer is required to pay be-
cause of the problems inherent in the IT systems.
• Size vs. Performance:The bigger they are, the poorer they perform. Enterprise systems
today that try to provide for the whole organization, now need new systems, like in-mem-
ory dashboards and analytics engines, in order to satisfy customer demands for ‘reasonable’
performance.
These problems have technical root causes in the underlying technology:
• Structural Issues in table-based data storage systems lead to enormous resource consump-
tion in the design, construction and maintenance of tables and class models
3

• Namespace Issues lead to severe challenges in developing, adapting and evolving applications
• Data Integration Issues across disparate data sources require massive mapping, trans-
formation, validation, and verifications efforts
• Query and Reporting Issues – special unique queries are needed for every single report
from every single table or table sets in every single database.
• Data Segmentation Issues – every table, in its normal form, by definition is a subset that
must be ‘joined’ with other subsets, in order to get answers for questions that aren’t limited
to any particular subset.
CurrentTechnologies Don’t Solve these Problems Efficiently
All these technical issues can be seen as a consequence of a decision made 30+ years ago,when
the IT community collectively decided structured storage was the key to successful data manage-
ment.At the time it was.In the beginning,data essentially consisted of numeric values,primarily
for accounting (and scientific) purposes,so columns and rows made sense.Since then things have
changed, a lot.We are now attempting to manage information with dozens to millions of rela-
tionships per data item and billions of data items per data system,but we are still using a 1970’s
2-D structured storage paradigm to try to store it.We evolved the tables to accommodate dif-
ferent data types but couldn’t evolve them to accommodate the dynamic multi- dimensional and
multi-contextual reality of modern information because the 2-D structure does not allow for the
meaning of the relationships between the items in the rows to be maintained within theTables.
Tables simply cannot be evolved to accommodate this aspect of reality. Nor can
they be evolved to accommodate the inevitable dynamics of modern informa-
tion and business, because they are by definition, stable 2-D storage structures.
When we attempt to use Tables as a storage paradigm for Information we discover that
Tables are a namespace bound, non-dynamic, 2-D, structured storage paradigm that has
a different structure for every Table in every Database. Every Table in every Database
is structured differently with different names, data types and configurations for every col-
umn. There are different names for every Column (field) in every Table in every Database.
Each application is developed with unique and special queries written to each specific data-
base design, table layout and named tables, columns and keys. For every Table in every Da-
tabase a different set of unique queries must be written to put the data in, find the data,
and get the data out. For data interactions between tables, a superset of queries needs to
be written that span table boundaries and use complicating foreign keys, inner and outer
‘joins’ and meta tables to cross- reference data elements. That’s just how it is, we are told.
With millions of custom built databases in use worldwide with multiple tables per DB, each
with multiple queries required, conservatively hundreds of millions of queries have been writ-
ten, each one specific to its DB and table set, and each one completely unique and un-reusable.
Some have hailed XML (RDF and triple stores) as the means to solve the n-dimensional re-
4

lationship problem, because with it, meta-information can be captured, but XML is plagued
with other problems, not the least of which are namespace binding requiring semantic accord,
massively replicated tags and data, the heavy overhead of text based processing, the necessity
of searching and indexing all the text in every possible XML document for each and every key/
value-tag/data match sought and the distribution of the tagged datasets across innumerable
XML documents, stored in 2-D table-referenced 2-D file structures. Add to that list the over-
head imposed by using Semantic Web languages and ontologies and the PhD level specialists
required to develop and maintain these ‘knowledge’ oriented systems and you get even more
namespace entrenchment and hence specialization of the applications developed with it all.
So What Can We Do?
The obvious answer is to eliminate the root causes of the problems: Transition from us-
ing table- based data management systems for managing Information. Let them con-
tinue to do what they were designed to do: store numerical data sets for con-
venient and fast access and processing. Let all the information about the data be
stored in a system designed to store and manage complex relationships: AtomicDB.
The New Alternative: Information, instead of Data - Management Sys-
tems
Table-based systems enforce fragmentation, segregation and separation in our information sys-
tems.They are anti-integration,anti-connection.Every table is a silo.Every cell is an atom of data
with no awareness of its contexts, or how it fits in to anything beyond its cell. It can be located
by external intelligence but on its own it’s a“dumb” participant in the system – the ultimate dis-
connected micro-fragment accessible only by knowing the column and the record it exists in.
The alternative is to replace the data elements with information at the atomic level of the
system. Instead of a data atom in a table, we have an information atom with no table. Infor-
mation atoms exist in a multi- D vector space unbounded by data structures and know their
context, such as a “customer” or a “product”, just like atoms in the physical world “know” they
are nitrogen or hydrogen items and behave accordingly. Information atoms also know when
they were created, when they were last modified, and what other information atoms of other
types are associated with them.They know their parents, their siblings, and their workplace
associates.They are powerful little entities and most certainly NOT fragments. Nor are they
triple statements requiring endless extraneous indexing. They are each a composite collec-
tion of everything about something. They were designed from the bottom-up and the top-
down for the connected world that modern information systems increasing MUST represent.
This insight to storing information as information all the way down – is the basic characteris-
tic of the AtomicDB technology that can help solve the global IT problems discussed earlier.
A MoreTechnical Description of AtomicDBTechnology
AtomicDB technology is a token-based instance-centric associative information manage-
ment system, which is data-type agnostic. It has no pre-set structures, such as tables or file
types, to have to fit data of specific types and formats into. AtomicDB can assimilate any
data sets into its associative model, which includes a runtime configurable, multi-dimen-
5

sional storage system.The multi-dimensional storage system contains no tables and thus allows
organizations to capture, store and process the information continuum of their business, in a
single model that is one to one with users’ concept model of the organization and its busi-
ness.All the datasets related to every part of the business can be assimilated from their ex-
isting data stores and auto-inter-related in ways that make meaningful sense to the end users.
Upon assimilation into an Associative Management Information System, all data elements
from all sources become associative concepts which are contextually mapped, fully
normalized, unified, and cross-referenced on a cell by cell, tag by tag, or term by term basis.
Assimilation creates a fully indexed representation of the original data set, where every piece
of data (names, labels and values) becomes an attribute of the AtomicDB item representing
it, each of which will be at the center of its entire universe of relationships, so that from any
item, all its relationships can be accessed. Every relationship is bi-directional, so every piece
of data becomes a possible entry point for queries and searches.This means analytics look-
ing for similarity or ‘likeness’ is easily accommodated and is computationally very efficient.
Regarding multi-data sets, by creating instances of AtomicDB items to ‘represent’ every
instance of every referenced entity (be it a file, text, a record…), those entities them-
selves need not be directly assimilated into AtomicDB. Rather, the entities can continue
to exist in their current repository, thus requiring no change in the existing data systems
and applications, with the URL / URI / file path / record ID being added as an attribute (along
with any required names, labels and values) of their tokenized representative.The source sets’
representatives can be maintained distinct or manually / automatically correlated with other
assimilated datasets either during or after assimilation. As such, coordinating a system of in-
dependent data sources of differing data types and formats is an inherent feature of AtomicDB.
Unlike in table and XML document based data management,AtomicDB items are not namespace
bound entities.Namespace is just an attribute inAtomicDB technologies.Everything inAtomicDB
is managed in ‘token-space’.All processing, relationship management and resolution is handled
using instance unique tokens or keys. Namespace is used as a context for data assimilation and
presentation,and as such,every item can have any number of namespace contexts represent-
ing it in any number of languages or application domains.This means that data sets brought
in toAtomicDB from one domain can become fully integrated and correlated with existing data
sets from other domains,even if identical concepts in those datasets differ in namespace. They
can be re-presented and used across all domains in the operations space of an organization.
A meta-tag model of classifying documents using an XML (RDF | URI / triple, quad store)
approach is common in the knowledge industry. It has serious drawbacks concern-
ing scale and dynamics, not to mention lack of semantic consistency across sub-
ject domains and languages. The AtomicDB model does not ‘tag’ files by embed-
ding the ‘tags’ in the files. Instead AtomicDB ‘tags’ the files’ representatives in the
AtomicDB domain and bi-directionally links the file with its representative in AtomicDB.
This is a major departure from the XML document model implementation, although it
supports the intent of the XML document model.
The AtomicDB associative model also allows the file to be classified in as many ways as
there are AtomicDB domain instances with access to the file, with no requirement for on-
tological accord across domains. However, because every AtomicDB system is inherently in-
ter-reference-able, and access and security policies are implemented associatively, AtoicDB
6

can allow personalization, compartmentalization, sharing and collaboration to be man-
aged in a distributed way at the domain level or imposed from a central point of authority.
In addition, because there are no structure limitations to the storage model (no tables),
there is no limitation to the meta-tagging of anything, including the tags themselves.This en-
ables the capture of all aspects of everything about anything. Unlike tables, where the meaning
of the relationship, say, between the data in column 1 and column 3 … or between columns
2 and 6 cannot be stored within the table, and is typically lost or is only implicit and needs
the IT domain specialist to extract and reveal it, each relationship can be contextualized
and effectively meta-tagged thus enabling dynamic ‘knowledge’ capture and meta-data
management of the entire information world of the business in a single associative system.
HowThis Approach Addresses Global IT Problems
Cost
The biggest cost component in RDBM information systems is the design, construction and
maintenance of tables and class models. In the associative approach, these steps disap-
pear completely. They just aren’t necessary any more. All the ‘n’-D - 2-D and 2-D to ‘n’-D
data analysis and ETL mapping efforts are no longer required, because all the information ei-
ther says its natural ‘n’-D state or is reconstituted from existing databases in n-D (‘n’-tuple,
many to many) upon its assimilation into the Associative system.The overall cost sayings over
the lifetime of the application by removing this step in the process can range from 30-80%.
Success Rates
Our data repositories to date, (Hierarchical, Relational, Object oriented DBMS’s and SGML,
RDF, XML and HTML repositories) cannot manage the naturally occurring associative order
of information, so we have been forced to structure transactions into segregated sets (finance,
human resources, manufacturing, etc.) and establish a reporting hierarchy by which we can
manage, as best we can, the natural order in which information exists. In establishing these
“manageable” structures we dumb down the information, eliminate most of its context, and
therefore a good deal of its value. In this limited incarnation, which most of the context that
provides the value gone, the dumbed down information can only support a subset of the full
spectrum of business requirements. The dumbing down of information necessary in
Associative paradigm can satisfy the full range
of business requirements because it accurately
reflects the natural state of information – a node
model of infinitely connectable components of
various types from a text or numeric value to a
document, sketch, picture or video.
Relational paradigm (relation=table in its orig-
inal meaning) is a weak and incomplete reflec-
tion of the natural state of information. Provid-
ing even a partial reflection of reality takes great
effort. Responding to the full range of business
requirements is impossible
7

the relational model is a major reason why so few IT systems meet customer expectations.
A second reason is the difficulty of integrating information across databases that are designed
as silos all the way down.The RDBMs itself is a silo.The tables within it are silos.The data cells
within the tables are silos.Successfully reconstructing these fragments into an accurate consol-
idated or composite view of the organization runs on a continuum from “possible to achieve
at very high cost” (on the optimistic end) to“impossible to achieve” (on the conservative end).
Inflexibility
Namespace issues (the requirement to bind and hard-code (SQL) applications to specific ta-
ble and column names) lead to severe challenges in adapting IT systems to changing circum-
stances.With Atomic DB Associative Information Managements Systems, generic applications
can be made which can add, find, get, modify, process and present information, inde-
pendent of the datasets involved. This can vastly change the entire IT landscape, such
that the hyper-specialization and inherent resistance to change of the current 2-D paradigm,
with its endless requirements for segregated data sets and custom queries unique to each
use instance, can now be augmented, or eventually supplanted where applicable, at low-cost.
The Atomic DB multi-D storage model allows every data item to be stored in association
with (and hence fully reference-able by) all other relevant data items, (conceptual, relational,
contextual,semantic,content and Meta) any way that is meaningful to users, (associations
via symbolic tokens, not namespace). It can provide a complete meta-referencing (“super-in-
dexing”) system for those existing DB solutions that are systems of record, enabling virtual ag-
gregation,abstracting change from those existing systems.New models and concepts with new
data items are simply added to the existingAtomic DB information network and associated to
other existing data items. There’s no “re-structuring” required. Instead, the virtual structure
grows organically, yet remains completely secure. The result is flexible, inclusive,adaptable,
and evolvable solutions that keep up as information, circumstances and requirements change.
Difficulty with Providing Information
A major issue with RDBMs technology is the sheer volume of unique queries that must be created
and maintained to provide every single report from every single table or table sets in every single
database.An additional issue is the complexity of writing queries that attempt to span tables or
databases.There are deep skills and inner workings knowledge needed to create these unique
queries and handle the complexity of spanning a wide range of data management technologies.
Much of this problem has been addressed by adding an object layer on top of the databas-
es. Unfortunately, while this does work to ameliorate the problems to some degree, it involves
a complex,extensive work effort to semantically model the entire business as a series of encap-
sulated data objects and classes, adding another namespace-bound layer of complexity, and
resistance to change, which requires yet another set of experts to build,modify and maintain it.
These issues can now be mitigated by a single technology that can be fully inclusive of all of
a business‘s datasets, in a fully-indexing, associative model that reduces the number of unique
queries to a minimal generic set that is applicable to each and every data set, as all data sets
are generic to Atomic DB.When one eliminates the namespace and structure binding at the
8

storage level, and replaces it with token-space binding, all retrieval is done directly via rela-
tionships in token-space, allowing access to any information from any ingested data set with-
out requiring new queries to be written for each particular instance of filters and targets.
Since there are no intervening layers, in-memory dashboards can be both read and write.
A third issue is the communication chasm that has traditionally existed between business
users and IT.The language of business needs to be translated into the language of RDBM ta-
bles and queries.Again, an object layer with encapsulated data objects and classes attempt to
resolve this by allowing tabular data field names to be represented by object class names more
suited to the user’s semantic understanding of the business. This is still a namespace-bind-
ing effort which stands as an impediment to keeping up with the dynamics of any business.
In order to produce COTS systems, IT has assumed that all businesses are pretty much
alike. Hence the business, and those in it, are required to work with and in the conceptual
framework designed by the IT companies. Although it may approximate the actual busi-
ness, much is still lost in the translation due to the attempted reuse of objects and class-
es initially developed for the first few customers. That chasm disappears with Atom-
ic DB technology because all information is stored and retrieved relative to the
concept-based, business language. The language of every business is mapped to and
from a universal ‘token-space’ which goes all the way down to the smallest item of stor-
age in the system. Nothing is “lost in the translation” because there is no translation.
Difficulty with Upgrades:
Part of the upgrade difficulty is the tight binding of applications through the object layer to the
underlying table structures and data sets, as discussed above.Atomic DB technology, which is
not bound to the data sets, avoids the hard shocks that IT typically experiences when there
are changes at the er model or database table design level.Atomic DB simply accommodates
data sets. New or modified data sets get ingested, and auto-map and auto-correlate with al-
ready ingested sets.No structural or query design,is needed.The system adapts to and accom-
modates the new data sets because the associative AI, modeled on human reasoning, works.
A second part of the upgrade difficulty is the issue of organizing a basket of RDBMs, ETL,
metadata management and business intelligence tools (all with their various version num-
bers) into a single coordinated, tested production release. Atomic DB technology pro-
vides all these capabilities in a single, integrated, pre-tested version that in many cas-
es can be introduced into a running production system without taking the system down.
Maintenance and Support:
The cost of maintenance of existing enterprise systems is usually set at between 10 and 15%
of the cost of the systems.That suggests that these systems require a high degree of mainte-
nance that will equal the cost of a system over its lifetime, independent of upgrades.Atomic
DB technology, because of its associative nature, can be maintained for a small fraction of that.
9

Size vs. Performance:
A serious problem with structured storage paradigms involves scale. The greater the scale,
the less the performance. There are many factors affecting this. Many efforts to overcome
the issue have resulted in numerous ‘improvements’ that generally involve architec-
tural enhancements, such as distribution of datasets over multi-processor networks
or bolt-on accelerators and other new technologies such as in- memory dashboards
and analytic engines. Unfortunately all of these enhancements comes with a cost. Either
extensive and expensive work effort is required to implement them or a se-
rious scale-up of computer equipment availability is needed to handle the workload.
Atomic DB performance does not degrade significantly with size. The Data Scientist role
to determine data relevancy and cherry pick the appropriate data subsets for the in-mem-
ory dash boards and analytic engines is not required, because, due to its associative AI,
only referenced items need be loaded into memory to resolve a retrieval request.The rest
stays available in the Atomic DB information network which is resident on the disk system.
Atomic DB Capabilities
Applicability to the problem space
Relavance Associative Information Management Systems technologies are extremely well
suited to provide an effective solution in the problem space of working with and integrating
multiple data sources and providing an integrated, inclusive information management sys-
tem, directly addressing the root causes of the technical issues of the global IT problems.
Resolution
AtomicDB Associative Information Management Systems:
Works like the brain
It learns facts. Store and find everything associatively in a Concept based, multi-dimensional
information storage and management system that can be 100% compatible with the concept
model of the business, yet works independently from the ‘namespace’ of the existing data sets
(names, labels & values are attributes)
Full Semantic Model without the Universal Ontology problems
Amulti-Dstoragemodelallowseverydataitemtobestoredinassociationwith(andhencefullyref-
erence-able by) all other relevant data items,(conceptual,relational,contextual,semantic,content
and meta) any way that is meaningful to users,(associations via symbolic tokens,not namespace)
No translation needed
No massive ‘n’-D to 2-D and 2-D to ‘n’-D data analysis and ETL mapping efforts,
everything (all data) remains (or is reconstituted from existing databases) in a multi-D para-
10

digm, (‘n’-tuple, many to many) fully inter-referenced, accessible directly by users and programs
Data warehousing with no tables, hence no set up time and cost
Assimilate all data sets directly from existing databases, XML, etc., into an inclusive
Associative model and regenerate the original ‘n’-D information set of the business
AtomicDB Associative Information Systems:
are a technological solution to the problems and issues inherent in using table based
data storage and management systems. They provide the following business advantages:
• Significant IT cost reduction (no tables = low cost)
• Very fast delivery of deployable IT solutions for Big Data, BI and Decision Support
(delivers information when the decision makers need it)
• Addresses the large issue of lost business opportunities caused by extensive (and inevitable)
delivery delays in table based systems
• Comprehensive, inclusive, agile, adaptable, and evolvable solutions (keep up as circumstances
and requirements change)
• Gives access to pan-organizational information by handling data aggregation and data fusion
automatically, (no ETL, DW, DM )
• Addresses major IT issues and problems facing most organizations
To Learn More Contact Us @: info@AtomicDB.net
11

AtomicDBCoreTech_White Papaer

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to AtomicDBCoreTech_White Papaer

Similar to AtomicDBCoreTech_White Papaer (20)

More from JEAN-MICHEL LETENNIER

More from JEAN-MICHEL LETENNIER (7)

AtomicDBCoreTech_White Papaer