AtomicDB FAQs
What is the clear meaning of ‘association’ in AtomicDB? Is it calculated
from a model? Describe in detail the algorithm which sets it or
determines its existence across potentially disparate data sources. Is it
a functional type relationship (subject -> verb -> target) or a numerical
value (number of co-occurrences of data elements) or what is it? Tell
us why we should trust this algorithm.
In the AtomicDB system all associations are bi-directional. Any item
anywhere in vector space (in this case accessed through an encapsulated
quad-d 128-bit token) can directly reference any other item(s), and since
the AtomicDB vector space is made up of 10^18 virtual points, many
discrete data items can be referenced.
An association is a reference from one data item to another, and a
corresponding reference from the other data item back to the first. There
is no separate ‘connector’ or predicate item, nor is there a table of
co-occurrence counts and keys indexed by data item.
One may think of it as an ‘n’-dimensional network of continuously counted
relationships, organized in vector space dimensions, where each point in
the network contains direct ‘paths’ (actually vector space indexes) to
each and every other related point. The ‘algorithm’ is entirely
fact-based and absolutely deterministic (non-statistical).
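As a rough illustration of the bi-directional reference idea described above, here is a minimal Python sketch. It is not AtomicDB's implementation or API: the `ToyStore` class, its method names, and the use of small integers in place of 128-bit tokens are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import count


class ToyStore:
    """Hypothetical toy store: every association is recorded on both items."""

    def __init__(self):
        self._next_token = count(1)    # stand-in for 128-bit tokens (assumption)
        self.values = {}               # token -> value, held as an attribute of the item
        self.refs = defaultdict(set)   # token -> tokens it references directly

    def add(self, value):
        token = next(self._next_token)
        self.values[token] = value
        return token

    def associate(self, a, b):
        # One reference in each direction; no separate connector record.
        self.refs[a].add(b)
        self.refs[b].add(a)

    def get(self, token):
        # Deterministic lookup: return exactly what was associated.
        return sorted(self.refs.get(token, set()))


store = ToyStore()
alice = store.add("Alice")
acme = store.add("Acme Corp")
store.associate(alice, acme)
```

Either item can now serve as the entry point: `store.get(alice)` and `store.get(acme)` each return the other item's token without consulting a separate relationship table.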
In what scenarios, explicitly, do you envision AtomicDB performing best
compared to RDBMS, triplestore, NoSQL, graph DB, etc.?
RDBMS: Cost of Development, Cost of Maintenance, Cost of Modification,
Cost of Operating, Namespace Binding, Structure Binding… AtomicDB is
an instant db and forever adaptable and evolvable.
Triplestore: Namespace restrictions, Contextualization limitations.
XML / Document stores: Tree-based storage, Minimal capacity for
organizational complexity… AtomicDB, at its core, is datatype- and
namespace-agnostic, always fully contextualized, and structure-free, and
thus data sets can just be integrated, used, organized and re-organized
as desired or required.
Explain how this technology is different from triplestore.
Triple stores hold subject – predicate – object records, typically in a
two-table (type) configuration consisting of an entity table, which
captures the namespace of the data, and one or many relations tables in
which each triple is represented using the ids from the entity table.
Namespace management is key to productive use of triple stores, and
same-named entities from different contexts must be pre- or
post-processed to disambiguate them.
In AtomicDB, the value of an item is just an attribute of the token
representing the item. Because all data sets are auto-contextualized on
ingestion, co-occurrence of terms is referenced from an abstraction that
handles multiple instances mapped to different contexts using tokens in
vector space.
AtomicDB has no tables, and predicates are implemented as dimensions
in vector space, not as edge objects referenced from triple records.
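To make the contrast concrete, the following Python sketch puts a list of triple records next to a per-item layout in which the predicate is a named dimension of each item's own references. The data, the `associate` helper and the inverse dimension name `employs` are hypothetical; this only illustrates the idea and says nothing about AtomicDB internals.

```python
from collections import defaultdict

# Triple-store style: one record per (subject, predicate, object).
triples = [
    ("alice", "works_for", "acme"),
    ("bob", "works_for", "acme"),
]

# Dimension style (hypothetical): each item keeps its own references,
# grouped by the dimension that plays the role of the predicate.
refs = defaultdict(lambda: defaultdict(set))


def associate(a, dim, b, inverse_dim):
    refs[a][dim].add(b)
    refs[b][inverse_dim].add(a)   # matching reference back from the other item


for subject, predicate, obj in triples:
    associate(subject, predicate, obj, "employs")

# Triple store: scan (or index) the relation records.
employees_scan = sorted(s for s, p, o in triples if p == "works_for" and o == "acme")

# Dimension style: start at the item and read one of its dimensions.
employees_direct = sorted(refs["acme"]["employs"])
```

Both queries return the same answer; the difference is that the second starts from the item itself rather than from a relation table.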
Talk about AtomicDB in terms of ACID vs BASE and the CAP theorem. What
are the tradeoffs in using AtomicDB?
AtomicDB has no tables at all, so it is very difficult to answer that
without extensive explanation and qualification. In a probably
unsatisfactory summary: because AtomicDB is a combination of network,
vector space and atomic models, where item uniqueness is guaranteed, data
values don’t participate in relationships except as attributes of items,
and distribution and replication are on completely different processing
vectors, the issues behind those tradeoffs are far less important.
Who are other players in this field and what sets you apart from
them?
There are three basic models out there: File-Cluster or BTree-based XML/
JSON doc stores, table-based triple stores, and in-memory
columnar-oriented table stores. AtomicDB is none of those, but has the
architectural advantage of being able to provide equivalent performance
to all of them.
AtomicDB is an always-active network of informational elements
interconnected in vector space. Each piece of data resides atomically in
association with every other related piece of data, at the center of its
universe of relationships, and thus each piece of data is an entry point
into the network. At the low level it is a network. At the high level it
is a graph.
Neo4J is about the most advanced graph db, but suffers (as they all do)
from namespace and meta management limitations, and any high-level
contextualization is hidden in the triple stores themselves, as none of
that is native to the system itself. Indexers and meta attribution have
to be bolted on and are not intrinsic to triples. MongoDB, Hadoop, etc.
are great file tree stores for huge, simplistic data sets, as node and
disk spanning is built in, but if data and relationship complexity is an
issue, all need extensive post-processing (read: highly paid consultants
and data scientists) to qualify what got put in there for each and every
thing one might want to get out. Hana, Qlikview, and hundreds of other
in-memory systems are just snapshots of other data sets. AtomicDB is
always read / write.
How does AtomicDB handle time series? How does it manage associations
between data sources with entities that have attributes that
change over time?
Entities and Attributes are Atomic Items and there is no internal
distinction between them. Events are handled as transactions and are also
Atomic Items, with relationships to the Entities and Attributes
participating in the Event. Depending on the nature of the data sources
and their intended use, one would typically utilize the cardinality of
that relationship dimension to always show the latest Event reference,
which would thereby always have the most up-to-date Attribute values
associated.
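One way to picture the cardinality idea above is an entity that keeps its full event history in a many-valued dimension and, alongside it, a single-valued reference that always points at the newest event. The Python sketch below is an illustration under that assumption; the `Entity` and `Event` classes and the `record_event` helper are hypothetical names, not AtomicDB API.

```python
class Entity:
    def __init__(self, name):
        self.name = name
        self.events = []           # many-valued dimension: full history
        self.latest_event = None   # cardinality-one dimension: newest event only


class Event:
    def __init__(self, timestamp, attributes):
        self.timestamp = timestamp
        self.attributes = dict(attributes)


def record_event(entity, event):
    entity.events.append(event)
    # Keep the single-valued reference pointing at the newest event,
    # even if events arrive out of order.
    if entity.latest_event is None or event.timestamp >= entity.latest_event.timestamp:
        entity.latest_event = event


sensor = Entity("sensor-7")
record_event(sensor, Event(1, {"temperature": 20.5}))
record_event(sensor, Event(3, {"temperature": 22.0}))
record_event(sensor, Event(2, {"temperature": 21.0}))  # arrives late
```

Reading `sensor.latest_event.attributes` gives the most recent attribute values, while `sensor.events` still holds the complete series.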
Describe any provisions for multiple servers if data sets get too big for
a single disk.
Because of the vector space mapping of the Token Keys that are used to
represent the data elements, data sets can be mapped to any number of
physical destinations, preferably on one or several contingent
high-bandwidth networks. Each Token Key is both a unique identifier in
128-bit space and a logical mapping to a specific node/disk/block/
sector/offset or equivalent location where the data element resides.
Segmentation or sharding in the classic sense is handled quite
differently, since all AtomicDB systems can be configured to inter-relate
with one another: every instance is compatible with every other instance
by design.
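The idea of a key that is both an identifier and a location can be sketched by packing location fields into one 128-bit integer. The field names follow the node/disk/block/sector/offset list above, but the bit widths and the `pack_token` / `unpack_token` helpers are assumptions chosen for illustration; the source does not describe the actual token layout.

```python
# Hypothetical layout: widths are assumptions that happen to sum to 128 bits.
FIELDS = (("node", 32), ("disk", 16), ("block", 32), ("sector", 16), ("offset", 32))


def pack_token(**parts):
    """Pack location fields into a single integer, most significant field first."""
    token = 0
    for name, width in FIELDS:
        value = parts[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} out of range for {width} bits")
        token = (token << width) | value
    return token


def unpack_token(token):
    """Recover the location fields from a packed token."""
    parts = {}
    for name, width in reversed(FIELDS):
        parts[name] = token & ((1 << width) - 1)
        token >>= width
    return parts


location = {"node": 3, "disk": 1, "block": 42, "sector": 7, "offset": 512}
token = pack_token(**location)
```

With a layout like this, routing a request to the right server is a matter of reading the high-order bits of the key rather than consulting a separate shard map.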
How do we make this work for large disparate datasets that may not
be cleanly linked? How does AtomicDB associate data that was
originally collected/ingested without any requirement at that time
that they be linked in any way, but that may represent the same or
associated objects?
Any field from any ingested data set can be post-merged with any field
from any other data set, and auto-data-merge / de-duplication /
unification / correlation will occur. This function is actually a
primitive.
I imagine a scenario where 2 sets of data may not associate directly
but indirectly through a ‘third party’ data set. Describe how
AtomicDB might determine there is a link between the first 2 sets.
If the third party data set is also ingested and there are corresponding
data fields, then by including the ‘third party’ data set in the Model
where the two data sets reside, it will auto-correlate, unify the
appropriate fields and de-duplicate the data.
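The merge-and-unify behaviour described in the two answers above can be pictured with a small Python sketch. It is an assumption-laden illustration, not the AtomicDB primitive: the sample records, the `index` structure and the `neighbours` helper are hypothetical, and unification here is simply exact matching on (field, value) pairs.

```python
from collections import defaultdict

customers = [{"id": "c1", "email": "ann@example.com"}]
shipments = [{"id": "s1", "account": "A-100"}]
# The 'third party' set shares one field with each of the other two.
accounts = [{"id": "a1", "email": "ann@example.com", "account": "A-100"}]

# Unify: each distinct (field, value) pair becomes one shared item that
# points back at every record containing it, so duplicates collapse.
index = defaultdict(set)
for dataset in (customers, shipments, accounts):
    for record in dataset:
        for field, value in record.items():
            if field != "id":
                index[(field, value)].add(record["id"])


def neighbours(record_id):
    """Records that share at least one unified value with record_id."""
    found = set()
    for ids in index.values():
        if record_id in ids:
            found |= ids
    found.discard(record_id)
    return found


direct = neighbours("c1")                                        # via shared email
indirect = set().union(*(neighbours(r) for r in direct)) - {"c1"}
```

The customer and the shipment share no field, yet the shipment is reachable in two hops once the accounts set is part of the same model.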
How does AtomicDB handle continuous numeric data… does each value
get its own data node or is data binned? We have numerical
data potentially spanning vast numerical scales. Describe the
binning/discretization algorithm, if there is any.
The best way to handle data streams will depend on the intended use.
Most often what matters is the thresholds and patterns that evolve in or
are derived from the data, and since these are usually based on some
temporal aggregation, it is important to be able to process at a defined
temporal granularity that may vary from use case to use case. Feature
sets are just patterns in relationships to AtomicDB, and entities or
events with similar features can be easily accessed using a reflexive
association function composed of two GETs.
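The "two GETs" pattern can be sketched as follows: the first GET goes from an item to its features, the second goes from each feature back to every item that has it. This is a minimal Python illustration under the assumption that associations are stored bi-directionally; the `associate`, `get` and `similar` functions and the sample pump data are hypothetical.

```python
from collections import defaultdict

refs = defaultdict(set)


def associate(a, b):
    # Bi-directional: each side holds a reference to the other.
    refs[a].add(b)
    refs[b].add(a)


def get(item):
    return refs.get(item, set())


associate("pump-1", "vibration:high")
associate("pump-1", "temp:high")
associate("pump-2", "vibration:high")
associate("pump-3", "temp:low")


def similar(item):
    """Count shared features per other item using two chained GETs."""
    counts = defaultdict(int)
    for feature in get(item):         # first GET: item -> its features
        for other in get(feature):    # second GET: feature -> items having it
            if other != item:
                counts[other] += 1
    return dict(counts)


matches = similar("pump-1")
```

Here `matches` maps each other item to the number of features it shares with `pump-1`, which can then be thresholded or ranked as the use case requires.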
Do we lose any functionality at all in going from SQL to AtomicDB
(grouping, aggregating, date typing, etc.)?
No.
We often need to classify and cluster objects in the machine learning
sense (both supervised and unsupervised), as well as select, extract and
reduce dimensions to only the relevant ones, as in PCA for example. Can
you describe how AtomicDB makes this job easier?
Meta management is fully integrated into the AtomicDB system.
Classification, categorization, grouping and clustering are as simple as
adding associations to any set of items. Dimension reduction is totally
unnecessary because all data elements are fully contextualized and can be
referenced selectively without any need for extraction. Add all fields of
interest to a Model (ostensibly a view). Select target fields by clicking
on them in a window. Select filter criteria by clicking on them in a
window. Push the Get button. Review results. No programmers, data
specialists, data scientists or db specialists needed.
Do we need to understand more about the ETL tool itself? With only
a “GET” function to retrieve data, it would appear that the ingest
side is responsible for the adds/drops/updates
The tool we have is an EL tool. Transformation is usually needed only
when trying to map extracted data sets to a different (usually
incompatible) structure (such as a data warehouse or new db). Since
AtomicDB was designed to simply accommodate any existing data structure,
we don’t need to transform it for those reasons. We might want to
transform a data set because it was really badly designed or poorly
implemented, such as having columns which should be items, but that would
be done with a mapping in a preprocessor. The API also has IMPORT, ADD,
MODIFY and ASSOCIATE functions.
Couldn’t quite see how you would do a range query using “GET”.
Sub-directives within the API.
Or a sort…
Sub-directives within the API.
Data appeared already normalised and fairly clean, which we know is
sometimes an issue. Manually editing on the GUI probably
wouldn’t catch all needs. Is there anything special that would
help here?
Almost always an issue. I have rarely seen ‘clean’ data, except from
certain 3-letter agencies after redaction.
In terms of pre-built cleansers, we find it easier to write a quick parser
that bins the data into Known Good, Questionable, and Something’s
Wrong Here bins. Because data items are unified, de-duped and
contextualized, writing custom cleaner algorithms for pre- or
post-processing is relatively trivial. If you don’t have in-house
expertise, we can provide it as needed.
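A "quick parser" of the kind mentioned above might look like the Python sketch below. The three bin names come from the answer; the validation rules, the field names `email` and `age`, and the `classify` / `bin_records` helpers are assumptions picked purely for illustration.

```python
import re

# Deliberately loose pattern: something@something.something (assumption).
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify(record):
    """Assign a record to one of the three bins using illustrative rules."""
    email = (record.get("email") or "").strip()
    age = record.get("age")
    if not email or not EMAIL.match(email):
        return "something_wrong"          # key field missing or malformed
    if not isinstance(age, int) or not 0 <= age <= 120:
        return "questionable"             # usable, but needs a second look
    return "known_good"


def bin_records(records):
    bins = {"known_good": [], "questionable": [], "something_wrong": []}
    for record in records:
        bins[classify(record)].append(record)
    return bins


rows = [
    {"email": "ann@example.com", "age": 34},
    {"email": "bob@example.com", "age": 340},
    {"email": "not-an-address", "age": 28},
]
bins = bin_records(rows)
```

Only the Questionable and Something’s Wrong Here bins then need human or custom-cleaner attention before or after ingestion.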
Unstructured textual data?
Yes it is.
We have written app-level parsers that identify all potentially
subject-indicating terms and produce a semi-structured representation (in
AtomicDB tokens) of the document, with bidirectional associations done on
a heading, sentence, paragraph and section (chapter) basis. From that,
feature sets, subject derivation and auto-similarity mapping can be done.
We can also integrate the Semantic Parser of your choice.
Performance as well is a concern for me, both at query time and also
at ingest.
Me too. We are always optimizing efficiency and performance to remain
competitive.
Contact Info
Jean Michel LeTennier jm@atomicdb.net
John Carroll john@atomicdb.net
Dr Phil Templeton ptempleton@atomicdb.net
http://www.atomicdb.net
