2. 2
Reasons for interest in Graph DB: exploration of new content delivery formats and tools for content consumption
• Use of a graph DB as the store for all glossaries, the metadata of the organisation's data stores, and their relationships
• Common, independent interface for all services, apps and users
• Flexible schema (does not need to be predefined), easily extensible
• Consumable: easy access to the data via a common API/interface
• Visualisation: a business-user-friendly tool for data exploration, navigation, search and analysis
• Easy integration of Industry Models content (or content derived from it) with the customer's collection of glossaries, vocabularies, ontologies and other models
• Not seen as a replacement for the main formats and content authoring/management tools, but rather as a complementary "read-only platform", mainly beneficial for business users and analysts
• Current and future work will stay flexible and compatible with the main software tools: Titan/JanusGraph with TinkerPop, IBM Graph and Neo4j
3. 3
Industry Model Common Components
[Diagram: Industry Models components]
• Vocabulary: Business Terms, Analytical Requirements, Supportive Content
• Data Models: Business Data Model, Atomic Warehouse Model, Dimensional Warehouse Model
Business Terms: Industry concepts expressed in plain business language, with no modeling. Business Terms are organized by Business Categories. The mapping to the data models allows requirements to be transformed into IT data structures.
Analytical Requirements: High-level groups of business information that express business Measures along axes of analysis, which are named Dimensions.
Supportive Content: A grouping of terms incorporating any terminology originating from an internal or external source. It is used to support data structures such as regulatory reports, industry standards, business architecture standards, vendor interfaces, or legacy source systems.
Business Data Model: A highly normalized conceptual data model that is an enterprise-wide, generic, and flexible data representation of informational systems.
Atomic Model: A normalized design-level data model representing the repository of atomic data used for informational processing.
Dimensional Model: A design-level dimensional model representing the repository of analytical data. It contains star schemas supporting the Analytical Requirements.
4. 4
Tools Currently Used & Supported
Data Modelling tools:
• InfoSphere Data Architect
• Erwin Data Modeler
Data Governance tools:
• InfoSphere Information Governance Catalog (IGC): Business Glossaries, Models and other metadata
5. 5
Graph Databases & Graph Data Visualisation tools
Graph Databases
• Neo4j Community Edition (GPL v3 license)
• Neo4j Enterprise Edition (Evaluation & Commercial License)
• Titan (Apache License 2.0)
• JanusGraph (Apache License 2.0 & Creative Commons Attribution 4.0 International)
• IBM Graph managed service on Bluemix (built on the Titan/TinkerPop stack)
Graph computing framework
• TinkerPop (Apache License 2.0): embedded in Titan/JanusGraph, but can also be used as a standalone graph DB for demos and small projects, using the in-memory TinkerGraph
Graph data visualisation
• Neo4j Browser (same licensing as above) - part of the Neo4j Graph DB
• Linkurious Enterprise (commercial license)
• vis.js JavaScript visualisation library (Apache 2.0 and MIT)
11. 11
Content Transformation
IM artifact formats
• IGC export in XML
• Logical model (LDM) files are XML files
Transformation is split into two steps (fully working script prototype in PowerShell)
This split makes it possible to bring any customer's data into the mix in an easy-to-understand format: a collection of CSV files in two folders, nodes and edges. Each file represents a different node/edge type, with any properties as CSV columns (each node type can have a different set of properties/columns). The only mandatory fields are ID and name.
1. XML (LDM & IGC) to CSV transform
2. CSV to GraphML transform
• also produces a graph DB schema model in JSON format (for IBM Graph on Bluemix)
• also produces a Groovy script for schema creation in Titan/JanusGraph
GraphML graph data format: includes the schema and all node and edge data
• importable into Titan/JanusGraph using the Gremlin/TinkerPop console
• TinkerPop/Gremlin can also be used to import into Neo4j
• also recognized and supported by IBM Graph (currently with a limitation: the file size cannot exceed 10 MB)
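Illustration of step 2 (CSV to GraphML). This is a minimal Python sketch, not the PowerShell prototype mentioned above. Several details are assumptions: the column names id, source and target, the use of the CSV file name as the node/edge type, and writing that type under the keys labelV and labelE (understood to be the default label keys of the TinkerPop GraphML reader). All properties are written as strings.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
# Assumed column names; the slides only state that ID and name are mandatory.
ID_COL, SOURCE_COL, TARGET_COL = "id", "source", "target"


def read_typed_rows(folder):
    """Return (type_name, row) pairs; the CSV file name gives the type."""
    rows = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            rows.extend((csv_path.stem, row) for row in csv.DictReader(handle))
    return rows


def declare_keys(graphml, domain, label_key, rows, skip):
    """Declare one GraphML <key> per property column; return name -> key id."""
    ET.SubElement(graphml, "key", {"id": label_key, "for": domain,
                                   "attr.name": label_key, "attr.type": "string"})
    names = sorted({col for _, row in rows for col in row if col and col not in skip})
    key_ids = {name: f"{domain}_{name}" for name in names}
    for name, key_id in key_ids.items():
        ET.SubElement(graphml, "key", {"id": key_id, "for": domain,
                                       "attr.name": name, "attr.type": "string"})
    return key_ids


def add_data(element, label_key, label, row, key_ids):
    """Attach the type label and every non-empty property as <data> children."""
    ET.SubElement(element, "data", key=label_key).text = label
    for name, key_id in key_ids.items():
        if row.get(name):
            ET.SubElement(element, "data", key=key_id).text = row[name]


def csv_to_graphml(root_dir, out_path):
    """Convert <root_dir>/nodes/*.csv and <root_dir>/edges/*.csv to one GraphML file."""
    nodes = read_typed_rows(Path(root_dir) / "nodes")
    edges = read_typed_rows(Path(root_dir) / "edges")
    graphml = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Keys (the schema part) must be declared before the <graph> element.
    node_keys = declare_keys(graphml, "node", "labelV", nodes, {ID_COL})
    edge_keys = declare_keys(graphml, "edge", "labelE", edges,
                             {ID_COL, SOURCE_COL, TARGET_COL})
    graph = ET.SubElement(graphml, "graph", id="G", edgedefault="directed")
    for label, row in nodes:
        node = ET.SubElement(graph, "node", id=row[ID_COL])
        add_data(node, "labelV", label, row, node_keys)
    for label, row in edges:
        edge = ET.SubElement(graph, "edge", id=row[ID_COL],
                             source=row[SOURCE_COL], target=row[TARGET_COL])
        add_data(edge, "labelE", label, row, edge_keys)
    ET.ElementTree(graphml).write(out_path, encoding="utf-8", xml_declaration=True)
```

Example call (paths are hypothetical): csv_to_graphml("export", "export/graph.graphml"), where export/nodes/term.csv and export/edges/belongs.csv each hold one node or edge type.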
12. 12
Meaning of Node types in Graph DB
Logical model based types
• entity
• attribute
• package
• model
• diagram
Physical model based types
• column
• table
Glossary based types
• term
• category
13. 13
Meaning of Edge types in Graph DB
Assigns:
• assignment of a term to an asset: attribute/column/entity/table
Belongs:
• describes parent object = ownership
• entity/table to package
• package to package
• package to model
• term to category
• category to category
• category to glossary
• diagram to package
Describes:
• attribute describes entity
• column describes table
• term isOf another term
Maps:
• attribute to attribute calculation
• attribute to attribute/entity population in the same model or AWM->DWM
• column to column/table population in the same schema or AWM->DWM
• note: covers both population and calculation dependencies
• design transformation dependency
• attribute to attribute/entity (from another model, e.g. BDM->AWM or BDM->DWM)
• table to entity
• column to attribute
References:
• term to category: can be referenced by any number of categories (in addition to the owning category)
• entity/table to diagram: can be referenced by any number of diagrams
Relates:
• entity to entity relationship in ER models
• table to table relationship in ER models
• term to term relatedTerm
Subtype:
• entity is a subtype of another entity (generalization/inheritance in ER models)
• term isTypeOf another term
Synonym:
• one term is a synonym of another; in a directed graph the direction of this edge suggests a master-child relationship
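The edge types above can be read as a set of allowed (source node type, target node type) pairs per edge type. The following Python sketch encodes that reading so edge data can be checked before loading. It is an illustration only: the lowercase type names, the source-to-target direction assumed for each bullet, and the helper names are assumptions, not part of the original tooling. Note that "glossary" appears as a Belongs target although it is not in the node type list.

```python
# Edge type -> allowed (source node type, target node type) pairs,
# transcribed from the edge type list. Reading each bullet as
# "source to target" is an assumption of this sketch.
ASSETS = ("attribute", "column", "entity", "table")

EDGE_RULES = {
    "assigns": {("term", asset) for asset in ASSETS},
    "belongs": {
        ("entity", "package"), ("table", "package"), ("package", "package"),
        ("package", "model"), ("term", "category"), ("category", "category"),
        ("category", "glossary"), ("diagram", "package"),
    },
    "describes": {("attribute", "entity"), ("column", "table"), ("term", "term")},
    "maps": {
        ("attribute", "attribute"), ("attribute", "entity"),
        ("column", "column"), ("column", "table"),
        ("table", "entity"), ("column", "attribute"),
    },
    "references": {("term", "category"), ("entity", "diagram"), ("table", "diagram")},
    "relates": {("entity", "entity"), ("table", "table"), ("term", "term")},
    "subtype": {("entity", "entity"), ("term", "term")},
    "synonym": {("term", "term")},
}


def is_allowed(edge_type, source_type, target_type):
    """True if the edge type may connect the two node types (case-insensitive)."""
    pair = (source_type.lower(), target_type.lower())
    return pair in EDGE_RULES.get(edge_type.lower(), set())


def find_violations(edges):
    """Return the (edge_type, source_type, target_type) tuples that break the rules."""
    return [edge for edge in edges if not is_allowed(*edge)]
```

For example, find_violations([("Belongs", "term", "category"), ("Belongs", "term", "entity")]) returns only the second tuple, because a term can belong to a category but not to an entity.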
14. 14
Common attributes across all edge & node types
Taxonomy:
• Logical Business Data Model
• Logical Atomic Warehouse Model
• Logical Dimensional Warehouse Model
• Business Glossary
• Analytical Requirements
• Supportive Content
• Scopes
• Physical Dimensional Warehouse Model
• Physical Business Data Model
• Physical Atomic Warehouse Model
Taxonomy Type:
• logicalModel
• physicalModel
• glossary
Industry:
• banking
• insurance
• healthcare
• utilities
Version
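Since these four attributes are carried by every node and edge, they can be checked in one place before loading. The Python sketch below shows one possible way. The class name, field names and the problems() helper are hypothetical, and the version is kept as free text because its format is not specified here.

```python
from dataclasses import dataclass

# Allowed values transcribed from the lists above; compared case-insensitively.
TAXONOMY_TYPES = {"logicalmodel", "physicalmodel", "glossary"}
INDUSTRIES = {"banking", "insurance", "healthcare", "utilities"}


@dataclass(frozen=True)
class CommonAttributes:
    """Hypothetical container for the attributes shared by all nodes and edges."""
    taxonomy: str        # e.g. "Logical Business Data Model"
    taxonomy_type: str   # logicalModel, physicalModel or glossary
    industry: str
    version: str         # format not specified in the source; kept as free text

    def problems(self):
        """Return a list of human-readable problems; an empty list means valid."""
        found = []
        if not self.taxonomy.strip():
            found.append("taxonomy is empty")
        if self.taxonomy_type.lower() not in TAXONOMY_TYPES:
            found.append(f"unknown taxonomy type: {self.taxonomy_type}")
        if self.industry.lower() not in INDUSTRIES:
            found.append(f"unknown industry: {self.industry}")
        return found
```

A record such as CommonAttributes("Business Glossary", "glossary", "banking", "1.0") yields no problems, while an unlisted taxonomy type or industry is reported by name.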