Join the TypeDB community to learn how we think about data modelling, and how TypeDB's expressivity allows you to model your domain based on logical and object-oriented programming principles.
Good for:
- Engineers, scientists, and technical executives
- Those in a technical field working with complex datasets, and building intelligent systems
- Anyone curious to learn about the expressive power of TypeDB's data model
Description:
We open this training with an exploration into what a schema looks like in TypeDB, starting with clarifying the motivation for the conceptual model in TypeDB, and its relationship to the Enhanced Entity-Relationship model.
Then we break things down a bit more philosophically, delving into what it means to represent data in TypeDB, and how TypeDB allows you to think at a higher level than join tables, columns, documents, vertices, edges, and properties.
Takeaways:
- Be able to articulate why TypeDB's data model is so beneficial for complex data, and why we use it to build intelligent systems
- Write a TypeDB schema in TypeQL
- Practice modelling one of your own domains
Tomás Sabat:
Tomás is the Chief Operating Officer at Vaticle, dedicated to building a strongly-typed database for intelligent systems. He works directly with TypeDB's open source and enterprise users so they can fulfil their potential with TypeDB and change the world. He focuses mainly on life sciences, cyber security, finance and robotics.
2. Modelling Principles
• The modelling philosophy, principles and techniques of TypeDB
• Best practices for designing your domain model
3. TypeDB allows you to model your domain based on logical and object-oriented
principles. A model is composed of entity, relation, and attribute types, as well as type
hierarchies, roles, and rules.
4. Domain modeling – best practices
Here we set out the best practice guidelines for creating a TypeDB
schema. It is possible to create a successful model that does not conform
to the guidelines.
These guidelines aim to help maximise:
• true-to-domain modelling
• flexibility
• and extensibility
5. Importance of modeling choices
We want to carefully choose our data model, that is, choosing what should be an entity, a relation,
or an attribute.
We want this model to closely reflect the domain. In this way we can know that if new data
becomes available in the domain it will fit into the data model.
With this schema, when later we find there is information regarding a
person’s employment, how can we add it?
If the domain is modelled true-to-life, then extra schema (and therefore data)
is added trivially.
[Figure: two schema diagrams for person and company, one marked ✗ and one marked ✓. The ✓ diagram contains an employment relation with employee and employer roles, start-date and end-date, and a contract. Other labels: document, owns.]
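A minimal sketch of the true-to-life version, assuming TypeQL 2.x syntax. The value types and the decision to put start-date and end-date on the relation are assumptions made for illustration; the slide shows only the type names.

```typeql
define

# The employment itself is a type, so facts about it have somewhere to live.
employment sub relation,
    relates employee,
    relates employer,
    owns start-date,
    owns end-date;

person sub entity,
    plays employment:employee;

company sub entity,
    plays employment:employer;

# Value types are assumed for this sketch.
start-date sub attribute, value datetime;
end-date sub attribute, value datetime;
```

Because employment is a relation type rather than a direct link, further information about an employment can later be added to that one type without changing person or company.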
6. Importance of modeling choices
Naming and choosing your entities, relations and attributes are two issues that
are tied together.
The naming should closely link to typical domain terminology, since natural
language is the means to describe the domain. Using the expected domain
terminology is important: it lets others working with you easily:
• understand your model
• maintain it
• extend it
• adapt it.
Focussing on terminology to determine
structure is a typical approach for
software architecture in general, not just
for TypeDB schema
7. Naming - formatting
• naming is in all lower case
• use hyphens
• indent on newlines after declaring a type, e.g.
define
person sub entity,
    owns full-name;
8. Naming - Types
A type name should:
• ideally be a noun (except for attributes)
• be singular; plurality follows from there being more than one instance
• be as context-specific as possible, using the exact word that describes the
concept in-context
• if you can’t find a noun specific enough then try concatenating two or more
nouns by hyphenating them:
• these can quickly become verbose, so try and find a balance between
specificity and verbosity.
define
location-hierarchy sub relation,
    relates subordinate,
    relates superior;
9. Type names
Continued…
• if a noun can’t be found, then use a present participle or past participle of a
verb as an adjective, to capture the context. Combine this with a noun that is
not context-specific.
• e.g. below we use authored-content, since content would be too generic,
and authored-book would be too specific (as it would rule out other types
of content):
• also consider using prepositions, e.g. to, from, by, of
authorship sub relation,
    relates author,
    relates authored-content;
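To see why the role name authored-content generalises well, here is a small sketch assuming TypeQL 2.x syntax. The entity types book and article are hypothetical and chosen only for illustration.

```typeql
define

authorship sub relation,
    relates author,
    relates authored-content;

person sub entity,
    plays authorship:author;

# Hypothetical content types: both fit the same role without renaming it.
book sub entity,
    plays authorship:authored-content;

article sub entity,
    plays authorship:authored-content;
```

Had the role been called authored-book, article could not play it without the name becoming misleading.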
10. Let’s iterate on our schema
Now that we’ve talked through some of the ways to name things, how would
you iterate on your schema?
Exercise – 6 minutes individually:
1) Review current naming choices.
2) Where you would make changes, make them.
3) Be prepared to share at least one change you made and why.
11. Modelling decisions – entities
“An entity may be defined as a thing capable of an independent existence that
can be uniquely identified. An entity is an abstraction from the complexities of a
domain. When we speak of an entity, we normally speak of some aspect of the
real world that can be distinguished from other aspects of the real world.”
12. Modelling decisions – entities
Good choices for entities are:
• Any physical thing in the real world should be modelled as an entity (e.g. animal,
person, device, building)
• Anything that exists logically but doesn’t require involvement of other things in
order to exist: organisation, school
• Use concrete/proper/common/abstract/collective nouns:
• concrete – person, tree, car
• proper – homo sapiens (sometimes the attribute name of an instance)
• abstract – religion, pain, principle
• collective – family, government, team, orchestra, set
• To specialise a general noun, use in combination with another noun – social group
13. Modelling decisions – relations
It is easy to say for certain that a concept is definitely an entity, e.g. car.
It is harder, the more conceptual a thing is, to decide whether it should be an entity or
a relation.
One way to decide is to start by examining the potential relation as a binary
relation. A binary relation must have either the property or its anti-property
for each of the following cases:
Symmetric – relation is the same in both directions
Transitive – relation can be chained
Reflexive – role-player can be related to itself through the relation
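As an illustration of the transitive case, the sketch below chains a relation using a rule. It assumes TypeQL 2.x rule syntax and reuses the location-hierarchy relation from the naming slides; the rule name is hypothetical.

```typeql
define

location-hierarchy sub relation,
    relates subordinate,
    relates superior;

# If a is within b and b is within c, infer that a is within c.
rule transitive-location-hierarchy:
    when {
        (subordinate: $a, superior: $b) isa location-hierarchy;
        (subordinate: $b, superior: $c) isa location-hierarchy;
    } then {
        (subordinate: $a, superior: $c) isa location-hierarchy;
    };
```

A relation like employment would not admit such a rule, which is what calling it anti-transitive expresses.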
14. Modelling decisions – relations
Example
An employment relation, that relates employee and employer role-players, is anti-
symmetric, anti-transitive, and anti-reflexive.
Therefore we can logically call this type a relation.
Example
Religion as a relation, however, is neither symmetric nor anti-symmetric, neither
transitive nor anti-transitive, and neither reflexive nor anti-reflexive.
Therefore we can conclude that religion is not a binary relation, and is either an
entity or a ternary/n-ary relation.
15. Modelling decisions – relations
In general, relations shouldn’t make sense without their roles. For example, a
marriage can’t logically exist without at least one spouse/husband/wife.
Ideally, we are looking for the concept that connects two things, not a direct
connection (often those are the role names, like employee).
• for instance, below we can see that we don’t use “owns” as the type-name, but
instead we use the verbal noun of owns: ownership
ownership sub relation,
    relates owner,
    relates property;
16. Modelling decisions – relations
Gather together domain terminology that sounds similar to the concept you want to
model. Then determine which are candidate relation, role, and entity names
(determining and naming attributes is often not too hard).
Remember that a role describes how a thing behaves in the scope of a relation.
Examples in the form role-player, role, relation:
• a car behaves like property in an ownership
• a station behaves as a stop along a train-route
• a person behaves as an employee in an employment
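The three examples above could be written as schema along these lines. This is a sketch assuming TypeQL 2.x syntax, and only the roles named in the examples are attached to entity types.

```typeql
define

ownership sub relation,
    relates owner,
    relates property;

train-route sub relation,
    relates stop;

employment sub relation,
    relates employee,
    relates employer;

# Each "plays" line reads as: a [thing] behaves as a [role] in a [relation].
car sub entity,
    plays ownership:property;

station sub entity,
    plays train-route:stop;

person sub entity,
    plays employment:employee;
```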
17. Modelling decisions – relations
Relation names could describe membership to a grouping/collection of things
(component, group-membership), an action/ongoing state (marriage, comparison,
authorship, participation), or a description of a direct interrelation between two or
more things (friendship, parenthood, association, drug-protein-interaction)
group-membership sub relation,
    relates member,
    relates group;
drug-protein-interaction sub relation,
    relates inhibitor,
    relates antagonist,
    relates blocker,
    relates target-protein;
18. Modelling decisions – relations
A relation is defined such that an instance should not be able to exist without
relating at least one instance for one of its roles. This is the idea that a relation is
dependent upon the existence of one or more role-players.
• A relation should still make sense even if some of its role-players are
missing (as long as at least one remains)
• The roles and role-players should make logical sense to be connected in any
combination
19. Modelling decisions – relations
You should find that you choose names from these categories:
• abstract nouns
• transitive verbs that can accept 2 or more arguments – decide, agree, marry
• their verbal nouns are preferable – decision, agreement, marriage
20. Modelling decisions – roles
Nouns, verbs, prepositions
location-of sub relation,
    abstract,
    relates located,
    relates location;
direction sub relation,
    relates from,
    relates to;
ownership sub relation,
    relates owner,
    relates property;
21. Modelling decisions – relations
Wherever possible, relations should be named in such a way that the name doesn’t
include a ‘reference’ to one of its role-players in particular.
Parenthood is an example of a fairly unavoidable case, where the relation naming
refers more to the role of parent than the role of child.
An example of the ideal case would be:
hierarchy sub relation,
    relates superior,
    relates subordinate;
22. Let’s iterate on our schema
Discussion
Looking specifically at your relations, what would you change or what do you
imagine needing to put more thought into?
23. Modelling decisions – attributes
Attributes are usually the easiest to identify, as they are the direct description of a set of values
that we want to model.
Naming an attribute:
• The name of an attribute should refer to a literal value.
• Make attributes context-specific – by concatenating words or ending with a
noun
• Abstract nouns e.g. colour
• Adjectives e.g. friendly
• Intransitive verbs (no direct objects, can’t be followed by “who” or “what”) e.g.
is-raining, graduated
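A short sketch of attribute definitions following these naming guidelines, assuming TypeQL 2.x syntax. The value types, and the weather-report type that owns is-raining, are assumptions made for illustration.

```typeql
define

# Abstract noun naming a literal value.
colour sub attribute, value string;

# Intransitive verb phrase, naturally a boolean.
is-raining sub attribute, value boolean;

# Context-specific name built by joining words and ending with a noun.
full-name sub attribute, value string;

person sub entity,
    owns full-name;

# Hypothetical owner type for the weather attribute.
weather-report sub entity,
    owns is-raining;
```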
24. Benefits of inheritance – scoping queries
[Figure: type hierarchy. status-update, comment, and media are subtypes of post; photo and video are subtypes of media.]
post sub entity;
status-update sub post;
comment sub post;
media sub post;
video sub media;
photo sub media;
Wide scope: get all the posts
match $p isa post;
Intermediate scope: get all media
match $m isa media;
Narrow scope: get all comments
match $c isa comment;
25. Benefits of inheritance – constraints
[Figure: person and company both play owner, and social-group and office both play owned, in a single ownership relation.]
ownership sub relation,
    relates owner,
    relates owned;
company sub entity,
    plays ownership:owner;
office sub entity,
    plays ownership:owned;
social-group sub entity,
    plays ownership:owned;
person sub entity,
    plays ownership:owner;
However, we can see a problem. Based on this schema, a person
can own an office and a company can own a social-group.
26. Benefits of inheritance – constraints
[Figure: ownership is specialised into group-ownership (person as group-owner, social-group as owned-group) and office-ownership (company as office-owner, office as owned-office).]
ownership sub relation,
    abstract,
    relates owner,
    relates owned;
office-ownership sub ownership,
    relates office-owner as owner,
    relates owned-office as owned;
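The snippet on this slide defines only office-ownership. A fuller sketch of the constrained schema, assuming TypeQL 2.x syntax and taking the group-ownership names from the diagram labels, might look like this:

```typeql
define

ownership sub relation,
    abstract,
    relates owner,
    relates owned;

# Each subtype overrides the inherited roles with more specific ones.
group-ownership sub ownership,
    relates group-owner as owner,
    relates owned-group as owned;

office-ownership sub ownership,
    relates office-owner as owner,
    relates owned-office as owned;

person sub entity,
    plays group-ownership:group-owner;

social-group sub entity,
    plays group-ownership:owned-group;

company sub entity,
    plays office-ownership:office-owner;

office sub entity,
    plays office-ownership:owned-office;
```

With the roles specialised this way, a person can no longer be declared the owner of an office, while a query on the abstract ownership type still covers both kinds.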
27. Composition vs. inheritance
Composition replaces the temptation of multiple inheritance.
“Entity type Y is a subtype (subclass) of an entity type X if and only if every Y is necessarily an X”
(https://en.wikipedia.org/wiki/Enhanced_entity%E2%80%93relationship_model)
Therefore define customer sub person; is a bad idea, since:
• An organisation could be a customer, therefore customer is a behaviour (a role)
• A person who plays the role of a customer could play the role of many other things, e.g. teacher
Mindset Shift
View using roles as composition for behaviours of a concept
Try putting names in a context like this:
A [relation] has a [role] in the form of a [thing]
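A sketch of the composition approach, assuming TypeQL 2.x syntax. The teaching relation and its student role are hypothetical names added for illustration; customer and teacher come from the bullets above.

```typeql
define

transaction sub relation,
    relates customer,
    relates vendor;

# Hypothetical relation illustrating a second behaviour.
teaching sub relation,
    relates teacher,
    relates student;

# customer and teacher are roles, not subtypes of person.
person sub entity,
    plays transaction:customer,
    plays teaching:teacher,
    plays teaching:student;

# An organisation can be a customer too, without any shared supertype.
organisation sub entity,
    plays transaction:customer,
    plays transaction:vendor;
```

Read with the template above: a transaction has a customer in the form of a person, or in the form of an organisation.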
28. Ternary and n-ary Relations
To help us see the use of ternary relations, consider someone buying a product.
Start with only binary relations:
[Figure: company, person, and product connected by binary relations. Labels: vendor, owns, customer, purchase, sale.]
Where do we add value for the sale?
Ternary, since we have 3 role-players in one relation:
[Figure: a single transaction relation connecting company, person, and product. Labels: vendor, customer, product, value.]
Here, value can be added trivially.
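The ternary version can be sketched as follows, assuming TypeQL 2.x syntax. The attribute is named transaction-value here instead of value, and its value type is assumed; both choices are illustrative only.

```typeql
define

# One relation with three roles; the value belongs to the transaction itself.
transaction sub relation,
    relates vendor,
    relates customer,
    relates product,
    owns transaction-value;

company sub entity,
    plays transaction:vendor;

person sub entity,
    plays transaction:customer;

product sub entity,
    plays transaction:product;

# Assumed name and value type.
transaction-value sub attribute, value double;
```

In TypeQL 2.x a role label is scoped to its relation, so a role named product and an entity type named product can coexist.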
29. Nested relations
Now we can refer to the transaction in other relations.
Note that this can be favourable over adding another role to the existing relation.
This is better for:
• Consistency across schema
• Versatility, we can add more information to either of the two relations
[Figure: the transaction relation (company, person, product; labels: vendor, customer, owns, value) plays located in a location-of-transaction relation, with place playing location.]
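A sketch of the nested relation, assuming TypeQL 2.x syntax, in which a relation type can itself play a role in another relation. Only the parts needed to show the nesting are included.

```typeql
define

location-of-transaction sub relation,
    relates located,
    relates location;

# The transaction relation is itself a role-player in another relation.
transaction sub relation,
    relates vendor,
    relates customer,
    relates product,
    plays location-of-transaction:located;

place sub entity,
    plays location-of-transaction:location;
```

Keeping the location in its own relation leaves transaction unchanged and lets either relation gain attributes or roles later.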
30. Optimisation
Schema design impacts query performance.
Use context-specific relation and role names. This allows the query planner to
find a good path; otherwise all data is homogeneous and looks the same.