Enabling the Computational Future of Biology.pdf

Tomás Sabat Stöfsel, COO
Enabling the Computational Future of Biology

360-degree real-time patient views
In silico clinical trials
Hyper-personalised medicine
De novo drug design
Cell and gene therapy
Ageing research
Immunotherapy
mRNA technology

The future of biology is computational

Nearly 90% of PIs said they are or will soon be working with large data sets

What is computational biology?

Is there any area of biology that doesn’t involve
computation?

What data are we talking about?

Public Data
Internal Data
Structured Data
Unstructured Data (academic papers, etc.)
Legacy data
Lab data

Entity types in biomedical data (figure): Variant, Drug, Tissue, Phenotype, Sample, Protein, Trial, Receptor, Cell, Chromosome, Virus, Disease, Gene, SNP, Transcript, Bacteria, Cell-type, Compound, Organism, Pathway, GPCR, Kinase, Protein complex, Drug-class, Publication, Author, Organisation, Journal, Metabolite
TOO COMPLEX

It’s very hard to model and integrate heterogeneous datasets

Why is it so hard to model biomedical data?

Data Modelling
Choose the major entities: protein, drug
Identify the relationship types: interaction
Determine which attributes belong to which entities: protein owns uniprot-id, drug owns chembl-id
Normalise

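2NF Normalisation (figure): the relation (SSN, Pnumber, Hours, Ename, Pname, Plocation) has three functional dependencies.
FD1: SSN, Pnumber -> Hours
FD2: SSN -> Ename
FD3: Pnumber -> Pname, Plocation
It is decomposed into (SSN, Pnumber, Hours), (SSN, Ename) and (Pnumber, Pname, Plocation).
3NF Normalisation (figure): the relation (Ename, Ssn, Bdate, Address, Dnumber, Dname, Dmgr_ssn) is decomposed into (Ename, Ssn, Bdate, Address, Dnumber) and (Dnumber, Dname, Dmgr_ssn).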
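
To make the cost of this step concrete, here is a minimal Python sketch of the 2NF decomposition in the figure. The rows are hypothetical sample data, and the function names `decompose_2nf` and `rejoin` are made up for this illustration. It splits one wide relation along FD1, FD2 and FD3, then checks that joining the pieces gives back the original rows.

```python
# Hypothetical rows of (SSN, Pnumber, Hours, Ename, Pname, Plocation).
rows = [
    ("111", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("111", 2, 7.5, "Smith", "ProductY", "Sugarland"),
    ("222", 2, 20.0, "Wong", "ProductY", "Sugarland"),
]


def decompose_2nf(rows):
    """Split the wide relation along FD1, FD2 and FD3 (sets drop duplicates)."""
    works_on = {(ssn, pnum, hours) for ssn, pnum, hours, _, _, _ in rows}  # FD1
    employee = {(ssn, ename) for ssn, _, _, ename, _, _ in rows}           # FD2
    project = {(pnum, pname, ploc) for _, pnum, _, _, pname, ploc in rows}  # FD3
    return works_on, employee, project


def rejoin(works_on, employee, project):
    """Join the three relations back on SSN and Pnumber."""
    names = dict(employee)
    projects = {pnum: (pname, ploc) for pnum, pname, ploc in project}
    return {
        (ssn, pnum, hours, names[ssn], *projects[pnum])
        for ssn, pnum, hours in works_on
    }


works_on, employee, project = decompose_2nf(rows)
```

Every question that spans the three relations now needs a join like `rejoin`, which is the overhead the following slides argue against.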

Normalise: X

Data Modelling
define
  protein sub entity,
    owns uniprot-id,
    plays interaction:interacted;
  drug sub entity,
    owns chembl-id,
    plays interaction:interacting;
  interaction sub relation,
    relates interacting,
    relates interacted;
  uniprot-id sub attribute, value string;
  chembl-id sub attribute, value string;
No need to normalise our data!

Data Modelling
Subtypes: kinase and ion-channel are subtypes of protein
define
  protein sub entity,
    owns uniprot-id,
    plays interaction:interacted;
  kinase sub protein;
  ion-channel sub protein;
  drug sub entity,
    owns chembl-id,
    plays interaction:interacting;
  interaction sub relation,
    relates interacting,
    relates interacted;
  uniprot-id sub attribute, value string;
  chembl-id sub attribute, value string;

Return kinases and ion-channels connected to drugs
match
  $drug isa drug, has chembl-id "CHEMBL1193654";
  $protein isa protein;
  (interacted: $protein, interacting: $drug) isa interaction;
get $protein;
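
The query asks for `protein`, yet it returns kinases and ion-channels because they are subtypes of protein. The Python sketch below imitates that behaviour in memory. It is only an illustration of subtype matching, not how TypeDB is implemented, and the instance ids (`P1`, `D1`, and so on) are hypothetical.

```python
# Type hierarchy from the schema: kinase and ion-channel are subtypes of protein.
SUPERTYPE = {
    "kinase": "protein",
    "ion-channel": "protein",
    "protein": "entity",
    "drug": "entity",
}

# Hypothetical instances and (interacting drug, interacted protein) pairs.
instances = {"P1": "kinase", "P2": "ion-channel", "P3": "protein", "D1": "drug"}
interactions = [("D1", "P1"), ("D1", "P2")]


def is_a(type_label, target):
    """Walk up the hierarchy to test whether type_label is target or a subtype of it."""
    while type_label is not None:
        if type_label == target:
            return True
        type_label = SUPERTYPE.get(type_label)
    return False


def proteins_interacting_with(drug_id):
    """Return ids of everything that is a protein (or subtype) and interacts with the drug."""
    return sorted(
        protein
        for drug, protein in interactions
        if drug == drug_id and is_a(instances[protein], "protein")
    )
```

Calling `proteins_interacting_with("D1")` returns the kinase and the ion-channel without the caller naming either subtype.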

Ternary Relations
An encode relation connects three role players: gene (encoding), protein (encoded) and source (sourced).
$gene isa gene;
$protein isa protein;
$source isa source;
(encoding: $gene, encoded: $protein, sourced: $source);

Nested Relations
A protein-protein interaction (ppi) between two proteins (both interacting) is itself located in a tissue (located, locating).
$protein1 isa protein;
$protein2 isa protein;
$ppi (interacting: $protein1, interacting: $protein2) isa protein-interaction;
$tissue isa tissue;
(located: $ppi, locating: $tissue) isa locates;

Rule: Infer gene to disease associations
If a gene encodes a protein and that protein is associated with a disease, the gene is associated with the disease.
rule gene-protein-disease:
when {
  (encoding: $x, encoded: $y) isa encode;
  (associating: $y, associated: $z) isa protein-disease-association;
} then {
  (associating: $x, associated: $z) isa gene-disease-association;
};
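
The effect of the rule can be sketched in a few lines of Python. This is an in-memory illustration of the inference only, with hypothetical placeholder names (`gene-a`, `protein-a`, `disease-x`). TypeDB evaluates rules inside the database at query time.

```python
def infer_gene_disease(encodes, protein_disease):
    """Derive (gene, disease) pairs from (gene, protein) and (protein, disease) pairs."""
    return {
        (gene, disease)
        for gene, protein in encodes
        for other_protein, disease in protein_disease
        if protein == other_protein
    }


# Hypothetical facts: gene-a encodes protein-a, gene-b encodes protein-b.
encodes = {("gene-a", "protein-a"), ("gene-b", "protein-b")}
# Only protein-a has a known disease association.
protein_disease = {("protein-a", "disease-x")}

inferred = infer_gene_disease(encodes, protein_disease)
```

Here `inferred` holds the single pair `("gene-a", "disease-x")`: the association is never stored, it follows from the two facts it depends on.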

Why do these constructs matter?

Because they make your life so much easier!

We can finally model the real world the way we perceive it

Drug Discovery
Data Harmonisation
Precision Medicine
Competitive Intelligence
Supply Chain Optimisation
Clinical Trial Cohort Selection
Disease Understanding

Questions we can ask
Who has run clinical trials on Ebola and also owns patents?
What are the most likely gene targets for Melanoma?
Given someone’s biological and genetic profile, what clinical trials are they eligible for?

Architecture
Data sources: Public Data, Internal Data, Structured Data, Unstructured data, Legacy data, Lab data
Connectors: TypeDB Loader, Custom Loaders, …
Text Mining: coreNLP, …
Client Drivers (Python, Java, NodeJS, etc.)
Applications: Dashboards, Analytics, Chatbots, …

Can we automatically map SQL, RDF or JSON to a TypeDB schema?
We could, but we shouldn’t.

For example, if the source data is already normalised in 3NF, a one-to-one mapping carries that structure over and we won’t benefit from TypeDB’s schema.

Loading tabular data
Genes
disease-name mesh-id ensembl-id …
… … … …
Diseases
ensembl-id symbol mesh-id …
… … … …
Mapping the tables to the schema:
gene owns symbol, owns ensembl-id
disease owns mesh-id, owns name
gene-disease-association relates associated-gene (gene) and associated-disease (disease)
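
A loader for this mapping can be sketched as a pair of Python functions that turn table rows into TypeQL insert queries. This is a minimal illustration, not the TypeDB Loader: the row dictionaries and their values are hypothetical, values are not escaped, and sending the queries through a client driver is left out.

```python
def gene_insert(row):
    """Build a TypeQL insert for one row of the Genes table (values are not escaped)."""
    return (
        f'insert $g isa gene, has ensembl-id "{row["ensembl-id"]}", '
        f'has symbol "{row["symbol"]}";'
    )


def disease_insert(row):
    """Build a TypeQL insert for one row of the Diseases table."""
    return (
        f'insert $d isa disease, has mesh-id "{row["mesh-id"]}", '
        f'has name "{row["name"]}";'
    )


def association_insert(ensembl_id, mesh_id):
    """Match an existing gene and disease by key, then link them with a relation."""
    return (
        f'match $g isa gene, has ensembl-id "{ensembl_id}"; '
        f'$d isa disease, has mesh-id "{mesh_id}"; '
        "insert (associated-gene: $g, associated-disease: $d) "
        "isa gene-disease-association;"
    )


# Hypothetical rows standing in for the two tables.
gene_row = {"ensembl-id": "E1", "symbol": "S1"}
disease_row = {"mesh-id": "M1", "name": "example disease"}
queries = [
    gene_insert(gene_row),
    disease_insert(disease_row),
    association_insert(gene_row["ensembl-id"], disease_row["mesh-id"]),
]
```

The entities are inserted first so that the relation query can match both of them by their identifiers.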

Loading JSON
Entities: person, publication, journal, disease
Relations: authorship, publishing, mention (with start and end)
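
The Python sketch below shows one way a nested JSON record could be flattened into these entities and relations before loading. The JSON shape (`title`, `journal`, `authors`, `mentions`) is an assumption made for this example, not a format taken from the slides.

```python
import json

# Hypothetical publication record; the field names are assumptions for this sketch.
RAW = """
{"title": "Example paper", "journal": "Example Journal",
 "authors": ["A. Author", "B. Author"],
 "mentions": [{"disease": "example disease", "start": 10, "end": 25}]}
"""


def flatten(pub):
    """Turn one nested record into a set of entities and a list of relations."""
    entities = {("publication", pub["title"]), ("journal", pub["journal"])}
    relations = [("publishing", pub["title"], pub["journal"])]
    for author in pub["authors"]:
        entities.add(("person", author))
        relations.append(("authorship", author, pub["title"]))
    for mention in pub["mentions"]:
        entities.add(("disease", mention["disease"]))
        # start and end are the character offsets of the mention in the text
        relations.append(
            ("mention", pub["title"], mention["disease"], mention["start"], mention["end"])
        )
    return entities, relations


entities, relations = flatten(json.loads(RAW))
```

Each nesting level in the JSON becomes a relation, so the tree-shaped record ends up as a small graph around the publication.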

Use case: Competitive Intelligence
Who has run clinical trials on Ebola and also owns patents?

Data sources: Public Data, Structured Data, Unstructured data (Molecular, Clinical Trials, Patents, Disease, …)
Connectors: TypeDB Loader, Custom Loaders, …
Text Mining: coreNLP, …
Client Drivers (Python, Java, NodeJS, etc.)
Output: Competitive Insights

Pattern: a person owns a patent (ownership: owner, owned), the same person investigates a clinical-trial (investigation: investigator, investigated), and that trial studies a disease (study: studying, studied) with name "Ebola".
match
  $person isa person, has name $name;
  $patent isa patent;
  $trial isa clinical-trial;
  $disease isa disease, has name "Ebola";
  (owner: $person, owned: $patent) isa ownership;
  (investigator: $person, investigated: $trial) isa investigation;
  (studying: $trial, studied: $disease) isa study;
get $name;
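
Read as a join, the query combines three relations that share variables. The Python sketch below runs the same logic over hypothetical in-memory data (`alice`, `bob`, `trial-1` and so on are made up) to show which answers survive each constraint.

```python
# Hypothetical relation instances as (role player, role player) pairs.
ownership = {("alice", "patent-1")}                          # (owner, owned)
investigation = {("alice", "trial-1"), ("bob", "trial-2")}   # (investigator, investigated)
study = {("trial-1", "Ebola"), ("trial-2", "Ebola"), ("trial-3", "Zika")}  # (studying, studied)


def investigators_with_patents(disease_name):
    """People who investigated a trial studying the disease and who also own a patent."""
    owners = {person for person, _ in ownership}
    trials = {trial for trial, disease in study if disease == disease_name}
    return sorted(
        {
            person
            for person, trial in investigation
            if trial in trials and person in owners
        }
    )
```

With this data, `investigators_with_patents("Ebola")` keeps `alice` and drops `bob`, who ran an Ebola trial but owns no patent.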

Use case: Drug Discovery
What are the most likely gene targets for Melanoma?

Data sources: Public Data, Internal Data, Structured Data, Unstructured data, Legacy data, Lab data
Connectors: TypeDB Loader, Custom Loaders, …
Text Mining: coreNLP, …
Client Drivers (Python, Java, NodeJS, etc.)
KGCN

KGCN: TypeQL Query -> Query Result, a subgraph -> Learner (Graph Learning Algorithm) -> Predictions
match
  $p isa protein;
  $d isa disease, has disease-group "Cancer", has disease-id $did;
  $t isa tissue;
  $g isa gene, has gene-id $gid;
  ($p, $t);
  ($t, $d);
  ($g, $t);

Output: List of targets

> match $g isa gene, has gene-id $gid;
$d isa disease, has disease-name "melanoma";
($g, $d) isa gene-disease-association, has kgcn-prob $p;
get $gid; sort $p desc;
{$gid "DDX11L1" isa gene-id;}
{$gid "WASH7P" isa gene-id;}
{$gid "MIR1302-10" isa gene-id;}
{$gid "MIR1302-11" isa gene-id;}
{$gid "OR4F5" isa gene-id;}
{$gid "FAM138D" isa gene-id;}
{$gid "FAM41C" isa gene-id;}
{$gid "NOC2L" isa gene-id;}
{$gid "HES4" isa gene-id;}
{$gid "RNF223" isa gene-id;}
{$gid "TNFRSF4" isa gene-id;}
...

Use case: Precision Medicine
Given someone’s biological and genetic profile, what clinical trials are they eligible for?

Data sources: Public Data, Structured Data, Unstructured data (Molecular, Clinical Trials, …, Precision DBs, …)
Connectors: TypeDB Loader, Custom Loaders, …
Text Mining: coreNLP, …
Client Drivers (Python, Java, NodeJS, etc.)
Output: Personalised-therapies

A personalised-therapy relation links a person and a trial.
match
  $person isa person, has name "Alice";
  $trial isa clinical-trial, has nct-id $nct;
  ($person, $trial) isa personalised-therapy;
get $nct;

Relevance for a clinical trial: patient has the same gene and variant mentioned in the clinical trial
Eligibility for a clinical trial: patient is within the right age bracket and gender for the trial

personalised-therapy links a person and a trial when both eligible-trial and relevant-trial hold between them.
rule personalised-patient-therapy:
when {
  ($person, $trial) isa eligible-trial;
  ($person, $trial) isa relevant-trial;
} then {
  ($person, $trial) isa personalised-therapy;
};

Relevance for a clinical trial
Patient has the same gene and variant mentioned in the clinical trial
relevant-trial links a person and a trial: the person is associated (assoc) with a gene (symbol: $gs) and a variant (symbol: $vs), and the trial mentions both.
rule trial-participant-relevance:
when {
  $person isa person;
  $gene isa gene;
  $variant isa variant;
  ($person, $gene);
  ($person, $variant);
  ($trial, $gene);
  ($trial, $variant);
} then {
  ($person, $trial) isa relevant-trial;
};

Eligibility for a clinical trial
Patient is within the right age bracket and gender for the trial
eligible-trial links a person and a trial: both are associated (assoc) with the same disease, the person’s age is greater than the trial’s min-age and less than its max-age, and the gender matches.
rule trial-participant-eligibility:
when {
  $person isa person, has age $age, has gender $gender;
  $trial isa clinical-trial,
    has min-age <= $age,
    has max-age >= $age,
    has gender = $gender;
  $disease isa disease;
  ($disease, $person);
  ($disease, $trial);
} then {
  ($person, $trial) isa eligible-trial;
};
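
The relevance, eligibility and personalised-therapy rules compose: a trial is a personalised therapy only if it passes both checks. The Python sketch below mirrors that composition in memory. The `Person` and `Trial` classes, their field names and all sample values are assumptions for this illustration, and the real inference would run inside TypeDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    age: int
    gender: str
    genes: frozenset
    variants: frozenset
    diseases: frozenset


@dataclass(frozen=True)
class Trial:
    nct_id: str
    min_age: int
    max_age: int
    gender: str
    genes: frozenset
    variants: frozenset
    diseases: frozenset


def is_relevant(person, trial):
    """Relevance: the trial mentions a gene and a variant the person is associated with."""
    return bool(person.genes & trial.genes) and bool(person.variants & trial.variants)


def is_eligible(person, trial):
    """Eligibility: shared disease, age within [min_age, max_age], matching gender."""
    return (
        trial.min_age <= person.age <= trial.max_age
        and trial.gender == person.gender
        and bool(person.diseases & trial.diseases)
    )


def personalised_therapies(person, trials):
    """A trial qualifies only when it is both eligible and relevant."""
    return [t.nct_id for t in trials if is_eligible(person, t) and is_relevant(person, t)]


# Hypothetical patient and trials.
alice = Person("Alice", 42, "female", frozenset({"gene-a"}),
               frozenset({"variant-1"}), frozenset({"disease-x"}))
trials = [
    Trial("NCT-A", 18, 65, "female", frozenset({"gene-a"}),
          frozenset({"variant-1"}), frozenset({"disease-x"})),
    Trial("NCT-B", 50, 80, "female", frozenset({"gene-a"}),
          frozenset({"variant-1"}), frozenset({"disease-x"})),  # fails the age bracket
    Trial("NCT-C", 18, 65, "female", frozenset({"gene-b"}),
          frozenset({"variant-1"}), frozenset({"disease-x"})),  # fails gene relevance
]
```

Keeping the two checks separate, as the rules do, means either criterion can change without touching the other or the final query.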

TypeDB Bio
github.com/typedb-osi/typedb-bio
