The beginning of CovidGraph in March 2020
We build a knowledge graph on COVID-19 that integrates various public datasets.
We structure data - connect data
We connect entities from the biomedical field, such as genes, proteins, and molecular pathways
https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
Cope not only with literature, but also with genes, proteins, diseases, …
[Figure: speech bubbles]
"Covid19 and Diabetes?"
"100k genomes - cool, tell me about the results"
"AD?"
"Didn’t know about this study…"
"P881208…?"
"cryo-EM structure of the complete and ligand-saturated receptor ectodomain"
128’000 texts on Covid-19 alone within 6 months
[Figure: example knowledge graph]
Nodes: Gene X, Transcript Z, Protein Y, Metabolite M, SNP K, HbA1c, Disease (Covid19), Disease (Diabetes), Publication 1, Publication 2, Publication 31, Publication 5430, Patent WOxxx (three times)
Edge labels: has synonyms, transcribed, translated, encodes, mentioned in, is biomarker, is
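The kind of graph shown on this slide can be sketched as a list of (subject, relationship, object) triples. The following is a minimal Python illustration, not the CovidGraph data model: the node and edge names are taken from the slide, but which nodes each edge connects is an assumption made for the example, and `build_index` and `shared_neighbours` are hypothetical helper names.

```python
from collections import defaultdict

# (subject, relationship, object) triples.
# Names come from the slide; the pairing of nodes and edges is assumed.
triples = [
    ("Gene X", "transcribed", "Transcript Z"),
    ("Transcript Z", "translated", "Protein Y"),
    ("Gene X", "mentioned in", "Publication 1"),
    ("Covid19", "mentioned in", "Publication 1"),
    ("HbA1c", "is biomarker", "Diabetes"),
]

def build_index(triples):
    """Map every node to its (relationship, neighbour) pairs, in both directions."""
    index = defaultdict(list)
    for subject, relationship, obj in triples:
        index[subject].append((relationship, obj))
        index[obj].append((relationship, subject))
    return index

def shared_neighbours(index, a, b):
    """Return the nodes directly connected to both a and b."""
    neighbours_a = {node for _, node in index[a]}
    neighbours_b = {node for _, node in index[b]}
    return sorted(neighbours_a & neighbours_b)

index = build_index(triples)
common = shared_neighbours(index, "Gene X", "Covid19")
print(common)
```

Once the data is connected, a question such as "what links Gene X to Covid19?" becomes a lookup of shared neighbours instead of a literature search.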
Data sources and numbers
[Figure: graph schema, shown as several panels of node labels and relationship types]
Panel 1. Nodes: Abstract, Affiliation, Author, BodyText, Citation, Location, Paper, Reference. Relationships: AUTHOR_HAS_AFFILIATION, AUTHORCOLLECTION_HAS_AUT…, BODYTEXTCOLLECTION_HAS…, ABSTRACTCOLLECTION_HAS_ABSTRA…, REFERENCECOLLECTION_HAS_RE…, BODYTEXT_HAS_CITATION, PAPER_HAS_REFERENCECOLLECTI…, CITATION_HAS_REFEREN…, ABSTRACT_HAS_CITATION, AFFILIATION_HAS_LOCATION
Panel 2. Nodes: Sentence. Relationships: HAS_FRAGMENT, MENTIONS
Panel 3. Nodes: Entity, Patent, PatentAbstract, PatentClaim, PatentDescripti…, PatentNumber, PatentTitle. Relationships: PATENT_HAS_PATENTABSTRACT, APPLICANT, PATENT_HAS_PATENTNUMBER, PATENT_HAS_PATENTTITLE, PATENT_HAS_PATENTDE…, PATENT_HAS_PATENTCLAIM
Panel 4. Nodes: Gene, GeneSymbol, Protein, Transcript, GOTerm, Pathway, GtexTissue. Relationships: SYNONYM, MAPS, CODES, ASSOCIATION, PART_OF, MEMBER, EXPRESSED
Panel 5. Nodes: AgeGroup, City, Country, DailyReport, Province. Relationships: CURRENT_TOTAL, REPORTED, LOCATED_IN, PART_OF
Panel 6. Nodes: ClinicalTrial, ExclusionCriteria, Facility, InclusionCriteria, Phase. Relationships: HAS_EXCLUSION_CRITERIA, IS_PHASE, CONDUCTED_AT, PUBLISHED, HAS_INCLUSION_CRITERIA
Counts shown in the figure: 144’000, 32’000, 125’000, 128’000, 410’000, 484’000, 1700, 55, 21’000, 47’000, 30’000’000
Angiotensin-converting enzyme 2 GENE_OR_GENOME ( ACE2
GENE_OR_GENOME ) as a SARS-CoV-2 CORONAVIRUS receptor:
molecular mechanisms and potential therapeutic target. SARS-CoV-2
CORONAVIRUS has been sequenced [3 CARDINAL]. A phylogenetic
analysis [3 CARDINAL, 4 CARDINAL] found a bat WILDLIFE origin for
the SARS-CoV-2 CORONAVIRUS. There is a diversity of possible
intermediate hosts for SARS-CoV-2 CORONAVIRUS, including
pangolins WILDLIFE, but not mice EUKARYOTE and rats EUKARYOTE
[5 CARDINAL].
There are many similarities of SARS-CoV-2 CORONAVIRUS with the
original SARS-CoV CORONAVIRUS. Using computer modeling, Xu et al.
[6 CARDINAL] found that the spike proteins GENE_OR_GENOME of
SARS-CoV-2 CORONAVIRUS and SARS-CoV CORONAVIRUS have
almost identical 3-D structures in the receptor-binding domain that
maintains van der Waals forces PHYSICAL_SCIENCE. SARS-CoV
CORONAVIRUS spike protein has a strong binding affinity to human
ACE2 GENE_OR_GENOME, based on biochemical interaction studies
and crystal structure analysis [7 CARDINAL]. SARS-CoV-2
CORONAVIRUS and SARS-CoV spike proteins GENE_OR_GENOME
share 76.5% identity in amino acid sequences
NLP - we transform text into knowledge
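To illustrate how annotated text like the sample above could be turned into structured records, here is a minimal Python sketch. It is not the pipeline used by CovidGraph: the label set contains only the tags visible in the sample, `tagged_tokens` is a hypothetical helper, and it pairs each label with the single token before it, so multi-word entities such as "van der Waals forces" are reduced to their last word.

```python
from collections import Counter

# Labels visible in the annotated sample; the real tag set is assumed to be larger.
LABELS = {
    "GENE_OR_GENOME", "CORONAVIRUS", "WILDLIFE",
    "EUKARYOTE", "CARDINAL", "PHYSICAL_SCIENCE",
}
PUNCTUATION = "[](),."

def tagged_tokens(text):
    """Pair each label with the token directly before it."""
    tokens = [token.strip(PUNCTUATION) for token in text.split()]
    pairs = []
    for previous, current in zip(tokens, tokens[1:]):
        if current in LABELS and previous not in LABELS:
            pairs.append((previous, current))
    return pairs

sample = (
    "SARS-CoV-2 CORONAVIRUS has been sequenced [3 CARDINAL]. "
    "A phylogenetic analysis [3 CARDINAL, 4 CARDINAL] found a bat WILDLIFE "
    "origin for the SARS-CoV-2 CORONAVIRUS."
)

pairs = tagged_tokens(sample)
label_counts = Counter(label for _, label in pairs)
print(pairs)
print(label_counts)
```

Each (token, label) pair could then become a MENTIONS relationship from a text fragment to an entity node, which is the step that makes the text queryable alongside genes and proteins.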
How you can access it
https://db.covidgraph.org/semspect
https://db.covidgraph.org/browser
https://db.covidgraph.org/browser/bloom
https://live.yworks.com/covidgraph
GDS library - PageRank - find the most relevant gene
Finding ACE2 - the receptor the SARS-CoV-2 virus uses to enter the cell
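The idea behind ranking genes with PageRank can be sketched in plain Python. This is a stand-in written for illustration, not the GDS library's implementation or API: the `pagerank` function, the paper names, and "Gene X" are hypothetical, and the toy edges simply assume that each paper links to the genes it mentions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank

# Toy data (assumed): three papers mention ACE2, one also mentions Gene X.
edges = [
    ("paper1", "ACE2"),
    ("paper2", "ACE2"),
    ("paper3", "ACE2"),
    ("paper3", "Gene X"),
]

ranks = pagerank(edges)
top = max(ranks, key=ranks.get)
print(top)
```

In this toy graph the gene mentioned by the most papers ends up with the highest score, which mirrors how ACE2 surfaces as the most relevant gene on the slide.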
Interactive
Scalable
Semi-natural language query
Especially for non-computer scientists