CovidOnTheWeb
~15 persons from the Wimmics Team
[Fabien Gandon et al.]
@fé-in
vs. use cases…
• Scenario 1: Help clinicians analyze clinical trials and take evidence-based decisions
• Scenario 3: Help mission heads from the Cancer Institute elaborate research programs to study the links between cancer and coronavirus
[Giboin, et al.]
[Tim Berners-Lee et al., 1994]
identify what exists on the web
http://my-site.fr
identify, on the web, what exists
http://animals.org/this-zebra
URIs for…
• URI for Paris in DBpedia:
http://dbpedia.org/resource/Paris
• URI for name of Victor Hugo in the Library of Congress:
http://id.loc.gov/authorities/names/n79091479
• The MUC18 protein at UniProt
http://www.uniprot.org/uniprot/P43121
• Xavier Dolan in Wikidata
http://www.wikidata.org/entity/Q551861
• The book with doi:10.1007/3-540-45741-0_18
http://dx.doi.org/10.1007/3-540-45741-0_18
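A URI such as the ones listed above is useful because a client can look it up over HTTP and ask for a machine-readable description. The following is a minimal sketch of how such a request could be prepared with content negotiation, assuming the server honours an `Accept` header for RDF media types. The helper name `build_rdf_request` is hypothetical, and nothing is sent over the network here.

```python
from urllib.request import Request


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation.

    The media type is an assumption: a given server may only offer other
    serializations, or none at all.
    """
    return Request(uri, headers={"Accept": media_type}, method="GET")


if __name__ == "__main__":
    # URI taken from the slide above.
    request = build_rdf_request("http://dbpedia.org/resource/Paris")
    print(request.get_method(), request.full_url)
    print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; whether RDF comes back depends on the server.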
URIs for… things mentioned in papers
AI methods: NLP and IR for named entity annotation in text
• DBpedia Spotlight → DBpedia URIs
• Entity-fishing → Wikidata URIs
• BioPortal Annotator → BioPortal URIs
[Gazzotti, et al.]
“Effects on QT interval of hydroxychloroquine
associated with ritonavir/darunavir or azithromycin
in patients with SARS-CoV-2 infection” [Danzi et al.]
http://dbpedia.org/resource/Severe_acute_respiratory_syndrome_coronavirus_2
http://dbpedia.org/resource/Hydroxychloroquine
http://dbpedia.org/resource/Azithromycin
http://dbpedia.org/resource/Ritonavir
http://dbpedia.org/resource/Darunavir
http://dbpedia.org/resource/Heart_arrhythmia
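To make the idea of linking mentions to URIs concrete, here is a deliberately naive sketch: a dictionary lookup over the example title, using the DBpedia URIs shown above. It is not how DBpedia Spotlight, Entity-fishing or BioPortal Annotator work; the gazetteer and the function `annotate` are hypothetical. Note that a plain string lookup cannot produce the Heart_arrhythmia link listed on the slide, since that term does not appear literally in the title.

```python
import re

# Hypothetical mini-gazetteer: lowercase surface form -> DBpedia URI from the slide.
GAZETTEER = {
    "sars-cov-2": "http://dbpedia.org/resource/Severe_acute_respiratory_syndrome_coronavirus_2",
    "hydroxychloroquine": "http://dbpedia.org/resource/Hydroxychloroquine",
    "azithromycin": "http://dbpedia.org/resource/Azithromycin",
    "ritonavir": "http://dbpedia.org/resource/Ritonavir",
    "darunavir": "http://dbpedia.org/resource/Darunavir",
}


def annotate(text):
    """Return (start, end, surface, uri) tuples for gazetteer matches, in text order."""
    lowered = text.lower()
    annotations = []
    for surface, uri in GAZETTEER.items():
        # Match whole terms only: no word character or hyphen on either side.
        pattern = r"(?<![\w-])" + re.escape(surface) + r"(?![\w-])"
        for match in re.finditer(pattern, lowered):
            start, end = match.start(), match.end()
            annotations.append((start, end, text[start:end], uri))
    return sorted(annotations)


if __name__ == "__main__":
    title = ("Effects on QT interval of hydroxychloroquine associated with "
             "ritonavir/darunavir or azithromycin in patients with SARS-CoV-2 infection")
    for start, end, surface, uri in annotate(title):
        print(f"{start:3d}-{end:3d}  {surface:20s} {uri}")
```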
Extracting arguments and PICO elements
AI methods: BERT+SciBERT, LSTM, Conditional Random Field
[Mayer, Cabrio, Villata, Marro, et al.]
Evidence-Based Practice (EBP)
• Patient Problem (or Population)
• Intervention
• Comparison or Control
• Outcome
[ECAI 2020]
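Sequence labelling models such as the ones named above typically emit one tag per token, and the tagged tokens are then grouped into spans. The sketch below shows only that last grouping step for PICO elements, assuming a BIO tagging scheme. The tags in the example are written by hand for illustration on a shortened version of the example title; they are not ACTA output, and `decode_bio` is a hypothetical helper.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans."""
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        starts_span = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_span:
            if words:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            words.append(token)  # continuation of the current span
        else:  # "O": outside any PICO element
            if words:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if words:
        spans.append((label, " ".join(words)))
    return spans


if __name__ == "__main__":
    # Hand-written, hypothetical tags for illustration only.
    tokens = ["Effects", "on", "QT", "interval", "of", "hydroxychloroquine",
              "in", "patients", "with", "SARS-CoV-2", "infection"]
    tags = ["O", "O", "B-Outcome", "I-Outcome", "O", "B-Intervention",
            "O", "B-Population", "I-Population", "I-Population", "I-Population"]
    for label, text in decode_bio(tokens, tags):
        print(f"{label}: {text}")
```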
COVID ON THE WEB
[Architecture diagram. Components: Covid-19 Open Research Dataset; Named Entities extractors; ACTA pipeline (extraction of argument graphs); RDF translator; LOD vocabularies & datasets; Covid-on-the-Web dataset]
Process data & derive “smarter” data
[Michel, Gazzotti, Gandon, Cabrio, Villata, Mayer et al.]
[ISWC 2020, IC 2021]
ratatouille.fr
or the recipe for linked data
datatouille.fr
or the recipe for linked data
linked open data(sets) cloud on the Web in RDF
[Chart: number of linked open datasets on the Web, from 01/05/2007 to 26/01/2017; vertical axis from 0 to 1400]
Integrate in RDF:
metadata
named entities
arguments
PICO elements
provenance
+ other data sources
morph-xr2rml, virtuoso [Michel, Gazzotti et al.]
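As a rough illustration of what integrating these elements in RDF means, the sketch below turns one article record into N-Triples lines. It is a stand-in written in plain Python, not the morph-xr2rml mapping or the Virtuoso setup mentioned on the slide, and it does not reproduce the actual modelling of the dataset. The base URI, the source URI, the record identifier and the choice of `dct:title`, `dct:subject` and `prov:wasDerivedFrom` are assumptions made for the example.

```python
DCT = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"


def literal(value: str) -> str:
    """Serialize a plain string as an N-Triples literal with basic escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def article_to_ntriples(record: dict, base: str, source: str) -> list:
    """Map one article record to N-Triples: metadata, named entities, provenance."""
    subject = f"<{base}{record['id']}>"
    triples = [
        f"{subject} <{DCT}title> {literal(record['title'])} .",      # metadata
        f"{subject} <{PROV}wasDerivedFrom> <{source}> .",            # provenance
    ]
    for uri in record.get("entities", []):                           # named entities
        triples.append(f"{subject} <{DCT}subject> <{uri}> .")
    return triples


if __name__ == "__main__":
    record = {
        "id": "paper-1",  # hypothetical identifier
        "title": "Effects on QT interval of hydroxychloroquine associated with "
                 "ritonavir/darunavir or azithromycin in patients with SARS-CoV-2 infection",
        "entities": ["http://dbpedia.org/resource/Hydroxychloroquine"],
    }
    for line in article_to_ntriples(record,
                                    base="http://example.org/article/",
                                    source="http://example.org/dataset/source"):
        print(line)
```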
Dataset description | No. RDF triples
dataset description + definition of a few properties | 170
articles metadata (title, authors, DOIs, journal etc.) | 3 722 381
named entities identified by Entity-fishing in articles titles/abstracts | 35 049 832
named entities identified by Entity-fishing in articles bodies | 1 156 611 321
named entities identified by Bioportal Annotator in articles titles/abstracts | 104 430 547
named entities identified by DBpedia Spotlight in articles titles/abstracts | 65 359 664
argumentative components and PICO elements by ACTA from articles titles/abstracts | 7 469 234
Total | 1 361 451 364
COVID ON THE WEB
[Architecture diagram, extended. Components: Covid-19 Open Research Dataset; Named Entities extractors; ACTA pipeline (extraction of argument graphs); RDF translator; LOD vocabularies & datasets; Covid-on-the-Web dataset]
Process data & derive “smarter” data
Means to exploit data:
• Jupyter Notebook: Python, R & analytics
• Corese engine: query & infer
• ACTA Web application: visualization of argument graphs
• Corese portal: data browsing
• MGExplorer: data visualization
• Open Data publication: Zenodo, Github, Virtuoso
[Corby, Michel, Gazzotti, Gandon et al.]
[ISWC 2020, IC 2021]
GQ: Cancer
Articles about cancer in the COVID dataset:
• "Antipsychotic treatment effects on cardiovascular, cancer, infection, and intentional self-harm as cause of death in patients with Alzheimer's dementia"
• "Targeting cancer stem cell pathways for cancer therapy"
• "Deubiquitinases and cancer: A snapshot"
• "The Functional Properties of Preserved Eggs: From Anti-cancer and Anti-inflammatory Aspects"
• "The functional role of the novel biomarker karyopherin α 2 (KPNA2) in cancer"
• "Biochemical characterisation of lectin from Indian hyacinth plant bulbs with potential inhibitory action against human cancer cells"
• "Purification, identification and profiling of serum amyloid A proteins from sera of advanced-stage cancer patients"
• "Darwin, medicine and cancer"
• "Outcome of Oncology Patients Infected With Coronavirus"
• "Review and Meta-Analyses of TAAR1 Expression in the Immune System and Cancers"
• "Experimental Data-Mining Analyses Reveal New Roles of Low-Intensity Ultrasound in Differentiating Cell Death Regulatome in Cancer and Non-cancer Cells via Potential Modulation of Chromatin Long-Range Interactions"
• "Golgi anti-apoptotic protein: a tale of camels, calcium, channels and cancer"
• "Severe novel influenza A (H1N1) infection in cancer patients"
• "Molecular Profiling of Multiple Human Cancers Defines an Inflammatory Cancer-Associated Molecular Pattern and Uncovers KPNA2 as a Uniform Poor Prognostic Cancer Marker"
• "SARS-CoV-2 transmission in cancer patients of a tertiary hospital in Wuhan"
• "Creosote bush lignans for human disease treatment and prevention: Perspectives on combination therapy"
• "8 Electrospinning and microfluidics An integrated approach for tissue engineering and cancer"
• "Genomic and proteomic approaches for studying human cancer: Prospects for true patient-tailored therapy"
• "Community acquired respiratory virus infections in cancer patients—Guideline on diagnosis and management by the Infectious Diseases Working Party of the German Society for haematology and Medical Oncology"
• "RIG-I Enhanced Interferon Independent Apoptosis upon Junin Virus Infection"
• "Respiratory Viral Infections in Transplant and Oncology Patients"
• …
mapping modulo an ontology
GQ: Cancer
GD: Lymphoma
O: Lymphoma(x) ⇒ Cancer(x)
F ∧ O → R ⇔ GD ≤ GQ
[Corby et al.]
AI methods: knowledge graphs, ontology-based formalisms, querying, validating and reasoning
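The mapping modulo an ontology can be sketched in a few lines: a query for Cancer should also return resources typed as Lymphoma, because the ontology O states Lymphoma(x) ⇒ Cancer(x). The sketch reads GD as the data graph and GQ as the query graph, which is an assumption, and it reduces both to simple class assertions. It is not the Corese engine; the article identifiers and function names are hypothetical.

```python
# Ontology O as a child -> parent map: Lymphoma(x) => Cancer(x).
SUBCLASS_OF = {"Lymphoma": "Cancer"}


def superclasses(cls, ontology):
    """Return cls followed by all of its ancestors in the ontology."""
    chain = []
    while cls is not None and cls not in chain:  # the second test guards against cycles
        chain.append(cls)
        cls = ontology.get(cls)
    return chain


def answer(query_class, data, ontology):
    """Return the resources whose asserted class is query_class or a subclass of it."""
    return [resource for resource, cls in data
            if query_class in superclasses(cls, ontology)]


if __name__ == "__main__":
    # Hypothetical data graph GD: (resource, asserted class).
    data_graph = [("article-1", "Lymphoma"), ("article-2", "Influenza")]
    print("with ontology:   ", answer("Cancer", data_graph, SUBCLASS_OF))
    print("without ontology:", answer("Cancer", data_graph, {}))
```

Without the ontology the query for Cancer finds nothing in this toy graph; with it, the Lymphoma article is returned.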
Query for co-occurrence of cancers & coronaviruses with R Jupyter Notebooks… 😍😍
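A co-occurrence query of this kind could look like the sketch below, written here in Python rather than R. It only builds the SPARQL text and the request URL for the endpoint given later in these slides; nothing is sent. The graph pattern is an assumption: it supposes that each named entity is recorded as an `oa:Annotation` linked to its article with `schema:about` and to the entity URI with `oa:hasBody`. The two DBpedia URIs and the function names are likewise chosen for illustration.

```python
from urllib.parse import urlencode

ENDPOINT = "https://covidontheweb.inria.fr/sparql"  # endpoint URL from the slides


def cooccurrence_query(entity_a: str, entity_b: str, limit: int = 10) -> str:
    """Build a SPARQL query for articles annotated with both entities.

    The modelling (oa:Annotation, schema:about, oa:hasBody) is an assumption.
    """
    return f"""
PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?article WHERE {{
  ?a1 a oa:Annotation ; schema:about ?article ; oa:hasBody <{entity_a}> .
  ?a2 a oa:Annotation ; schema:about ?article ; oa:hasBody <{entity_b}> .
}} LIMIT {int(limit)}
""".strip()


def query_url(query: str) -> str:
    """Encode the query as a GET URL, following the SPARQL protocol's `query` parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = cooccurrence_query("http://dbpedia.org/resource/Cancer",
                               "http://dbpedia.org/resource/Coronavirus",
                               limit=5)
    print(query)
    print(query_url(query))
```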
But you need to code… 😓😓
COVID ON THE WEB
[Architecture diagram, extended with users and applications. Components: Covid-19 Open Research Dataset; Named Entities extractors; ACTA pipeline (extraction of argument graphs); RDF translator; LOD vocabularies & datasets; Covid-on-the-Web dataset; Jupyter Notebook (Python, R & analytics); Corese engine (query & infer); ACTA Web application (visualization of argument graphs); Corese portal (data browsing); MGExplorer (data visualization); Open Data publication (Zenodo, Github, Virtuoso); Applications]
Process data & derive “smarter” data
Means to exploit data
Biomedical research
Users: biomedical researchers & managers, data analysts, …
Uses: dereference, query, download; query, browse, analyze, make sense
[Menin, Winckler, Corby, Giboin, Faron, Cadorel, Tettamanzi et al. 2020]
[ISWC 2020, IC 2021]
Examples of questions from
• Can coronaviruses cause cancers?
• Do they belong to the family of oncogenic viruses?
• What are the sequelae of SARS-CoV-1, SARS-CoV-2 and MERS infections?
• Are the SARS-CoV-1 and MERS epidemics linked to the onset of cancers?
• Can the lesions caused by SARS-CoV-2 infection potentiate a tumoral transformation? (tissue evolution and susceptibility to cancer development following fibrosis or other lesions)
• Which intracellular signaling pathways are activated by coronaviruses? During inflammation? What are the metabolic adaptations?
• Are there similarities with already known oncogenic viruses such as HPV, HBV, EBV, etc.? …
[Giboin, Faron et al. 2020]
Visualisation pipeline
[Menin, Winckler, Corby, Giboin, Faron et al. 2020]
Visualisation and exploration
[Menin, Winckler, Corby, Giboin, Faron et al. 2020]
mining interesting association rules
AI methods: clustering + community detection + dimensionality reduction (auto-encoder) + Frequent Pattern Growth
• hidden patterns to enrich the dataset
• novel hypotheses for biomedical research
• error detection in the dataset
• relevant clusters & communities for navigation
[Cadorel, Tettamanzi]
[WI-IAT 2020]
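To show what an association rule is, the sketch below computes support and confidence for rules between pairs of items, treating each article as the set of entities annotated in it. It uses brute-force pair counting, not Frequent Pattern Growth, and none of the clustering, community detection or auto-encoder steps listed above. The transactions use abstract item names and are made up for the example; the thresholds are arbitrary.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules (lhs, rhs, support, confidence) between pairs of items.

    support(lhs -> rhs)    = share of transactions containing both items
    confidence(lhs -> rhs) = count(both) / count(lhs)
    """
    total = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(pair for t in transactions
                          for pair in combinations(sorted(set(t)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue  # the pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Made-up transactions: each set stands for the entities found in one article.
    transactions = [{"A", "B", "C"}, {"A", "B"}, {"B", "C"}, {"A", "B"}]
    for lhs, rhs, support, confidence in pair_rules(transactions):
        print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```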
Visualization of associations found in the Covid dataset
[Menin, Cadorel, Tettamanzi, Winckler]
CovidOnTheWeb
https://github.com/Wimmics/CovidOnTheWeb
https://covidontheweb.inria.fr/sparql
https://covidontheweb.inria.fr/fct/
https://doi.org/10.5281/zenodo.3833753
SPARQL
Fabien Gandon, Franck Michel, Valentin Ah-Kane,
Anna Bobasheva, Elena Cabrio, Olivier Corby,
Catherine Faron, Raphaël Gazzotti, Alain Giboin,
Santiago Marro, Tobias Mayer, Aline Menin,
Mathieu Simon, Serena Villata, and Marco Winckler
LOD
CovidOnTheWeb access stats
Period: January-April 2021
• Total URI accesses: 48 735, an average of 393 URI accesses/day
• Total SPARQL queries: 49 036, an average of 395 queries/day
• 300 different agent types, with two major ones: Apache-Jena-ARQ (38 767) and Mozilla/4.0 (40 935)
Full dump of the dataset on Zenodo (end of April 2021)
• 61 unique downloads
• 954 unique views
[Michel, Gazzotti]
MULTI-DISCIPLINARY TEAM
• 40~50 members
• ~15 nationalities
• 1 DR, 4 Professors
• 3 CR, 3 Assistant professors
DR/Professors:
• Fabien GANDON, Inria, AI, KRR, Semantic Web, Social Web, K. Graphs
• Nhan LE THANH, UCA, Logics, KR, Emotions, Workflows, K. Graphs
• Peter SANDER, UCA, Web, Emotions
• Andrea TETTAMANZI, UCA, AI, Logics, Evo, Learning, Agents, K. Graphs
• Marco WINCKLER, UCA, Human-Computer Interaction, Web, K. Graphs
CR/Assistant Professors:
• Michel BUFFA, UCA, Web, Social Media, K. Graphs
• Elena CABRIO, UCA, NLP, KR, Linguistics, Q&A, Text Mining, K. Graphs
• Olivier CORBY, Inria, KR, AI, Sem. Web, Programming, K. Graphs
• Catherine FARON-ZUCKER, UCA, KR, AI, Semantic Web, K. Graphs
• Damien GRAUX, Inria, Linked Data, Sem. Web, Querying, K. Graphs
• Serena VILLATA, CNRS, AI, Argumentation, Licenses, Rights, K. Graphs
Research engineer: Franck MICHEL, CNRS, Linked Data, Integration, DB, K. Graphs
External:
• Andrei CIORTEA (University of St. Gallen), Agents, WoT, Sem. Web, K. Graphs
• Nicolas DELAFORGE (Mnemotix), Sem. Web, KM, Integration, K. Graphs
• Alain GIBOIN (Retired CR Inria), Interaction Design, KE, User & Task, K. Graphs
• Freddy LECUE (Thales, Montreal), AI, Logics, Mining, Big Data, S. Web, K. Graphs
DBPEDIA.FR
180 000 000 arcs in an encyclopedic knowledge graph
number of queries per day: 70 000 on average, 2.5 million max
185 377 686 RDF triples extracted and mapped
public dumps, endpoints, interfaces, APIs…
This project has received funding from the European Union's Horizon 2020
research and innovation programme under grant agreement 825619.
ONTOLOGY FOR AI ITSELF
• ontology and metadata of AI resources
• SHACL to validate these RDF graphs (AI4EU)
• online endpoint http://corese.inria.fr
• predefined SPARQL queries, SHACL shapes, display
[Corby et al., 2019]
PREDICT HOSPITALIZATION
• Predict hospitalization from physician’s records classification
• Augment records data with Web knowledge graphs
• Study impact on prediction
[Gazzotti, Faron, Gandon et al. 2020]
Sample record:
Sex | Date | Cause | CISP2 | ... | History | Observations
M | 25/04/2012 | tetanus vaccination | A44 | ... | Appendicitis | EN CP - good general condition - clear lung auscultation; regular heart sounds without murmur - eardrums OK
Element | Number
Patients | 55 823
Consultations | 364 684
Past medical history | 187 290
Biometric data | 293 908
Semiotics | 250 669
Diagnosis | 117 442
Row of prescribed drugs | 847 422
Symptoms | 23 488
Health care procedures | 11 850
Additional examination | 871 590
Paramedical prescription | 17 222
Observations/notes | 56 143
PRIMEGE
Image | Metadata | Score
C:Jocondejoconde0138m503501_d0012455-000_p.jpg | portrait (50350012455) | cheval (horse): 0.999
C:/Joconde/joconde0355/m079806_bsa0030101_p.jpg | figure (saint Eloi de Noyon, évêque, en pied, bénédiction, vêtement liturgique, mitre, attribut, cheval, marteau, outil : ferronnerie) (000SC022652) | cheval (horse): 0.006
MonaLIA
• reason & query on RDF to build training sets
• transfer learning & CNN classifiers on targeted categories (topics, techniques, etc.)
• reason & query RDF of results to address silence, noise and explain
350 000 images of artworks
RDF metadata based on external thesauri
Joconde database from French museums
[Bobasheva, Gandon, Precioso, 2021]

CovidOnTheWeb : covid19 linked data published on the Web
