Successfully reported this slideshow.

CovidOnTheWeb : covid19 linked data published on the Web

0

Share

Loading in …3
×
1 of 51
1 of 51

CovidOnTheWeb : covid19 linked data published on the Web

0

Share

Download to read offline

The Covid-on-the-Web project aims to allow biomedical researchers to access, query and make sense of COVID-19 related literature. To do so, it adapts, combines and extends tools to process, analyze and enrich the "COVID-19 Open Research Dataset" (CORD-19) that gathers 50,000+ full-text scientific articles related to the coronaviruses. We report on the RDF dataset and software resources produced in this project by leveraging skills in knowledge representation, text, data and argument mining, as well as data visualization and exploration. The dataset comprises two main knowledge graphs describing (1) named entities mentioned in the CORD-19 corpus and linked to DBpedia, Wikidata and other BioPortal vocabularies, and (2) arguments extracted using ACTA, a tool automating the extraction and visualization of argumentative graphs, meant to help clinicians analyze clinical trials and make decisions. On top of this dataset, we provide several visualization and exploration tools based on the Corese Semantic Web platform, MGExplorer visualization library, as well as the Jupyter Notebook technology. All along this initiative, we have been engaged in discussions with healthcare and medical research institutes to align our approach with the actual needs of the biomedical community, and we have paid particular attention to comply with the open and reproducible science goals, and the FAIR principles.

The Covid-on-the-Web project aims to allow biomedical researchers to access, query and make sense of COVID-19 related literature. To do so, it adapts, combines and extends tools to process, analyze and enrich the "COVID-19 Open Research Dataset" (CORD-19) that gathers 50,000+ full-text scientific articles related to the coronaviruses. We report on the RDF dataset and software resources produced in this project by leveraging skills in knowledge representation, text, data and argument mining, as well as data visualization and exploration. The dataset comprises two main knowledge graphs describing (1) named entities mentioned in the CORD-19 corpus and linked to DBpedia, Wikidata and other BioPortal vocabularies, and (2) arguments extracted using ACTA, a tool automating the extraction and visualization of argumentative graphs, meant to help clinicians analyze clinical trials and make decisions. On top of this dataset, we provide several visualization and exploration tools based on the Corese Semantic Web platform, MGExplorer visualization library, as well as the Jupyter Notebook technology. All along this initiative, we have been engaged in discussions with healthcare and medical research institutes to align our approach with the actual needs of the biomedical community, and we have paid particular attention to comply with the open and reproducible science goals, and the FAIR principles.

More Related Content

Related Audiobooks

Free with a 14 day trial from Scribd

See all

CovidOnTheWeb : covid19 linked data published on the Web

  1. 1. CovidOnTheWeb ~15 persons from the Wimmics Team [Fabien Gandon et al.] @fé-in 1
  2. 2. 2
  3. 3. vs. use cases… • Scenario 1: Help clinicians analyze clinical trials and take evidence-based decisions • Scenario 3: Help missions heads from Cancer Institute elaborate research programs to study the links between cancer and coronavirus 3 [Giboin, et al.]
  4. 4. 4 [Tim Berners-Lee et al., 1994] 4
  5. 5. identify what exists on the web http://my-site.fr 5
  6. 6. identify what exists on the web http://my-site.fr identify, on the web, what exists http://animals.org/this-zebra 6
  7. 7. URIs for… • URI for Paris in DBpedia: http://dbpedia.org/resource/Paris • URI for name of Victor Hugo in the Library of Congress: http://id.loc.gov/authorities/names/n79091479 • The MUC18 protein at UniProt http://www.uniprot.org/uniprot/P43121 • Xavier Dolan in Wikidata http://www.wikidata.org/entity/Q551861 • The book with doi:10.1007/3-540-45741-0_18 http://dx.doi.org/10.1007/3-540-45741-0_18 • 7
  8. 8. URIs for… things mentioned in papers AI methods: NLP and IR for named entity annotation in text  DBpedia Spotlight → DBpedia URIs  Entity-fishing → Wikidata URIs  BioPortal Annotator → BioPortal URIs [Gazzotti, et al.] 8
  9. 9. URIs for… things mentioned in papers AI methods: NLP and IR for named entity annotation in text  DBpedia Spotlight → DBpedia URIs  Entity-fishing → Wikidata URIs  BioPortal Annotator → BioPortal URIs “Effects on QT interval of hydroxychloroquine associated with ritonavir/darunavir or azithromycin in patients with SARS-CoV-2 infection” [Danzi et al.] http://dbpedia.org/resource/Severe_acute_respiratory _syndrome_coronavirus_2 http://dbpedia.org/resource/Hydroxychloroquine http://dbpedia.org/resource/Azithromycin http://dbpedia.org/resource/Ritonavir http://dbpedia.org/resource/Darunavir http://dbpedia.org/resource/Heart_arrhythmia [Gazzotti, et al.] 9
  10. 10. Extracting arguments and PICO elements AI methods: BERT+SciBERT, LSTM, Conditional Random Field [Mayer, Cabrio, Villata, Marro, et al.] Evidence-Based Practice (EBP) •Patient Problem, (or Population) •Intervention, •Comparison or Control, and •Outcome 10 [ECAI 2020]
  11. 11. COVID ON THE WEB RDF translator Named Entities extractors Covid-on-the-Web dataset LOD ACTA pipeline extraction of argument graphs 1 1 2 2 1 vocabularies & datasets 2 Covid-19 Open Research Dataset Process data & derive “smarter” data [Michel, Gazzotti, Gandon, Cabrio, Villata, Mayer et al.] 11 [ISWC 2020, IC 2021]
  12. 12. ratatouille.fr or the recipe for linked data 12
  13. 13. ratatouille.fr or the recipe for linked data 13
  14. 14. ratatouille.fr or the recipe for linked data 14
  15. 15. datatouille.fr or the recipe for linked data 15
  16. 16. linked open data(sets) cloud on the Web in RDF 0 200 400 600 800 1000 1200 1400 01/05/2007 08/10/2007 07/11/2007 10/11/2007 28/02/2008 31/03/2008 18/09/2008 05/03/2009 27/03/2009 14/07/2009 22/09/2010 19/09/2011 30/08/2014 26/01/2017 number of linked open datasets on the Web 16
  17. 17. Integrate in RDF named entities 17 morph-xr2rml, virtuoso [Michel, Gazzotti et al.]
  18. 18. Integrate in RDF metadata named entities 18 morph-xr2rml, virtuoso [Michel, Gazzotti et al.]
  19. 19. Integrate in RDF metadata named entities arguments PICO elements 19 morph-xr2rml, virtuoso [Michel, Gazzotti et al.]
  20. 20. Integrate in RDF metadata named entities arguments PICO elements provenance morph-xr2rml, virtuoso [Michel, Gazzotti et al.] 20
  21. 21. Integrate in RDF metadata named entities arguments PICO elements provenance + other data sources morph-xr2rml, virtuoso [Michel, Gazzotti et al.] 21
  22. 22. Dataset description No. RDF triples dataset description + definition of a few properties 170 articles metadata (title, authors, DOIs, journal etc.) 3 722 381 named entities identified by Entity-fishing in articles titles/abstracts 35 049 832 named entities identified by Entity-fishing in articles bodies 1 156 611 321 named entities identified by Bioportal Annotator in articles titles/abstracts 104 430 547 named entities identified by DBpedia Spotlight in articles titles/abstracts 65 359 664 argumentative components and PICO elements by ACTA from articles titles/abstracts 7 469 234 Total 1 361 451 364 22
  23. 23. 23
  24. 24. COVID ON THE WEB RDF translator Jupyter Notebook Python, R & analytics Corese engine query & infer ACTA Web application visualization of argument graphs Corese portal Data browsing MGExplorer Data visualization Open Data publication Zenodo, Github, Virtuoso Named Entities extractors Covid-on-the-Web dataset LOD ACTA pipeline extraction of argument graphs 1 1 2 2 1 3 3 3 vocabularies & datasets 3 3 3 2 Covid-19 Open Research Dataset Process data & derive “smarter” data Means to exploit data [Corby, Michel, Gazzotti, Gandon et al.] 24 [ISWC 2020, IC 2021]
  25. 25. 25 Cancer GQ
  26. 26. Articles about cancer in the COVID dataset: • "Antipsychotic treatment effects on cardiovascular, cancer, infection, and intentional self-harm as cause of death in patients with Alzheimer's dementia" • "Targeting cancer stem cell pathways for cancer therapy" • "Deubiquitinases and cancer: A snapshot" • "The Functional Properties of Preserved Eggs: From Anti-cancer and Anti-inflammatory Aspects" • "The functional role of the novel biomarker karyopherin α 2 (KPNA2) in cancer" • "Biochemical characterisation of lectin from Indian hyacinth plant bulbs with potential inhibitory action against human cancer cells" • "Purification, identification and profiling of serum amyloid A proteins from sera of advanced- stage cancer patients" • "Darwin, medicine and cancer" • "Outcome of Oncology Patients Infected With Coronavirus" • "Review and Meta-Analyses of TAAR1 Expression in the Immune System and Cancers" • "Experimental Data-Mining Analyses Reveal New Roles of Low-Intensity Ultrasound in Differentiating Cell Death Regulatome in Cancer and Non-cancer Cells via Potential Modulation of Chromatin Long-Range Interactions" • "Golgi anti-apoptotic protein: a tale of camels, calcium, channels and cancer" • "Severe novel influenza A (H1N1) infection in cancer patients" • "Molecular Profiling of Multiple Human Cancers Defines an Inflammatory Cancer-Associated Molecular Pattern and Uncovers KPNA2 as a Uniform Poor Prognostic Cancer Marker" • "SARS-CoV-2 transmission in cancer patients of a tertiary hospital in Wuhan" • "Creosote bush lignans for human disease treatment and prevention: Perspectives on combination therapy" • "8 Electrospinning and microfluidics An integrated approach for tissue engineering and cancer" • "Genomic and proteomic approaches for studying human cancer: Prospects for true patient- tailored therapy" • "Community acquired respiratory virus infections in cancer patients—Guideline on diagnosis and management by the Infectious Diseases Working Party of the German Society for haematology and Medical Oncology" • "RIG-I Enhanced Interferon Independent Apoptosis upon Junin Virus Infection" • "Respiratory Viral Infections in Transplant and Oncology Patients“ …. 26 Cancer GQ
  27. 27. Lymphoma Cancer GD GQ 27
  28. 28. Lymphoma Cancer Lymphoma(x)⇒ Cancer(x) GD GQ Cancer Lymphoma O 28
  29. 29. F∧O → R ⇔ GD ≤ GQ mapping modulo an ontology Lymphoma Cancer Lymphoma(x)⇒ Cancer(x) GD GQ Cancer Lymphoma O [Corby et al.] AI methods: knowledge graphs, ontology-based formalisms, querying, validating and reasoning 29
  30. 30. Query for co-occurrence of cancers & coronaviruses with R Jupyter Notebooks… 😍😍 30
  31. 31. But you need to code… 😓😓 31
  32. 32. 32
  33. 33. COVID ON THE WEB Biomedical researchers & managers Data analysts … RDF translator Jupyter Notebook Python, R & analytics Corese engine query & infer ACTA Web application visualization of argument graphs Corese portal Data browsing MGExplorer Data visualization Open Data publication Zenodo, Github, Virtuoso Applications Named Entities extractors dereference, query, download query, browse, analyze, make sense Covid-on-the-Web dataset LOD ACTA pipeline extraction of argument graphs 1 1 2 2 1 3 3 3 vocabularies & datasets 3 3 3 2 Covid-19 Open Research Dataset Process data & derive “smarter” data Means to exploit data Biomedical research [Menin, Winckler, Corby, Giboin, Faron, Cadorel, Tettamanzi et al. 2020] 33 [ISWC 2020, IC 2021]
  34. 34. Examples of questions from  Est-ce que les Coronavirus peuvent causer des cancers ?  Est-ce qu’ils font partie de la famille des virus oncogènes ?  Quelles sont les séquelles des infections SARS CoV 1 et 2, et MERS ?  Est-ce que les épidémies SARS-Cov1 et MERS sont liées à des apparitions de cancers ?  Est-ce que les lésions causées par l’infection SARS CoV2 peuvent potentialiser une transformation tumorale? (évolution tissulaire et sensibilité au développement de cancers suite à une fibrose, ou autres lésions)  Quelles sont les voies de signalisation intracellulaires activées par les coronavirus ? Pendant l’inflammation ? Quelles sont les adaptations métaboliques ?  Est-ce qu’il y a des similitudes avec des virus oncogènes déjà connus tels que HPV, HBV, EBV, etc. ? … [Giboin, Faron et al. 2020] 34
  35. 35. Visualisation pipeline [Menin, Winckler, Corby, Giboin, Faron et al. 2020] 35
  36. 36. Visualisation pipeline [Menin, Winckler, Corby, Giboin, Faron et al. 2020] 36 ?
  37. 37. [Menin, Winckler, Corby, Giboin, Faron et al. 2020] 37 Visualisation and exploration
  38. 38. [Menin, Winckler, Corby, Giboin, Faron et al. 2020] 38 Visualisation and exploration
  39. 39. mining interesting association rules AI methods: clustering + community detection + dimensionality reduction (auto-encoder) + Frequent Pattern Growth [Cadorel, Tettamanzi] 39 [WI-IAT 2020]
  40. 40. mining interesting association rules AI methods: clustering + community detection + dimensionality reduction (auto-encoder) + Frequent Pattern Growth • hidden patterns to enrich the dataset • novel hypotheses for biomedical research [Cadorel, Tettamanzi] 40 [WI-IAT 2020]
  41. 41. mining interesting association rules AI methods: clustering + community detection + dimensionality reduction (auto-encoder) + Frequent Pattern Growth • hidden patterns to enrich the dataset • novel hypotheses for biomedical research • error detection in the dataset • relevant clusters & communities for navigation [Cadorel, Tettamanzi] 41 [WI-IAT 2020]
  42. 42. Visualization of Association found in Covid dataset [Menin, Cadorel, Tettamanzi, Winckler] 42
  43. 43. CovidOnTheWeb https://github.com/Wimmics/CovidOnTheWeb https://covidontheweb.inria.fr/sparql https://covidontheweb.inria.fr/fct/ https://doi.org/10.5281/zenodo.3833753 SPARQL Fabien Gandon, Franck Michel, Valentin Ah-Kane, Anna Bobasheva, Elena Cabrio, Olivier Corby, Catherine Faron, Raphaël Gazzotti, Alain Giboin, Santiago Marro, Tobias Mayer, Aline Menin, Mathieu Simon, Serena Villata, and Marco Winckler LOD
  44. 44. CovidOnTheWeb access Stats Period January-April 2021 • Total URI accesses= 48 735 hence an average of 393 URI access/day • Total SPARQL queries 49 036 hence an average of 395 queries/day • 300 different agent types with two major ones: Apache-Jena-ARQ (38 767) and Mozilla/4.0 (40 935) Full dump of dataset on Zenodo (end of April 2021) • 61 unique downloads • 954 unique views [Michel, Gazzotti]
  45. 45. Visualisation pipeline [Menin, Winckler, Corby, Giboin, Faron et al. 2020] 45 ?
  46. 46. Dataset description No. RDF triples dataset description + definition of a few properties 170 articles metadata (title, authors, DOIs, journal etc.) 3 722 381 named entities identified by Entity-fishing in articles titles/abstracts 35 049 832 named entities identified by Entity-fishing in articles bodies 1 156 611 321 named entities identified by Bioportal Annotator in articles titles/abstracts 104 430 547 named entities identified by DBpedia Spotlight in articles titles/abstracts 65 359 664 argumentative components and PICO elements by ACTA from articles titles/abstracts 7 469 234 Total 1 361 451 364 46
  47. 47. MULTI-DISCIPLINARY TEAM  40~50 members  ~15 nationalities  1 DR, 4 Professors  3CR, 3 Assistant professors DR/Professors:  Fabien GANDON, Inria, AI, KRR, Semantic Web, Social Web, K. Graphs  Nhan LE THANH, UCA, Logics, KR, Emotions, Workflows, K. Graphs  Peter SANDER, UCA, Web, Emotions  Andrea TETTAMANZI, UCA, AI, Logics, Evo, Learning, Agents, K. Graphs  Marco WINCKLER, UCA, Human-Computer Interaction, Web, K. Graphs CR/Assistant Professors:  Michel BUFFA, UCA, Web, Social Media, K. Graphs  Elena CABRIO, UCA, NLP, KR, Linguistics, Q&A, Text Mining, K. Graphs  Olivier CORBY, Inria, KR, AI, Sem. Web, Programming, K. Graphs  Catherine FARON-ZUCKER, UCA, KR, AI, Semantic Web, K. Graphs  Damien GRAUX, Inria, Linked Data, Sem. Web, Querying, K. Graphs  Serena VILLATA, CNRS, AI, Argumentation, Licenses, Rights, K. Graphs Research engineer: Franck MICHEL, CNRS, Linked Data, Integration, DB, K. Graphs External:  Andrei Ciortea (University of St. Gallen) Agents, WoT, Sem. Web, K. Graphs  Nicolas DELAFORGE (Mnemotix) Sem. Web, KM, Integration, K. Graphs  Alain GIBOIN, (Retired CR Inria), Interaction Design, KE, User & Task, K. Graphs  Freddy LECUE (Thales, Montreal) AI, Logics, Mining, Big Data, S. Web , K. Graphs
  48. 48. DBPEDIA.FR 180 000 000 arcs in an encyclopedic knowledge graph number of queries per day 70 000 on average 2.5 millions max 185 377 686 RDF triples extracted and mapped public dumps, endpoints, interfaces, APIs…
  49. 49. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 825619. ONTOLOGY FOR AI ITSELF  ontology and metadata of AI resources  SHACL to validate AI4EU these RDF graphs  online endpoint http://corese.inria.fr  predefined SPARQL queries, SHACL shapes, display [Corby et al., 2019]
  50. 50. PREDICT HOSPITALIZATION  Predict hospitalization from Physician’s records classification  Augment records data with Web knowledge graphs  Study impact on prediction [Gazzotti, Faron, Gandon et al. 2020] Sexe Date Cause CISP2 ... History Observations H 25/04/2012 vaccin-antitétanique A44 ... Appendicite EN CP - Bon état général - auscult pulm libre; bdc rég sans souffle - tympans ok- Element Number Patients Consultations Past medical history Biometric data Semiotics Diagnosis Row of prescribed drugs Symptoms Health care procedures Additional examination Paramedical prescription Observations/notes 55 823 364 684 187 290 293 908 250 669 117 442 847 422 23 488 11 850 871 590 17 222 56 143 (1) (2) PRIMEGE
  51. 51. Image Metadata Score  portrait 50350012455 C:Jocondejoconde0138m503501_d0012455-000_p.jpg cheval: 0.999 Image Metadata Score  figure (saint Eloi de Noyon, évêque, en pied, bénédiction, vêtement liturgique, mitre, attribut, cheval, marteau, outil : ferronnerie) 000SC022652 C:/Joconde/joconde0355/m079806_bsa0030101_p.jpg cheval: 0.006 MonaLIA  reason & query on RDF to build training sets.  transfer learning & CNN classifiers on targeted categories (topics, techniques, etc.)  reason & query RDF of results to address silence, noise and explain 350 000 images of artworks RDF metadata based on external thesauri Joconde database from French museums (1) (3) [Bobasheva, Gandon, Precioso, 2021] (2)

×