SlideShare a Scribd company logo
* Wimmics: AI in bridging social semantics and formal semantics on the Web
Franck MICHEL* - Université Côte d’Azur, CNRS, Inria, I3S, France
ISSA: Generic Pipeline,
Knowledge Model and
Visualization tools to
Help Scientists Search and
Make Sense of a Scientific Archive
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Issue: skyrocketing pace of publications
Bibliographic search difficult:
• Find and make sense of relevant articles
• Search across multiple disciplines
Central role of open scientific archives
But the provided services have limitations:
• String-based search fails to grasp semantic relationships
• Keywords often too general to be helpful
 Need for smarter search services exploiting this knowledge
2
Open Science
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France 3
Propose a generic, reusable, extensible
solution to optimize bibliographic search
in an open scientific archive.
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
How did we do that?
• Extract rich metadata from the publications
in multiple languages
• Turn it into a semantic index published
on the web as a RDF knowledge graph
• Link with general vocabularies as well as
domain-specific vocabularies
• Provide flexible search/visualization tools
able to exploit the index
4
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France 5
The ISSA
pipeline
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
OpenArchive
ISSA
Pipeline
User Communities
DEFINE
Step 1. Retrieval of metadata records
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Retrieval
(OAI-PMH)
OpenArchive Metadata records
1
ISSA
Pipeline
User Communities
DEFINE
Step 1. Retrieval of metadata records
What metadata ?
• Title
• Authors (strings)
• Date
• Publication
• Languages
• Identifiers
• Abstract
• License
• URL of the PDF file
• …
OAI-PMH protocol:
• Supported by many open
libraries & archives (70% [1])
• Harvested by aggregators
e.g. Google Scholar,
OpenAIRE
[1] Ramírez-Montoya, María-Soledad & Ceballos, Hector. (2017). Institutional
Repositories. 10.1201/9781315155890-5.
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Retrieval
(OAI-PMH)
OpenArchive Metadata records
1
Virtuoso
Triple Store
2
Translation
to RDF
ISSA
Pipeline
User Communities
DEFINE QUERY
Step 2. Populate the knowledge graph with metadata
Metadata RDF representation with standard vocabularies:
Dublin Core, BIBO, FABIO/FRBR,
EPRINT, FOAF, PROVO, Schema.org
(Morph-xR2RML)
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Retrieval
(OAI-PMH)
OpenArchive Metadata records
1
Full text
extraction 3
< / >
< / >
< / >
Structured text
Virtuoso
Triple Store
2
Translation
to RDF
ISSA
Pipeline
User Communities
DEFINE QUERY
Step 3. Full text extraction
(GROBID)
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Retrieval
(OAI-PMH)
OpenArchive Metadata records
1
Full text
extraction 3
< / >
< / >
< / >
Structured text
Virtuoso
Triple Store
2
4
Linked Descriptors and Named Entities
 Thematic & geographic Indexing (Annif)
 NEs extraction & linking (Entity-fishing, Spotlight, Dictionary)
Translation
to RDF
Vocabularies & Datasets
Wikidata, DBpedia, Geonames,
domain thesauri
ISSA
Pipeline
User Communities
DEFINE QUERY
Step 4. Indexing and NEs extractions
ANNOTATE
& VALIDATE
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
 Find out descriptors that
characterize publications
 Rely on the Annif open-source
indexating p/f
 AGROVOC thesaurus
 Training corpus: Agritrop
subset + expert descriptors
 Evaluation of different
classification models
11
Thematic &
geographic indexing
Structured text Structured text
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Annotate parts of text with
referring to concepts from
controlled vocabularies:
 Wikidata
 Geonames (through Wikidata)
 DBpedia
 AGROVOC
12
NEs extraction
and linking
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
 Thematic & geographic Indexing (Annif)
 NEs extraction & linking (Entity-fishing, Spotlight, Dictionary)
Retrieval
(OAI-PMH)
OpenArchive Metadata records
1
Full text
extraction 3
< / >
< / >
< / >
Structured text
Virtuoso
Triple Store
2
4
Linked Descriptors and Named Entities
Translation
to RDF
Vocabularies & Datasets
Wikidata, DBpedia, Geonames,
domain thesauri
Translation to RDF
5
ISSA
Pipeline
User Communities
DEFINE QUERY
ANNOTATE
& VALIDATE
Step 5. Populate the knowledge graph with
descriptors and NEs
(Morph-xR2RML)
Web Annotation Vocabulary
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
 Thematic & geographic Indexing (Annif)
 NEs extraction & linking (Entity-fishing, Spotlight, Dictionary)
Retrieval
(OAI-PMH)
OpenArchive Metadata records
1
Full text
extraction 3
< / >
< / >
< / >
Structured text
Virtuoso
Triple Store
2
4
Linked Descriptors and Named Entities
Translation
to RDF
Vocabularies & Datasets
Wikidata, DBpedia, Geonames,
domain thesauri
Translation to RDF
5
Mining & Visualization
Association rules mining
Augmented visualization
6
ISSA
Pipeline
User Communities
DEFINE QUERY
ANNOTATE
& VALIDATE
DEFINE & USE
Step 6. Mining and Visualization
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France 15
Mining & Visualization
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Explore descriptors association rules
16
Extract and visualize
association rules between
articles’ descriptors
with ARViz.
Suited for the discovery
of (possibly unexpected)
associations
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Explore/navigate networks of entities
17
Solve complex competency questions by visually exploring networks of
descriptors, authors, articles with LDViz.
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Explore/navigate networks of entities
18
Solve complex competency questions by visually exploring networks of
descriptors, authors, articles with LDViz.
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Explore/navigate networks of entities
19
Solve complex competency questions by visually exploring networks of
descriptors, authors, articles with LDViz.
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Explore/navigate networks of entities
20
Solve complex competency questions by visually exploring networks of
descriptors, authors, articles with LDViz.
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Explore networks of articles, descriptors…
Same tools to explore:
• Network of articles with
co-authors
• Network of authors with
co-publications
• Networks of institutions
with same research topics
• …
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Quick
summary
• Pipeline and visualization tools successfully
deployed for Agritrop
• 100,000+ articles’ metadata and abstract
• 12,000 OA articles with full text
• Pipeline for Agritrop ready to transfer
to other archives with limited work
• Only open licenses (code, documentation…)
• Based on OS, robust tools and technologies,
Docker-based
• Extensible with new steps following simple
guidelines
22
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Perspectives
https://unsplash.com/photos/ROOrGTNurYI
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Perspectives
https://unsplash.com/photos/ROOrGTNurYI
CIRAD willing to deploy the ISSA pipeline and
visualization tools in production for all users of Agritrop.
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
ISSA 2 – CfP CollEx-Persée 2021-2022
Exploit & expand the results of ISSA:
◦ Extract new knowledge: relationships between NEs,
authors disambiguation, cross references… Link to taxonomic registries?
◦ Broaden the service offering for researchers and documentalists:
semantic search, geographical visualization, bibliometry
◦ Non-supervised indexing + improve data quality metrics
Extend the PoC to the HAL instance of EuroMov Digital Health in Motion
25
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Thank-you
https://issa.cirad.fr/
https://github.com/issa-project
@ProjetISSA

More Related Content

Similar to ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scientists Search and Make Sense of a Scientific Archive

Scholze imcw 2014-11-25
Scholze imcw 2014-11-25Scholze imcw 2014-11-25
Elixir at de.nbi meeting
Elixir at de.nbi meetingElixir at de.nbi meeting
Elixir at de.nbi meeting
Niklas Blomberg
 
Ontology repositories and case study with OntoPortal
Ontology repositories and case study with OntoPortalOntology repositories and case study with OntoPortal
Ontology repositories and case study with OntoPortal
INRAE (MISTEA) and University of Montpellier (LIRMM)
 
Make our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the WebMake our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the Web
Franck Michel
 
Semantic artefact and ontology services for long-term data interpretation
Semantic artefact and ontology services for long-term data interpretationSemantic artefact and ontology services for long-term data interpretation
Semantic artefact and ontology services for long-term data interpretation
INRAE (MISTEA) and University of Montpellier (LIRMM)
 
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
OpenAIRE
 
L&P Dominique Berube & Tanja Niemann - Usability and Visibility: Adding Value...
L&P Dominique Berube & Tanja Niemann - Usability and Visibility: Adding Value...L&P Dominique Berube & Tanja Niemann - Usability and Visibility: Adding Value...
L&P Dominique Berube & Tanja Niemann - Usability and Visibility: Adding Value...
CASRAI
 
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
Open Science Fair
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...
rmacneil88
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014
ResearchSpace
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
OpenAIRE
 
From Open Access to Open Standards, (Linked) Data and Collaborations
From Open Access to Open Standards, (Linked) Data and CollaborationsFrom Open Access to Open Standards, (Linked) Data and Collaborations
From Open Access to Open Standards, (Linked) Data and Collaborations
Simeon Warner
 
Open sciencerefresher2019
Open sciencerefresher2019Open sciencerefresher2019
Open sciencerefresher2019
heila1
 
re3data.org – Registry of Research Data Repositories
re3data.org – Registry of Research Data Repositoriesre3data.org – Registry of Research Data Repositories
re3data.org – Registry of Research Data Repositories
Heinz Pampel
 
Stronger together: community initiatives in journal management
Stronger together: community initiatives in journal managementStronger together: community initiatives in journal management
Stronger together: community initiatives in journal management
Jisc
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
The HDF-EOS Tools and Information Center
 
Open Archives & Open Access
Open Archives & Open AccessOpen Archives & Open Access
Open Archives & Open Access
Mokhtar Ben Henda
 
Ifla swsig meeting - Puerto Rico - 20110817
Ifla swsig meeting - Puerto Rico - 20110817Ifla swsig meeting - Puerto Rico - 20110817
Ifla swsig meeting - Puerto Rico - 20110817Figoblog
 
Ontological Infrastructure for Interoperable Research Information Systems: HE...
Ontological Infrastructure for Interoperable Research Information Systems: HE...Ontological Infrastructure for Interoperable Research Information Systems: HE...
Ontological Infrastructure for Interoperable Research Information Systems: HE...
Diego López-de-Ipiña González-de-Artaza
 

Similar to ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scientists Search and Make Sense of a Scientific Archive (20)

Scholze imcw 2014-11-25
Scholze imcw 2014-11-25Scholze imcw 2014-11-25
Scholze imcw 2014-11-25
 
Elixir at de.nbi meeting
Elixir at de.nbi meetingElixir at de.nbi meeting
Elixir at de.nbi meeting
 
Ontology repositories and case study with OntoPortal
Ontology repositories and case study with OntoPortalOntology repositories and case study with OntoPortal
Ontology repositories and case study with OntoPortal
 
Make our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the WebMake our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the Web
 
Semantic artefact and ontology services for long-term data interpretation
Semantic artefact and ontology services for long-term data interpretationSemantic artefact and ontology services for long-term data interpretation
Semantic artefact and ontology services for long-term data interpretation
 
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
IDCC workshop: OpenAIRE services and tools for Open Research Data in H2020
 
L&P Dominique Berube & Tanja Niemann - Usability and Visibility: Adding Value...
L&P Dominique Berube & Tanja Niemann - Usability and Visibility: Adding Value...L&P Dominique Berube & Tanja Niemann - Usability and Visibility: Adding Value...
L&P Dominique Berube & Tanja Niemann - Usability and Visibility: Adding Value...
 
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
 
From Open Access to Open Standards, (Linked) Data and Collaborations
From Open Access to Open Standards, (Linked) Data and CollaborationsFrom Open Access to Open Standards, (Linked) Data and Collaborations
From Open Access to Open Standards, (Linked) Data and Collaborations
 
Open sciencerefresher2019
Open sciencerefresher2019Open sciencerefresher2019
Open sciencerefresher2019
 
Overview
OverviewOverview
Overview
 
re3data.org – Registry of Research Data Repositories
re3data.org – Registry of Research Data Repositoriesre3data.org – Registry of Research Data Repositories
re3data.org – Registry of Research Data Repositories
 
Stronger together: community initiatives in journal management
Stronger together: community initiatives in journal managementStronger together: community initiatives in journal management
Stronger together: community initiatives in journal management
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 
Open Archives & Open Access
Open Archives & Open AccessOpen Archives & Open Access
Open Archives & Open Access
 
Ifla swsig meeting - Puerto Rico - 20110817
Ifla swsig meeting - Puerto Rico - 20110817Ifla swsig meeting - Puerto Rico - 20110817
Ifla swsig meeting - Puerto Rico - 20110817
 
Ontological Infrastructure for Interoperable Research Information Systems: HE...
Ontological Infrastructure for Interoperable Research Information Systems: HE...Ontological Infrastructure for Interoperable Research Information Systems: HE...
Ontological Infrastructure for Interoperable Research Information Systems: HE...
 

More from Franck Michel

Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
Franck Michel
 
Knowledge Engineering: Semantic web, web of data, linked data
Knowledge Engineering: Semantic web, web of data, linked dataKnowledge Engineering: Semantic web, web of data, linked data
Knowledge Engineering: Semantic web, web of data, linked data
Franck Michel
 
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
Franck Michel
 
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future OpportunitiesModelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
Franck Michel
 
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
Franck Michel
 
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked DataSPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
Franck Michel
 
Integrating Heterogeneous Data Sources in the Web of Data
Integrating Heterogeneous Data Sources in the Web of DataIntegrating Heterogeneous Data Sources in the Web of Data
Integrating Heterogeneous Data Sources in the Web of Data
Franck Michel
 
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Franck Michel
 
A Mapping-based Method to Query MongoDB Documents with SPARQL
A Mapping-based Method to Query MongoDB Documents with SPARQLA Mapping-based Method to Query MongoDB Documents with SPARQL
A Mapping-based Method to Query MongoDB Documents with SPARQL
Franck Michel
 
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
Franck Michel
 
Translation of Relational and Non-Relational Databases into RDF with xR2RML
Translation of Relational and Non-Relational Databases into RDF with xR2RMLTranslation of Relational and Non-Relational Databases into RDF with xR2RML
Translation of Relational and Non-Relational Databases into RDF with xR2RML
Franck Michel
 
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Franck Michel
 

More from Franck Michel (12)

Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
 
Knowledge Engineering: Semantic web, web of data, linked data
Knowledge Engineering: Semantic web, web of data, linked dataKnowledge Engineering: Semantic web, web of data, linked data
Knowledge Engineering: Semantic web, web of data, linked data
 
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
 
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future OpportunitiesModelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
 
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
 
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked DataSPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
 
Integrating Heterogeneous Data Sources in the Web of Data
Integrating Heterogeneous Data Sources in the Web of DataIntegrating Heterogeneous Data Sources in the Web of Data
Integrating Heterogeneous Data Sources in the Web of Data
 
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
 
A Mapping-based Method to Query MongoDB Documents with SPARQL
A Mapping-based Method to Query MongoDB Documents with SPARQLA Mapping-based Method to Query MongoDB Documents with SPARQL
A Mapping-based Method to Query MongoDB Documents with SPARQL
 
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
 
Translation of Relational and Non-Relational Databases into RDF with xR2RML
Translation of Relational and Non-Relational Databases into RDF with xR2RMLTranslation of Relational and Non-Relational Databases into RDF with xR2RML
Translation of Relational and Non-Relational Databases into RDF with xR2RML
 
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
 

Recently uploaded

Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
muralinath2
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 

Recently uploaded (20)

Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 

ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scientists Search and Make Sense of a Scientific Archive

  • 1. * Wimmics: AI in bridging social semantics and formal semantics on the Web Franck MICHEL* - Université Côte d’Azur, CNRS, Inria, I3S, France ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scientists Search and Make Sense of a Scientific Archive
  • 2. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Issue: skyrocketing pace of publications Bibliographic search difficult: • Find and make sense of relevant articles • Search across multiple disciplines Central role of open scientific archives But the provided services have limitations: • String-based search fails to grasp semantic relationships • Keywords often too general to be helpful  Need for smarter search services exploiting this knowledge 2 Open Science
  • 3. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France 3 Propose a generic, reusable, extensible solution to optimize bibliographic search in an open scientific archive.
  • 4. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France How did we do that? • Extract rich metadata from the publications in multiple languages • Turn it into a semantic index published on the web as a RDF knowledge graph • Link with general vocabularies as well as domain-specific vocabularies • Provide flexible search/visualization tools able to exploit the index 4
  • 5. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France 5 The ISSA pipeline
  • 6. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France OpenArchive ISSA Pipeline User Communities DEFINE Step 1. Retrieval of metadata records
  • 7. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Retrieval (OAI-PMH) OpenArchive Metadata records 1 ISSA Pipeline User Communities DEFINE Step 1. Retrieval of metadata records What metadata ? • Title • Authors (strings) • Date • Publication • Languages • Identifiers • Abstract • License • URL of the PDF file • … OAI-PMH protocol: • Supported by many open libraries & archives (70% [1]) • Harvested by aggregators e.g. Google Scholar, OpenAIRE [1] Ramírez-Montoya, María-Soledad & Ceballos, Hector. (2017). Institutional Repositories. 10.1201/9781315155890-5.
  • 8. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Retrieval (OAI-PMH) OpenArchive Metadata records 1 Virtuoso Triple Store 2 Translation to RDF ISSA Pipeline User Communities DEFINE QUERY Step 2. Populate the knowledge graph with metadata Metadata RDF representation with standard vocabularies: Dublin Core, BIBO, FABIO/FRBR, EPRINT, FOAF, PROVO, Schema.org (Morph-xR2RML)
  • 9. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Retrieval (OAI-PMH) OpenArchive Metadata records 1 Full text extraction 3 < / > < / > < / > Structured text Virtuoso Triple Store 2 Translation to RDF ISSA Pipeline User Communities DEFINE QUERY Step 3. Full text extraction (GROBID)
  • 10. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Retrieval (OAI-PMH) OpenArchive Metadata records 1 Full text extraction 3 < / > < / > < / > Structured text Virtuoso Triple Store 2 4 Linked Descriptors and Named Entities  Thematic & geographic Indexing (Annif)  NEs extraction & linking (Entity-fishing, Spotlight, Dictionary) Translation to RDF Vocabularies & Datasets Wikidata, DBpedia, Geonames, domain thesauri ISSA Pipeline User Communities DEFINE QUERY Step 4. Indexing and NEs extractions ANNOTATE & VALIDATE
  • 11. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France  Find out descriptors that characterize publications  Rely on the Annif open-source indexating p/f  AGROVOC thesaurus  Training corpus: Agritrop subset + expert descriptors  Evaluation of different classification models 11 Thematic & geographic indexing Structured text Structured text
  • 12. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Annotate parts of text with referring to concepts from controlled vocabularies:  Wikidata  Geonames (through Wikidata)  DBpedia  AGROVOC 12 NEs extraction and linking
  • 13. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France  Thematic & geographic Indexing (Annif)  NEs extraction & linking (Entity-fishing, Spotlight, Dictionary) Retrieval (OAI-PMH) OpenArchive Metadata records 1 Full text extraction 3 < / > < / > < / > Structured text Virtuoso Triple Store 2 4 Linked Descriptors and Named Entities Translation to RDF Vocabularies & Datasets Wikidata, DBpedia, Geonames, domain thesauri Translation to RDF 5 ISSA Pipeline User Communities DEFINE QUERY ANNOTATE & VALIDATE Step 5. Populate the knowledge graph with descriptors and NEs (Morph-xR2RML) Web Annotation Vocabulary
  • 14. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France  Thematic & geographic Indexing (Annif)  NEs extraction & linking (Entity-fishing, Spotlight, Dictionary) Retrieval (OAI-PMH) OpenArchive Metadata records 1 Full text extraction 3 < / > < / > < / > Structured text Virtuoso Triple Store 2 4 Linked Descriptors and Named Entities Translation to RDF Vocabularies & Datasets Wikidata, DBpedia, Geonames, domain thesauri Translation to RDF 5 Mining & Visualization Association rules mining Augmented visualization 6 ISSA Pipeline User Communities DEFINE QUERY ANNOTATE & VALIDATE DEFINE & USE Step 6. Mining and Visualization
  • 15. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France 15 Mining & Visualization
  • 16. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Explore descriptors association rules 16 Extract and visualize association rules between articles’ descriptors with ARViz. Suited for the discovery of (possibly unexpected) associations
  • 17. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Explore/navigate networks of entities 17 Solve complex competency questions by visually exploring networks of descriptors, authors, articles with LDViz.
  • 18. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Explore/navigate networks of entities 18 Solve complex competency questions by visually exploring networks of descriptors, authors, articles with LDViz.
  • 19. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Explore/navigate networks of entities 19 Solve complex competency questions by visually exploring networks of descriptors, authors, articles with LDViz.
  • 20. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Explore/navigate networks of entities 20 Solve complex competency questions by visually exploring networks of descriptors, authors, articles with LDViz.
  • 21. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Explore networks of articles, descriptors… Same tools to explore: • Network of articles with co-authors • Network of authors with co-publications • Networks of institutions with same research topics • …
  • 22. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Quick summary • Pipeline and visualization tools successfully deployed for Agritrop • 100,000+ articles’ metadata and abstract • 12,000 OA articles with full text • Pipeline for Agritrop ready to transfer to other archives with limited work • Only open licenses (code, documentation…) • Based on OS, robust tools and technologies, Docker-based • Extensible with new steps following simple guidelines 22
  • 23. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Perspectives https://unsplash.com/photos/ROOrGTNurYI
  • 24. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Perspectives https://unsplash.com/photos/ROOrGTNurYI CIRAD willing to deploy the ISSA pipeline and visualization tools in production for all users of Agritrop.
  • 25. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France ISSA 2 – CfP CollEx-Persée 2021-2022 Exploit & expand the results of ISSA: ◦ Extract new knowledge: relationships between NEs, authors disambiguation, cross references… Link to taxonomic registries? ◦ Broaden the service offering for researchers and documentalists: semantic search, geographical visualization, bibliometry ◦ Non-supervised indexing + improve data quality metrics Extend the PoC to the HAL instance of EuroMov Digital Health in Motion 25
  • 26. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Thank-you https://issa.cirad.fr/ https://github.com/issa-project @ProjetISSA