* Wimmics: AI in bridging social semantics and formal semantics on the Web
Franck MICHEL* - Université Côte d’Azur, CNRS, Inria, I3S, France
ISSA: Generic Pipeline,
Knowledge Model and
Visualization tools to
Help Scientists Search and
Make Sense of a Scientific Archive
Issue: skyrocketing pace of publications
Bibliographic search difficult:
• Find and make sense of relevant articles
• Search across multiple disciplines
Central role of open scientific archives
But the provided services have limitations:
• String-based search fails to grasp semantic relationships
• Keywords often too general to be helpful
→ Need for smarter search services exploiting this knowledge
Open Science
Propose a generic, reusable, extensible
solution to optimize bibliographic search
in an open scientific archive.
How did we do that?
• Extract rich metadata from the publications
in multiple languages
• Turn it into a semantic index published
on the web as a RDF knowledge graph
• Link with general vocabularies as well as
domain-specific vocabularies
• Provide flexible search/visualization tools
able to exploit the index
The ISSA
pipeline
[Figure: the ISSA pipeline, between the open archive and the user communities (DEFINE).]
Step 1. Retrieval of metadata records
[Figure: (1) metadata records are retrieved from the open archive via OAI-PMH.]
What metadata?
• Title
• Authors (strings)
• Date
• Publication
• Languages
• Identifiers
• Abstract
• License
• URL of the PDF file
• …
OAI-PMH protocol:
• Supported by many open
libraries & archives (70% [1])
• Harvested by aggregators
e.g. Google Scholar,
OpenAIRE
[1] Ramírez-Montoya, María-Soledad & Ceballos, Hector. (2017). Institutional Repositories. doi:10.1201/9781315155890-5.
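A minimal sketch of what step 1 yields, assuming the archive exposes Dublin Core records (metadataPrefix=oai_dc). The sample response, its identifier and the field names of the returned dictionaries are made up for illustration; only the XML namespaces come from the OAI-PMH and Dublin Core specifications. A real harvest would fetch the XML over HTTP and follow resumption tokens, which is left out here.

```python
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical answer to a request such as
#   <base-url>?verb=ListRecords&metadataPrefix=oai_dc
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A made-up article title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:creator>Roe, Richard</dc:creator>
          <dc:date>2020</dc:date>
          <dc:language>eng</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Turn an OAI-PMH ListRecords response into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # e.g. deleted records: header only, no metadata
            continue
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS),
            "authors": [e.text for e in dc.findall("dc:creator", NS)],
            "date": dc.findtext("dc:date", namespaces=NS),
            "languages": [e.text for e in dc.findall("dc:language", NS)],
        })
    return records


records = parse_records(SAMPLE_RESPONSE)
print(records[0]["title"], records[0]["authors"])
```

Note that authors come back as plain strings, as stated on the slide: nothing here disambiguates them.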
Step 2. Populate the knowledge graph with metadata
[Figure: (2) metadata records are translated to RDF and stored in a Virtuoso triple store, which user communities can query.]
Metadata RDF representation (Morph-xR2RML) with standard vocabularies:
Dublin Core, BIBO, FaBiO/FRBR, EPrint, FOAF, PROV-O, Schema.org
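To give an idea of the output of this step, here is a small hand-written sketch that serializes one metadata record as N-Triples. It is not Morph-xR2RML and does not reproduce the ISSA mappings: the base URI, the record fields and the choice of properties (Dublin Core title, creator and issued, Schema.org ScholarlyArticle) are assumptions made for the example.

```python
DCE = "http://purl.org/dc/elements/1.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA_ARTICLE = "http://schema.org/ScholarlyArticle"
BASE = "http://example.org/document/"  # hypothetical URI pattern


def literal(value, lang=None):
    """Format a string as an N-Triples literal, with an optional language tag."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def to_ntriples(record):
    """Return the N-Triples lines describing one metadata record."""
    subject = f"<{BASE}{record['id']}>"
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{SCHEMA_ARTICLE}>"),
        (subject, f"<{DCT}title>", literal(record["title"], record.get("lang"))),
        (subject, f"<{DCT}issued>", literal(record["date"])),
    ]
    # Authors are plain strings at this stage, hence literals rather than URIs.
    for author in record["authors"]:
        triples.append((subject, f"<{DCE}creator>", literal(author)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


example = {
    "id": "123",
    "title": 'A "made-up" title',
    "lang": "en",
    "date": "2020",
    "authors": ["Doe, Jane", "Roe, Richard"],
}
lines = to_ntriples(example)
print("\n".join(lines))
```

The resulting lines could then be loaded into any triple store; a declarative mapping language such as xR2RML avoids hard-coding this logic.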
Step 3. Full text extraction
[Figure: (3) full text extraction with GROBID turns the articles into structured text.]
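GROBID returns the structured text as TEI XML. The sketch below shows how such a document could be read back into title, abstract and sections; the TEI sample is a hand-made, heavily simplified fragment (real GROBID output carries many more elements), and the shape of the returned dictionary is an assumption of this example.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}  # TEI namespace

# Simplified, made-up TEI document in the style of GROBID output.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>A made-up article title</title></titleStmt></fileDesc>
    <profileDesc><abstract><p>Short abstract.</p></abstract></profileDesc>
  </teiHeader>
  <text><body>
    <div><head>Introduction</head><p>First paragraph.</p><p>Second paragraph.</p></div>
    <div><head>Methods</head><p>Third paragraph.</p></div>
  </body></text>
</TEI>"""


def text_of(elem):
    """Concatenate all text below an element and normalize whitespace."""
    return " ".join("".join(elem.itertext()).split())


def tei_to_dict(tei_xml):
    """Extract title, abstract and body sections from a TEI document."""
    root = ET.fromstring(tei_xml)
    title = root.find(".//tei:titleStmt/tei:title", TEI)
    abstract = root.find(".//tei:profileDesc/tei:abstract", TEI)
    sections = []
    for div in root.iterfind(".//tei:body/tei:div", TEI):
        head = div.find("tei:head", TEI)
        sections.append({
            "heading": text_of(head) if head is not None else "",
            "paragraphs": [text_of(p) for p in div.findall("tei:p", TEI)],
        })
    return {
        "title": text_of(title) if title is not None else "",
        "abstract": text_of(abstract) if abstract is not None else "",
        "sections": sections,
    }


doc = tei_to_dict(SAMPLE_TEI)
print(doc["title"], [s["heading"] for s in doc["sections"]])
```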
Step 4. Indexing and NEs extraction
• Thematic & geographic indexing (Annif)
• NEs extraction & linking (Entity-fishing, Spotlight, Dictionary)
• Output: linked descriptors and named entities
• Vocabularies & datasets: Wikidata, DBpedia, Geonames, domain thesauri
• User communities annotate & validate
Thematic & geographic indexing
• Find descriptors that characterize publications
• Rely on the Annif open-source indexing platform
• AGROVOC thesaurus
• Training corpus: Agritrop subset + expert descriptors
• Evaluation of different classification models
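Evaluating a classification model against expert descriptors typically comes down to precision, recall and F1. The sketch below computes them per document and averages over a corpus; the descriptor labels and the macro-averaging choice are assumptions of this example, not the evaluation protocol actually used in ISSA.

```python
def precision_recall_f1(suggested, expected):
    """Compare the descriptors suggested by a model with the expert ones."""
    suggested, expected = set(suggested), set(expected)
    hits = len(suggested & expected)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def macro_average(pairs):
    """Average the three scores over (suggested, expected) pairs, one per document."""
    scores = [precision_recall_f1(s, e) for s, e in pairs]
    n = len(scores)
    return tuple(sum(score[i] for score in scores) / n for i in range(3))


# Made-up descriptors for one document.
suggested = ["rice", "irrigation", "soil", "Madagascar"]
expected = ["rice", "irrigation", "yield"]
p, r, f1 = precision_recall_f1(suggested, expected)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Running the same comparison for each candidate model on a held-out part of the training corpus gives a simple basis for choosing between them.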
NEs extraction and linking
Annotate parts of the text with references to concepts from controlled vocabularies:
• Wikidata
• Geonames (through Wikidata)
• DBpedia
• AGROVOC
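The dictionary-based variant of entity linking can be pictured as follows: look up known labels in the text and attach the matching concept URI to each occurrence. This is only a toy illustration; the dictionary content, the example.org URIs and the longest-match rule are assumptions, and tools such as Entity-fishing or Spotlight also disambiguate candidates, which is not attempted here.

```python
import re

# Hypothetical dictionary: label (lowercase) -> concept URI.
DICTIONARY = {
    "rice": "http://example.org/concept/rice",
    "west africa": "http://example.org/concept/west-africa",
    "soil fertility": "http://example.org/concept/soil-fertility",
    "soil": "http://example.org/concept/soil",
}


def annotate(text, dictionary=DICTIONARY):
    """Return the non-overlapping dictionary matches found in the text."""
    annotations = []
    # Longer labels first, so that "soil fertility" wins over "soil".
    for label in sorted(dictionary, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            overlaps = any(m.start() < a["end"] and a["start"] < m.end()
                           for a in annotations)
            if overlaps:
                continue
            annotations.append({
                "start": m.start(),
                "end": m.end(),
                "surface": m.group(0),
                "uri": dictionary[label],
            })
    return sorted(annotations, key=lambda a: a["start"])


text = "Rice yields and soil fertility in West Africa."
entities = annotate(text)
for e in entities:
    print(e["start"], e["end"], e["surface"], e["uri"])
```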
Step 5. Populate the knowledge graph with descriptors and NEs
[Figure: (5) linked descriptors and named entities are translated to RDF and stored in the Virtuoso triple store.]
Translation with Morph-xR2RML, using the Web Annotation Vocabulary
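As an illustration of how a named entity can be expressed with the Web Annotation Vocabulary, the sketch below builds a JSON-LD annotation whose body is the linked concept and whose target is a text span of the document. The URIs are placeholders, and the exact modelling used in the ISSA knowledge graph (motivation, selectors, extra properties) may well differ.

```python
import json


def make_annotation(document_uri, entity_uri, exact, start, end):
    """Build a W3C Web Annotation (JSON-LD) linking a text span to a concept."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "identifying",
        "body": entity_uri,  # the concept the text span refers to
        "target": {
            "source": document_uri,  # the annotated document
            "selector": [
                {"type": "TextQuoteSelector", "exact": exact},
                {"type": "TextPositionSelector", "start": start, "end": end},
            ],
        },
    }


# Placeholder URIs, for illustration only.
annotation = make_annotation(
    document_uri="http://example.org/document/123",
    entity_uri="http://example.org/concept/rice",
    exact="Rice",
    start=0,
    end=4,
)
print(json.dumps(annotation, indent=2))
```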
Step 6. Mining and Visualization
• Association rules mining
• Augmented visualization
• User communities define & use
Mining & Visualization
Explore descriptors association rules
Extract and visualize
association rules between
articles’ descriptors
with ARViz.
Suited for the discovery
of (possibly unexpected)
associations
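For readers unfamiliar with association rules, here is a minimal sketch that derives rules of the form "descriptor A → descriptor B" with their support and confidence. It only handles pairs and uses made-up descriptors and thresholds; it is not the mining algorithm behind ISSA or ARViz.

```python
from collections import Counter
from itertools import permutations


def mine_pair_rules(descriptor_sets, min_support=0.5, min_confidence=0.6):
    """Return rules (a, b, support, confidence), read as "a -> b"."""
    n = len(descriptor_sets)
    item_count = Counter()
    pair_count = Counter()
    for descriptors in descriptor_sets:
        items = set(descriptors)
        item_count.update(items)
        # Ordered pairs: (a, b) and (b, a) are two distinct candidate rules.
        pair_count.update(permutations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n               # share of articles having a and b
        confidence = count / item_count[a]  # share of articles with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Most confident rules first, ties broken by support then by name.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Made-up descriptors of four articles.
articles = [
    {"rice", "irrigation"},
    {"rice", "irrigation", "soil"},
    {"rice", "soil"},
    {"maize", "soil"},
]
rules = mine_pair_rules(articles)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}  support={support:.2f} confidence={confidence:.2f}")
```

Lowering the thresholds surfaces rarer, possibly unexpected associations, at the cost of more rules to browse, which is where a visual tool helps.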
Explore/navigate networks of entities
Solve complex competency questions by visually exploring networks of
descriptors, authors, articles with LDViz.
Explore networks of articles, descriptors…
Same tools to explore:
• Network of articles with
co-authors
• Network of authors with
co-publications
• Networks of institutions
with same research topics
• …
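The networks listed above all derive from the same kind of data. As a rough sketch, assuming each article comes with a list of author names (made up here), a co-publication network can be computed by counting how often two authors sign the same article; this is not how LDViz is implemented, only an illustration of the underlying graph.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(articles):
    """Map each unordered author pair to its number of co-signed articles."""
    edges = Counter()
    for authors in articles.values():
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def neighbours(edges, author):
    """Return the co-authors of an author with the number of joint articles."""
    result = {}
    for (a, b), weight in edges.items():
        if author == a:
            result[b] = weight
        elif author == b:
            result[a] = weight
    return result


# Made-up articles and authors.
articles_authors = {
    "doc1": ["A. Martin", "B. Diallo"],
    "doc2": ["A. Martin", "B. Diallo", "C. Nguyen"],
    "doc3": ["C. Nguyen"],
}
network = coauthor_network(articles_authors)
print(neighbours(network, "A. Martin"))
```

Replacing authors with descriptors or institutions in the input gives the other networks with the same code.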
Quick
summary
• Pipeline and visualization tools successfully
deployed for Agritrop
• Metadata and abstracts of 100,000+ articles
• 12,000 OA articles with full text
• Pipeline for Agritrop ready to transfer
to other archives with limited work
• Only open licenses (code, documentation…)
• Based on robust, open-source tools and technologies;
Docker-based
• Extensible with new steps following simple
guidelines
Perspectives
CIRAD willing to deploy the ISSA pipeline and
visualization tools in production for all users of Agritrop.
ISSA 2 – CfP CollEx-Persée 2021-2022
Exploit & expand the results of ISSA:
◦ Extract new knowledge: relationships between NEs,
author disambiguation, cross-references… Link to taxonomic registries?
◦ Broaden the service offering for researchers and documentalists:
semantic search, geographical visualization, bibliometrics
◦ Unsupervised indexing + improve data quality metrics
Extend the PoC to the HAL instance of EuroMov Digital Health in Motion
Thank you
https://issa.cirad.fr/
https://github.com/issa-project
@ProjetISSA

 
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptxPlasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
 

ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scientists Search and Make Sense of a Scientific Archive

  • 1. ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scientists Search and Make Sense of a Scientific Archive. Franck MICHEL* - Université Côte d’Azur, CNRS, Inria, I3S, France. * Wimmics: AI in bridging social semantics and formal semantics on the Web
  • 2. Issue: skyrocketing pace of publications. Bibliographic search is difficult: • Find and make sense of relevant articles • Search across multiple disciplines. Central role of open scientific archives (Open Science), but the provided services have limitations: • String-based search fails to grasp semantic relationships • Keywords are often too general to be helpful. → Need for smarter search services exploiting this knowledge
  • 3. Propose a generic, reusable, extensible solution to optimize bibliographic search in an open scientific archive.
  • 4. How did we do that? • Extract rich metadata from the publications in multiple languages • Turn it into a semantic index published on the web as an RDF knowledge graph • Link with general vocabularies as well as domain-specific vocabularies • Provide flexible search/visualization tools able to exploit the index
  • 5. The ISSA pipeline
  • 6. Step 1. Retrieval of metadata records [Diagram: open archive, ISSA pipeline, user communities (DEFINE)]
  • 7. Step 1. Retrieval of metadata records from the open archive (OAI-PMH). What metadata? • Title • Authors (strings) • Date • Publication • Languages • Identifiers • Abstract • License • URL of the PDF file • … OAI-PMH protocol: • Supported by many open libraries & archives (70% [1]) • Harvested by aggregators, e.g. Google Scholar, OpenAIRE. [1] Ramírez-Montoya, María-Soledad & Ceballos, Hector. (2017). Institutional Repositories. 10.1201/9781315155890-5.
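
The harvesting step above can be sketched in a few lines of Python. This is a minimal illustration, not the ISSA code: it assumes the archive exposes the standard `oai_dc` metadata format, and `base_url` is a hypothetical endpoint supplied by the caller. It issues `ListRecords` requests and follows resumption tokens until the archive has been read completely.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        identifier = rec.findtext(f"{OAI}header/{OAI}identifier")
        fields = {}
        # Collect every Dublin Core element (title, creator, date, ...).
        for el in rec.iter():
            if el.tag.startswith(DC) and el.text and el.text.strip():
                fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
        records.append({"identifier": identifier, "metadata": fields})
    token = root.findtext(f"{OAI}ListRecords/{OAI}resumptionToken")
    return records, (token.strip() if token and token.strip() else None)


def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield all records of an archive; base_url is a hypothetical endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            records, token = parse_list_records(resp.read().decode("utf-8"))
        yield from records
        if token is None:
            break
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Keeping the parsing separate from the network loop makes it possible to check the record extraction against a saved response without contacting an archive.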
  • 8. Step 2. Populate the knowledge graph with metadata: translation of the metadata records to RDF (Morph-xR2RML), loaded into a Virtuoso triple store (QUERY). RDF representation of the metadata with standard vocabularies: Dublin Core, BIBO, FABIO/FRBR, EPRINT, FOAF, PROV-O, Schema.org
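
To make the translation step concrete, here is a small sketch of what turning one metadata record into RDF can look like. The slides state that ISSA uses Morph-xR2RML for this; the plain-Python serializer below is a stand-in for illustration only. The record layout (`title`, `authors`, `date`, `language`) and the choice of `bibo:AcademicArticle`, `dct:title`, `dc:creator` and `dct:issued` are assumptions, not the actual ISSA mapping.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
DCE = "http://purl.org/dc/elements/1.1/"
BIBO = "http://purl.org/ontology/bibo/"


def literal(value, lang=None):
    """Serialize a string as an N-Triples literal, with optional language tag."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def record_to_ntriples(article_uri, record):
    """Translate one (hypothetical) metadata record into N-Triples lines."""
    subject = f"<{article_uri}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BIBO}AcademicArticle> ."]
    if "title" in record:
        title = literal(record["title"], record.get("language"))
        lines.append(f"{subject} <{DCT}title> {title} .")
    # Authors are plain strings at this stage, as in the harvested metadata.
    for author in record.get("authors", []):
        lines.append(f"{subject} <{DCE}creator> {literal(author)} .")
    if "date" in record:
        lines.append(f"{subject} <{DCT}issued> {literal(record['date'])} .")
    return "\n".join(lines)
```

The resulting N-Triples text could then be loaded into any triple store; a declarative mapping language such as the one used by the pipeline avoids hand-writing this kind of code for each field.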
  • 9. Step 3. Full text extraction (GROBID), producing structured text
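
GROBID returns the extracted full text as TEI XML. The sketch below assumes such a TEI document has already been obtained (for instance from GROBID's full-text processing service) and shows one possible way to reduce it to a title, an abstract and a list of sections. The output dictionary layout is an assumption made for this example, not the structured format used by ISSA.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def _text(element):
    """Concatenate the text of an element and all of its children."""
    return "".join(element.itertext()).strip()


def tei_to_sections(tei_xml):
    """Reduce a TEI document to title, abstract and body sections."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(f".//{TEI}titleStmt/{TEI}title", default="")
    abstract = " ".join(
        _text(p) for p in root.iterfind(f".//{TEI}abstract//{TEI}p"))
    sections = []
    # Each top-level <div> of the body is treated as one section.
    for div in root.iterfind(f".//{TEI}body/{TEI}div"):
        heading = div.findtext(f"{TEI}head", default="")
        paragraphs = [_text(p) for p in div.iterfind(f"{TEI}p")]
        sections.append({"heading": heading, "paragraphs": paragraphs})
    return {"title": title.strip(), "abstract": abstract, "sections": sections}
```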
  • 10. Step 4. Indexing and NEs extraction (ANNOTATE & VALIDATE), producing linked descriptors and named entities: • Thematic & geographic indexing (Annif) • NEs extraction & linking (Entity-fishing, Spotlight, Dictionary). Vocabularies & datasets: Wikidata, DBpedia, Geonames, domain thesauri
  • 11. Thematic & geographic indexing (on the structured text): • Find descriptors that characterize publications • Rely on the Annif open-source indexing platform • AGROVOC thesaurus • Training corpus: Agritrop subset + expert descriptors • Evaluation of different classification models
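
Comparing classification models against expert descriptors typically relies on precision, recall and F1 computed over the top-k suggestions. The sketch below shows that computation under stated assumptions: the input shapes (`predictions` as scored descriptor lists per document, `gold` as expert descriptor lists) are hypothetical, and this is not Annif's own evaluation tooling nor necessarily the metric used in ISSA.

```python
def precision_recall_f1(suggested, expert):
    """Compare one document's suggested descriptors with the expert ones."""
    suggested, expert = set(suggested), set(expert)
    hits = len(suggested & expert)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(expert) if expert else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def evaluate_model(predictions, gold, k=5):
    """Macro-averaged precision, recall and F1 over the top-k suggestions.

    predictions: {doc_id: [(descriptor, score), ...]}  (hypothetical layout)
    gold:        {doc_id: [descriptor, ...]} assigned by experts
    """
    totals = [0.0, 0.0, 0.0]
    for doc_id, expert in gold.items():
        ranked = sorted(predictions.get(doc_id, []), key=lambda x: -x[1])[:k]
        scores = precision_recall_f1([d for d, _ in ranked], expert)
        totals = [t + s for t, s in zip(totals, scores)]
    n = len(gold) or 1
    return tuple(t / n for t in totals)
```

Running `evaluate_model` once per candidate model on the same held-out documents gives directly comparable scores.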
  • 12. NEs extraction and linking: annotate parts of the text with references to concepts from controlled vocabularies: • Wikidata • Geonames (through Wikidata) • DBpedia • AGROVOC
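
Of the linking approaches listed for this step, the dictionary-based one is the simplest to illustrate. The sketch below matches vocabulary labels in a text and returns character offsets with the linked concept URI. It is an assumption-laden toy, not the ISSA annotator: matching is case-insensitive on whole words, longer labels win over shorter ones, and the vocabulary is a hypothetical `{label: URI}` mapping.

```python
import re


def annotate(text, vocabulary):
    """Link label occurrences in text to concept URIs.

    vocabulary: {label: concept_uri} (hypothetical in-memory thesaurus extract)
    """
    if not vocabulary:
        return []
    # Try longer labels first so "rice blast" wins over "rice".
    labels = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE)
    lookup = {label.lower(): uri for label, uri in vocabulary.items()}
    return [
        {"start": m.start(), "end": m.end(), "surface": m.group(0),
         "uri": lookup[m.group(0).lower()]}
        for m in pattern.finditer(text)
    ]
```

Tools such as Entity-fishing or DBpedia Spotlight additionally disambiguate mentions from context, which a plain dictionary lookup cannot do.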
  • 13. Step 5. Populate the knowledge graph with descriptors and NEs: translation to RDF (Morph-xR2RML) using the Web Annotation Vocabulary
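
As an illustration of how a linked named entity can be expressed with the Web Annotation Vocabulary, the sketch below builds a JSON-LD annotation whose body is the concept URI and whose target is a span of the article text. The function name, its parameters and the example URIs are hypothetical; the pipeline itself generates RDF through Morph-xR2RML, and its exact annotation shape is not given in the slides.

```python
import json


def to_web_annotation(article_uri, entity_uri, text, start, end):
    """Describe one linked entity as a W3C Web Annotation (JSON-LD)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        # The body is the concept the text span refers to.
        "body": entity_uri,
        "target": {
            "source": article_uri,
            # Two selectors describing the same span: by offsets and by quote.
            "selector": [
                {"type": "TextPositionSelector", "start": start, "end": end},
                {"type": "TextQuoteSelector", "exact": text[start:end]},
            ],
        },
    }


if __name__ == "__main__":
    annotation = to_web_annotation(
        "http://example.org/article/1",       # hypothetical article URI
        "http://example.org/concept/rice",    # hypothetical concept URI
        "Rice blast reduces rice yields.", 19, 23)
    print(json.dumps(annotation, indent=2))
```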
  • 14. Step 6. Mining and Visualization (DEFINE & USE): • Association rules mining • Augmented visualization
  • 15. Mining & Visualization
  • 16. Explore descriptors association rules: extract and visualize association rules between articles’ descriptors with ARViz. Suited for the discovery of (possibly unexpected) associations
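
For readers unfamiliar with association rules, the sketch below mines the simplest kind (one descriptor implies another) from the descriptor sets of a collection of articles, and reports the usual support, confidence and lift measures. The thresholds and the input layout are assumptions for illustration; the slides do not describe the mining algorithm behind ARViz.

```python
from itertools import permutations


def mine_pair_rules(articles, min_support=0.1, min_confidence=0.5):
    """Mine rules A -> B between single descriptors.

    articles: list of descriptor collections, one per article.
    """
    n = len(articles)
    count, pair_count = {}, {}
    for descriptors in articles:
        items = set(descriptors)
        for a in items:
            count[a] = count.get(a, 0) + 1
        for a, b in permutations(items, 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), both in pair_count.items():
        support = both / n               # share of articles with A and B
        confidence = both / count[a]     # share of A-articles that also have B
        if support >= min_support and confidence >= min_confidence:
            lift = confidence / (count[b] / n)  # > 1: B more likely given A
            rules.append({"antecedent": a, "consequent": b,
                          "support": support, "confidence": confidence,
                          "lift": lift})
    return sorted(rules, key=lambda r: (-r["lift"], r["antecedent"], r["consequent"]))
```

Rules with a high lift but modest support are the kind of "possibly unexpected" association a visual tool can help surface.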
  • 17. Explore/navigate networks of entities: solve complex competency questions by visually exploring networks of descriptors, authors, articles with LDViz.
  • 21. Explore networks of articles, descriptors… Same tools to explore: • Network of articles with co-authors • Network of authors with co-publications • Networks of institutions with same research topics • …
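
The networks listed above are all derived from co-occurrence in the knowledge graph. As a minimal sketch, assuming the article-to-authors relation has already been retrieved into a dictionary (a hypothetical layout, not LDViz's data model), a weighted co-publication network of authors can be built as follows:

```python
from collections import Counter
from itertools import combinations


def coauthor_network(articles):
    """Build weighted co-publication edges between authors.

    articles: {article_id: [author, ...]}
    returns:  Counter {(author_a, author_b): number of shared articles}
    """
    edges = Counter()
    for authors in articles.values():
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def neighbours(edges, author):
    """Return the co-authors of one author with the number of shared articles."""
    result = {}
    for (a, b), weight in edges.items():
        if author == a:
            result[b] = weight
        elif author == b:
            result[a] = weight
    return result
```

The same pattern applies to the other networks: replacing authors with descriptors or institutions yields the corresponding co-occurrence graph.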
  • 22. Quick summary • Pipeline and visualization tools successfully deployed for Agritrop • 100,000+ articles’ metadata and abstracts • 12,000 OA articles with full text • Pipeline for Agritrop ready to transfer to other archives with limited work • Only open licenses (code, documentation…) • Based on robust open-source tools and technologies, Docker-based • Extensible with new steps following simple guidelines
  • 23. Perspectives https://unsplash.com/photos/ROOrGTNurYI
  • 24. Perspectives: CIRAD is willing to deploy the ISSA pipeline and visualization tools in production for all users of Agritrop. https://unsplash.com/photos/ROOrGTNurYI
  • 25. ISSA 2 – CfP CollEx-Persée 2021-2022. Exploit & expand the results of ISSA: ◦ Extract new knowledge: relationships between NEs, author disambiguation, cross references… Link to taxonomic registries? ◦ Broaden the service offering for researchers and documentalists: semantic search, geographical visualization, bibliometrics ◦ Unsupervised indexing + improve data quality metrics. Extend the PoC to the HAL instance of EuroMov Digital Health in Motion
  • 26. Thank you https://issa.cirad.fr/ https://github.com/issa-project @ProjetISSA