ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scientists Search and Make Sense of a Scientific Archive
Franck MICHEL* - Université Côte d’Azur, CNRS, Inria, I3S, France
* Wimmics: AI in bridging social semantics and formal semantics on the Web
Issue: skyrocketing pace of publications
Bibliographic search difficult:
• Find and make sense of relevant articles
• Search across multiple disciplines
Central role of open scientific archives
But the provided services have limitations:
• String-based search fails to grasp semantic relationships
• Keywords often too general to be helpful
→ Need for smarter search services exploiting this knowledge
Open Science
Propose a generic, reusable, extensible
solution to optimize bibliographic search
in an open scientific archive.
How did we do that?
• Extract rich metadata from the publications in multiple languages
• Turn it into a semantic index published on the web as an RDF knowledge graph
• Link with general vocabularies as well as domain-specific vocabularies
• Provide flexible search/visualization tools able to exploit the index
The ISSA pipeline
Step 1. Retrieval of metadata records
[Diagram: OpenArchive → 1. Retrieval (OAI-PMH) → Metadata records. ISSA Pipeline. User Communities: DEFINE]
What metadata?
• Title
• Authors (strings)
• Date
• Publication
• Languages
• Identifiers
• Abstract
• License
• URL of the PDF file
• …
OAI-PMH protocol:
• Supported by many open libraries & archives (70% [1])
• Harvested by aggregators, e.g. Google Scholar, OpenAIRE
[1] Ramírez-Montoya, María-Soledad & Ceballos, Hector. (2017). Institutional Repositories. doi:10.1201/9781315155890-5.
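As an illustration of Step 1, here is a minimal sketch of an OAI-PMH harvest of Dublin Core (`oai_dc`) records. It is not the ISSA code: the function names, the `archive.example.org` identifier and the sample response are assumptions made for the example. Only the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments and the XML namespaces come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix with it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # e.g. deleted records carry no metadata
            continue
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS),
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
            "date": dc.findtext("dc:date", namespaces=NS),
            "languages": [e.text for e in dc.findall("dc:language", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Made-up response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:creator>Roe, Richard</dc:creator>
          <dc:date>2020</dc:date>
          <dc:language>eng</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_records(SAMPLE)
```

A real harvester would fetch `list_records_url(...)` in a loop, passing the returned token back in until the archive stops sending one.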
Step 2. Populate the knowledge graph with metadata
[Diagram: Metadata records → 2. Translation to RDF (Morph-xR2RML) → Virtuoso Triple Store. User Communities: DEFINE, QUERY]
Metadata RDF representation with standard vocabularies:
Dublin Core, BIBO, FABIO/FRBR, EPRINT, FOAF, PROV-O, Schema.org
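To give an idea of what the translation in Step 2 produces, the sketch below turns one metadata record into N-Triples by hand. This is only an illustration: the slides indicate that the pipeline relies on Morph-xR2RML, whereas the function names, the `example.org` base URI and the choice of properties (`dcterms:title`, `dcterms:issued`, `dc:creator`, `bibo:AcademicArticle`) are assumptions, not the ISSA model.

```python
DCT = "http://purl.org/dc/terms/"
DC = "http://purl.org/dc/elements/1.1/"
BIBO = "http://purl.org/ontology/bibo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value, lang=None):
    """Serialize a string as an N-Triples literal, with an optional language tag."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def record_to_ntriples(record, base="http://example.org/document/"):
    """Map a metadata record (dict) to a list of N-Triples lines.

    The base URI and the property choices are assumptions for this sketch.
    """
    subject = f"<{base}{record['id']}>"
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{BIBO}AcademicArticle>"),
        (subject, f"<{DCT}title>", literal(record["title"], record.get("lang"))),
        (subject, f"<{DCT}issued>", literal(record["date"])),
    ]
    # Authors are plain strings in the harvested metadata.
    for author in record.get("authors", []):
        triples.append((subject, f"<{DC}creator>", literal(author)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = record_to_ntriples({
    "id": "42",
    "title": 'Rice and "water" use',
    "lang": "en",
    "date": "2020",
    "authors": ["Doe, Jane"],
})
print("\n".join(lines))
```

Declarative mappings such as xR2RML keep this logic outside the code, which makes it easier to adapt to another archive.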
Step 3. Full text extraction
[Diagram: Metadata records → 3. Full text extraction (GROBID) → Structured text. User Communities: DEFINE, QUERY]
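GROBID outputs the extracted full text as TEI XML. The sketch below shows how such a file could be reduced to a title, an abstract and body paragraphs. The sample document is a heavily simplified, made-up TEI fragment, and `parse_tei` is a hypothetical helper, not part of GROBID or of the ISSA pipeline.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
TEI = {"tei": TEI_NS}


def text_of(elem):
    """Concatenate all text under an element and normalize whitespace."""
    return " ".join("".join(elem.itertext()).split())


def parse_tei(tei_xml):
    """Extract title, abstract and body paragraphs from a TEI document."""
    root = ET.fromstring(tei_xml)
    title = root.find(".//tei:titleStmt/tei:title", TEI)
    abstract = root.find(".//tei:profileDesc/tei:abstract", TEI)
    body = root.find(".//tei:text/tei:body", TEI)
    return {
        "title": text_of(title) if title is not None else "",
        "abstract": text_of(abstract) if abstract is not None else "",
        "body": [text_of(p) for p in body.iter(f"{{{TEI_NS}}}p")]
                if body is not None else [],
    }


# Made-up, simplified TEI document.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>A sample article</title></titleStmt></fileDesc>
    <profileDesc><abstract><p>Short abstract.</p></abstract></profileDesc>
  </teiHeader>
  <text><body>
    <div><head>Introduction</head>
      <p>First paragraph.</p>
      <p>Second   paragraph.</p>
    </div>
  </body></text>
</TEI>"""

doc = parse_tei(SAMPLE_TEI)
```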
Step 4. Indexing and NEs extraction
[Diagram: Structured text → 4. Linked Descriptors and Named Entities, using Vocabularies & Datasets (Wikidata, DBpedia, Geonames, domain thesauri). User Communities: DEFINE, QUERY, ANNOTATE & VALIDATE]
• Thematic & geographic indexing (Annif)
• NEs extraction & linking (Entity-fishing, Spotlight, Dictionary)
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
 Find out descriptors that
characterize publications
 Rely on the Annif open-source
indexating p/f
 AGROVOC thesaurus
 Training corpus: Agritrop
subset + expert descriptors
 Evaluation of different
classification models
11
Thematic &
geographic indexing
Structured text Structured text
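One common way to compare classification models for this kind of indexing is to score the suggested descriptors against the expert descriptors with precision, recall and F1. The slides do not say which metrics were actually used, so the sketch below is an assumption about the evaluation method; the descriptor labels are made up.

```python
def precision_recall_f1(suggested, expected):
    """Score a set of suggested descriptors against expert descriptors."""
    suggested, expected = set(suggested), set(expected)
    hits = len(suggested & expected)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Made-up example: 3 of the 4 suggestions match the 4 expert descriptors.
suggested = ["rice", "irrigation", "soil", "Senegal"]
expert = ["rice", "irrigation", "yield", "Senegal"]
p, r, f1 = precision_recall_f1(suggested, expert)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Averaging these scores over a held-out part of the training corpus gives one figure per model.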
NEs extraction and linking
Annotate parts of the text with references to concepts from controlled vocabularies:
• Wikidata
• Geonames (through Wikidata)
• DBpedia
• AGROVOC
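The dictionary-based flavour of NE linking can be sketched as a lookup of known labels in the text, where each match is tied to a concept URI. This is a toy version under stated assumptions: the labels, the `example.org` URIs and the longest-label-first matching rule are illustrative. Tools such as Entity-fishing or DBpedia Spotlight also disambiguate entities, which this sketch does not attempt.

```python
import re


def annotate(text, dictionary):
    """Find dictionary labels in text; longer labels win over shorter ones.

    dictionary maps a label to a concept URI (placeholder URIs here).
    Returns a list of matches with character offsets.
    """
    if not dictionary:
        return []
    # Regex alternation tries options in order, so sort longest first.
    labels = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    lookup = {label.lower(): uri for label, uri in dictionary.items()}
    return [
        {
            "start": m.start(),
            "end": m.end(),
            "text": m.group(0),
            "uri": lookup[m.group(0).lower()],
        }
        for m in pattern.finditer(text)
    ]


# Hypothetical mini-thesaurus.
CONCEPTS = {
    "rice": "http://example.org/concept/rice",
    "rice blast": "http://example.org/concept/rice-blast",
    "Senegal": "http://example.org/concept/senegal",
}

SENTENCE = "Rice blast threatens rice yields in Senegal."
annotations = annotate(SENTENCE, CONCEPTS)
```

Here "Rice blast" is linked as a whole, rather than as "rice" followed by an unlinked word.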
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
 Thematic & geographic Indexing (Annif)
 NEs extraction & linking (Entity-fishing, Spotlight, Dictionary)
Retrieval
(OAI-PMH)
OpenArchive Metadata records
1
Full text
extraction 3
< / >
< / >
< / >
Structured text
Virtuoso
Triple Store
2
4
Linked Descriptors and Named Entities
Translation
to RDF
Vocabularies & Datasets
Wikidata, DBpedia, Geonames,
domain thesauri
Translation to RDF
5
ISSA
Pipeline
User Communities
DEFINE QUERY
ANNOTATE
& VALIDATE
Step 5. Populate the knowledge graph with
descriptors and NEs
(Morph-xR2RML)
Web Annotation Vocabulary
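For illustration, a named entity expressed with the W3C Web Annotation model could look like the JSON-LD built below: the body is the linked concept and the target is a text span in the article. The exact shape of ISSA annotations is not given in the slides, so the URIs, the `identifying` motivation and the use of a `TextPositionSelector` are assumptions.

```python
import json


def named_entity_annotation(document_uri, entity_uri, start, end):
    """Build a Web Annotation (JSON-LD) linking a text span to a concept."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "identifying",
        # The body is the concept the text span refers to.
        "body": entity_uri,
        # The target is the annotated span inside the document.
        "target": {
            "source": document_uri,
            "selector": {
                "type": "TextPositionSelector",
                "start": start,
                "end": end,
            },
        },
    }


# Placeholder URIs, for illustration only.
annotation = named_entity_annotation(
    "http://example.org/document/42",
    "http://example.org/concept/senegal",
    start=36,
    end=43,
)
print(json.dumps(annotation, indent=2))
```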
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
 Thematic & geographic Indexing (Annif)
 NEs extraction & linking (Entity-fishing, Spotlight, Dictionary)
Retrieval
(OAI-PMH)
OpenArchive Metadata records
1
Full text
extraction 3
< / >
< / >
< / >
Structured text
Virtuoso
Triple Store
2
4
Linked Descriptors and Named Entities
Translation
to RDF
Vocabularies & Datasets
Wikidata, DBpedia, Geonames,
domain thesauri
Translation to RDF
5
Mining & Visualization
Association rules mining
Augmented visualization
6
ISSA
Pipeline
User Communities
DEFINE QUERY
ANNOTATE
& VALIDATE
DEFINE & USE
Step 6. Mining and Visualization
Mining & Visualization
Explore descriptors association rules
Extract and visualize association rules between articles’ descriptors with ARViz.
Suited for the discovery of (possibly unexpected) associations.
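The idea behind association rules between descriptors can be sketched with the usual support and confidence measures. This toy version only considers rules with one descriptor on each side (A → B). The thresholds and the article data are made up, and it does not reflect the mining algorithm actually used by the pipeline.

```python
from itertools import permutations


def association_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules (a, b, support, confidence) meaning "a implies b".

    transactions: one collection of descriptors per article.
    support(a, b)  = share of articles having both descriptors
    confidence     = support(a, b) / support(a)
    """
    sets = [set(t) for t in transactions]
    n = len(sets)
    if n == 0:
        return []
    items = sorted(set().union(*sets))

    def support(*descriptors):
        wanted = set(descriptors)
        return sum(1 for s in sets if wanted <= s) / n

    rules = []
    for a, b in permutations(items, 2):
        s_ab = support(a, b)
        if s_ab < min_support:
            continue
        confidence = s_ab / support(a)
        if confidence >= min_confidence:
            rules.append((a, b, s_ab, confidence))
    return rules


# Made-up descriptors for four articles.
ARTICLES = [
    {"rice", "irrigation"},
    {"rice", "irrigation", "Senegal"},
    {"rice", "Senegal"},
    {"maize"},
]

for a, b, s, c in association_rules(ARTICLES):
    print(f"{a} -> {b} (support={s:.2f}, confidence={c:.2f})")
```

With these thresholds, "rice → irrigation" is left out because only two of the three rice articles mention irrigation.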
Explore/navigate networks of entities
Solve complex competency questions by visually exploring networks of descriptors, authors, and articles with LDViz.
Explore networks of articles, descriptors…
Same tools to explore:
• Network of articles with co-authors
• Network of authors with co-publications
• Networks of institutions with same research topics
• …
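Such networks boil down to weighted edges derived from the knowledge graph. As a minimal sketch, the code below builds a co-publication network of authors from article/author lists. The input structure and the names are assumptions; in practice the data would come from queries on the triple store.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(articles):
    """Count co-publications for each pair of authors.

    articles: dict mapping an article id to its list of authors.
    Returns a Counter keyed by (author_a, author_b), names sorted.
    """
    edges = Counter()
    for authors in articles.values():
        # Each unordered pair of distinct authors gets one edge per article.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Made-up articles.
ARTICLE_AUTHORS = {
    "article-1": ["Doe", "Roe"],
    "article-2": ["Roe", "Doe", "Poe"],
    "article-3": ["Poe"],
}

network = coauthor_network(ARTICLE_AUTHORS)
for (a, b), weight in network.most_common():
    print(f"{a} - {b}: {weight} co-publication(s)")
```

Replacing authors with descriptors or institutions gives the other networks listed above.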
Quick summary
• Pipeline and visualization tools successfully deployed for Agritrop
• 100,000+ articles’ metadata and abstracts
• 12,000 OA articles with full text
• Pipeline for Agritrop ready to transfer to other archives with limited work
• Only open licenses (code, documentation…)
• Based on open-source, robust tools and technologies, Docker-based
• Extensible with new steps following simple guidelines
Perspectives
https://unsplash.com/photos/ROOrGTNurYI
CIRAD is willing to deploy the ISSA pipeline and visualization tools in production for all users of Agritrop.
ISSA 2 – CfP CollEx-Persée 2021-2022
Exploit & expand the results of ISSA:
◦ Extract new knowledge: relationships between NEs, author disambiguation, cross references… Link to taxonomic registries?
◦ Broaden the service offering for researchers and documentalists: semantic search, geographical visualization, bibliometrics
◦ Unsupervised indexing + improve data quality metrics
Extend the PoC to the HAL instance of EuroMov Digital Health in Motion
Thank you
https://issa.cirad.fr/
https://github.com/issa-project
@ProjetISSA
