Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A... | BigData_Europe
Slides for the keynote talk at Big Data Europe workshop no. 3, held on 11.9.2017 in Amsterdam and co-located with the SEMANTiCS2017 conference, by Ron Dekker, Director of CESSDA: "European Open Science Agenda: where we are and where we are going?"
The re3data.org project (Registry of Research Data Repositories) began indexing research data repositories in 2012 and offers researchers, funding organizations, libraries and publishers an overview of the heterogeneous research data repository landscape.
Make our Scientific Datasets Accessible and Interoperable on the Web | Franck Michel
The presentation investigates the challenges we face when sharing scientific datasets on the Web following the Linked Open Data principles. We present the Semantic Web standards and examine how they can help address those challenges. We give tips on how to choose vocabularies to describe data and metadata, link datasets to other related datasets through appropriate alignments, translate existing data sources to RDF, and publish them on the Web as linked data.
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi... | Open Science Fair
Eloy Rodrigues, Petr Knoth & Kathleen Shearer showcase the conceptual model for this vision, as well as the role and functions of repositories within this model.
Workshop title: Building a global knowledge commons - ramping up repositories to support widespread change in the ecosystem
Workshop abstract:
The extensive international deployment of repository systems in higher education and research institutions, as well as scholarly communities, provides the foundation for a distributed, globally networked infrastructure for scholarly communication. This distributed network of repositories can and should be a powerful tool to promote the transformation of the scholarly communication ecosystem. However, repository platforms are still using technologies and protocols designed almost twenty years ago, before the boom of the web and the dominance of Google, social networking, semantic web and ubiquitous mobile devices. In April 2016, the Confederation of Open Access Repositories (COAR) launched a working group to help identify new functionalities and technologies for repositories and develop a road map for their adoption. For the past several months, the group has been working to define a vision for repositories and sketch out the priority user stories and scenarios that will help guide the development of new functionalities. The results of this work will be available in the summer of 2017.
This workshop will present the functionalities and technologies for the next generation of repositories and reflect on how these functionalities will be adopted into the existing software platforms. In addition, participants will discuss the important implications for the network layers, and how repositories will uniformly interact with the networks to provide value-added services on top of their content.
DAY 3 - PARALLEL SESSION 6 & 7
http://www.opensciencefair.eu/workshops/parallel-day-3-1/building-a-global-knowledge-commons-ramping-up-repositories-to-support-widespread-change-in-the-ecosystem
re3data.org – Registry of Research Data Repositories | Heinz Pampel
Heinz Pampel | GFZ German Research Centre for Geosciences, LIS
Maxi Kindling | Humboldt-Universität zu Berlin, Berlin School of Library and Information Science
Frank Scholze | Karlsruhe Institute of Technology, KIT Library
RDA-Deutschland-Treffen 2015 | Potsdam, November 26, 2015
Stronger together: community initiatives in journal management | Jisc
There has been a recent growth of initiatives to address common problems regarding current and long-term access to e-journal content. Jisc is at the forefront of many of these with the close participation and active input of educational institutions.
This session aims to summarise the current state of key themes with pointers to future directions of areas such as sustainability, the move towards e-only environments, and shared consortia approaches. It will provide an overview and panel discussion on developing the supporting infrastructure to meet the needs of users. The discussion will focus on how institutions, community bodies and service providers can best work together to ensure sustainable, long-term initiatives by seeking to introduce uniformity, standardisation and collaboration to an even greater extent.
The session will introduce two new Jisc-supported projects in this area, the Keepers Registry Extra and SafeNet initiatives, and discuss how these fit alongside existing Jisc services such as Knowledge Base+, UK LOCKSS Alliance, Journal Archives and JUSP (Journal Usage Statistics Portal). The panel will address how this catalogue of services contributes towards a coherent strategy in the management of e-journal content.
As the volume and complexity of data from myriad Earth-observing platforms, both remote sensing and in situ, increase, so does the demand for access to both data and information products derived from these data. The audience is no longer restricted to an investigator team with specialist science credentials. Non-specialist users, from scientists in other disciplines and the science-literate public to teachers, the general public and decision makers, want access. What prevents them from accessing these resources? It is the very complexity of specialist-developed data formats, data set organizations and terminology. What can be done in response? We must shift the burden from the user to the data provider. To achieve this, our data infrastructures are likely to need greater internal code and data structure complexity in order to offer a (relatively) simpler end-user experience. Evidence from numerous technical and consumer markets supports this scenario. We will cover the elements of modern data environments, what the new use cases are and how we can respond to them.
Unleash the Potential of your Website! 180,000 webpages from the French NHM m... | Franck Michel
Slides of a presentation I gave at the TDWG 2020 conference.
Paper: https://doi.org/10.3897/biss.4.59046
Video: https://www.youtube.com/watch?v=KiAgTWpEkHE
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ... | Franck Michel
Presentation of an article published at the 2nd International Workshop on Semantics for Biodiversity (S4Biodiv 2017), co-located with ISWC2017.
Article: https://hal.archives-ouvertes.fr/hal-01617708
Taxonomic registers are key tools to help us comprehend the diversity of nature. Publishing such registers in the Web of Data, following the standards and best practices of Linked Open Data (LOD), is a way of integrating multiple data sources into a world-scale, biological knowledge base. In this paper, we present ongoing work aimed at publishing TAXREF, the French national taxonomic register, on the Web of Data. Far beyond the mere translation of the TAXREF database into LOD standards, we show that the key point of this endeavor is the design of a model capable of capturing the two coexisting yet distinct realities underlying taxonomic registers, namely the nomenclature (the rules for naming biological entities) and the taxonomy (the description and characterization of these biological entities). We first analyze different modelling choices made to represent some international taxonomic registers as LOD, and we underline the issues that arise from these differences. Then, we propose a model aimed at tackling these issues. This model separates nomenclature from taxonomy, it is flexible enough to accommodate the ever-changing scientific consensus on taxonomy, and it adheres to the philosophy underpinning the Semantic Web standards. Finally, using the example of TAXREF, we show that the model enables interlinking with third-party LOD data sets, whether they represent nomenclatural or taxonomic information.
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data | Franck Michel
Presentation of an article published at the 11th workshop on Linked Data on the Web (LDOW2018), co-located with the Web Conference 2018.
Article: https://hal.archives-ouvertes.fr/hal-01722792
We hypothesize that harnessing the Semantic Web standards to enable automatic combination of Linked Data and non-RDF Web APIs data could trigger novel cross-fertilization scenarios.
To achieve this goal, we define the SPARQL Micro-Service architecture. A SPARQL micro-service is a lightweight, task-specific SPARQL endpoint that provides access to a small, resource-centric, virtual graph, while dynamically assigning dereferenceable URIs to Web API resources that do not have URIs beforehand. The graph is delineated by the Web API service being wrapped, the arguments passed to this service, and the restricted types of RDF triples that this SPARQL micro-service is designed to spawn. In this context, we argue that full SPARQL expressiveness can be supported efficiently without jeopardizing server availability. Ultimately, we believe that an ecosystem of SPARQL micro-services could emerge from independent service providers, enabling Linked Data-based applications to glean pieces of data from a wealth of distributed, scalable and reliable services. We describe an experiment in which we dynamically augment biodiversity-related Linked Data with data from Flickr, MusicBrainz and the Macaulay Library scientific media archive.
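The distinctive step of a SPARQL micro-service is minting URIs for Web API resources that only have local identifiers, while producing only a restricted set of triple shapes. The sketch below illustrates just that translation step, assuming a made-up JSON response shape (`photos`, `id`, `title`, `url`) and a hypothetical URI base under `example.org`; it is not the actual Flickr API format nor the authors' implementation, and it leaves out the evaluation of a SPARQL query over the resulting virtual graph.

```python
import json
from urllib.parse import quote

# Hypothetical URI space in which the micro-service mints URIs for API resources.
# A real deployment would also serve these URIs so that they are dereferenceable.
BASE = "http://example.org/ld/photo/"
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape the characters that N-Triples does not allow raw inside a string literal.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def api_response_to_ntriples(api_json: str) -> list[str]:
    """Translate a (made-up) Web API JSON response into N-Triples lines.

    The API items carry only local ids, so a URI is minted for each of them,
    and only the three triple shapes below are ever produced.
    """
    lines = []
    for item in json.loads(api_json)["photos"]:
        subject = iri(BASE + quote(str(item["id"]), safe=""))
        lines.append(f"{subject} {iri(RDF_TYPE)} {iri(SCHEMA + 'ImageObject')} .")
        lines.append(f"{subject} {iri(SCHEMA + 'name')} {literal(item['title'])} .")
        # item["url"] is assumed to already be a valid absolute IRI.
        lines.append(f"{subject} {iri(SCHEMA + 'contentUrl')} {iri(item['url'])} .")
    return lines
```

Restricting the output to a few fixed triple shapes is what keeps each micro-service's virtual graph small, which is the property the abstract relies on to support full SPARQL without overloading the server.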
Integrating Heterogeneous Data Sources in the Web of Data | Franck Michel
These are the slides of a 40-minute presentation I gave at the CNRS Software Development days (JDEV 2017) in Marseille (France), July 5th, 2017.
Here is the Webcast, in French: https://webcast.in2p3.fr/videos-integrer_des_sources_de_donnees_heterogenes_dans_le_web_de_donnees
Building a common taxonomic reference for studies on the histo... | Franck Michel
SemWeb.Pro conference, 21/11/2016.
One of the missions of the Muséum National d’Histoire Naturelle (MNHN) is to produce a synthesis of French biodiversity and natural heritage. In this context, it is in charge of developing TAXREF, a taxonomic reference for fauna, flora and fungi. This single reference lists and organizes the scientific names of all living beings recorded on French territories, mainland and overseas, and is the cornerstone of the Système d’Information sur la Nature et les Paysages (SINP, the French information system on nature and landscapes). It is used by many public, private and civil-society actors (local authorities, curators, architecture firms, teachers, citizens, etc.). TAXREF is also aligned with other international taxonomic or nomenclatural references.
The Zoomathia research project aims to study the history of zoological knowledge through Antiquity and the Middle Ages. To this end, it plans to use Semantic Web technologies to integrate heterogeneous data sources, ranging from medieval encyclopedias to modern biology data, including archaeological excavation reports and iconographic resources. This work necessarily involves selecting and/or defining vocabularies that can serve as taxonomic, cultural, geographic, chronological (etc.) references. To make the integrated data interoperable on the Web, these vocabularies must be consensual and linked to other related authoritative vocabularies. Since TAXREF results from a broad scientific consensus and is already used to integrate modern biology data and archaeological data, it was selected to build a thesaurus supporting the integration of the data considered in the Zoomathia project.
In this presentation, I will revisit the motivations outlined above, then describe the modelling of a thesaurus expressed in SKOS (Simple Knowledge Organization System) in order to produce a version of TAXREF usable with Semantic Web technologies. In particular, I will address the question of the links between this "TAXREF-SKOS" and other existing thesauri and ontologies. Finally, I will describe the method used to produce the result in RDF and to expose it on the Web of Data as persistent, dereferenceable URIs, and I will give a short demonstration by browsing the URIs as Linked Data and running SPARQL queries. In conclusion, I will come back to the fact that building a SKOS thesaurus is only one step, an "enabler", meant to encourage data producers who already use TAXREF, as well as application designers, to use these technologies and rely on TAXREF-SKOS.
A Mapping-based Method to Query MongoDB Documents with SPARQL | Franck Michel
Accessing legacy data as virtual RDF stores is a key issue in the building of the Web of Data. In recent years, the MongoDB database has become a popular actor in the NoSQL market, making it a significant potential contributor to the Web of Linked Data. Therefore, in this talk we present an article published at the DEXA 2016 conference. It addresses the question of how to access arbitrary MongoDB documents with SPARQL.
We propose a two-step method to (i) translate a SPARQL query into a pivot abstract query under MongoDB-to-RDF mappings represented in the xR2RML language, then (ii) translate the pivot query into a concrete MongoDB query.
We elaborate on the discrepancy between the expressiveness of SPARQL and that of the MongoDB query language, and we show that we can always come up with a rewriting that produces all correct answers.
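To give an intuition of the rewriting, here is a toy sketch that turns triple patterns about a single subject into a MongoDB filter and projection. It assumes a hypothetical predicate-to-field mapping standing in for xR2RML mappings, and it collapses the paper's two steps (SPARQL to abstract pivot query, then pivot query to concrete MongoDB query) into one; joins, FILTERs and result translation back to RDF are not handled.

```python
# Hypothetical mapping from RDF predicates to field paths in the MongoDB
# documents; a toy stand-in for an xR2RML mapping, which is far richer.
MAPPING = {
    "http://xmlns.com/foaf/0.1/name": "name",
    "http://xmlns.com/foaf/0.1/mbox": "contact.email",
    "http://example.org/vocab/team": "team.code",
}


def to_mongo_query(patterns):
    """Rewrite triple patterns about one subject into a MongoDB filter and projection.

    Each document is assumed to describe a single subject, so all patterns must
    share the same subject variable. A variable object becomes a projected
    field that must exist; a constant object becomes an equality condition.
    """
    subjects = {s for s, _, _ in patterns}
    if len(subjects) != 1:
        raise ValueError("this sketch only handles a single subject variable")
    mongo_filter, projection = {}, {}
    for _, predicate, obj in patterns:
        path = MAPPING.get(predicate)
        if path is None:
            raise ValueError(f"no mapping for predicate {predicate}")
        if obj.startswith("?"):
            projection[path] = 1
            mongo_filter.setdefault(path, {"$exists": True})
        else:
            mongo_filter[path] = obj
    return mongo_filter, projection
```

With pymongo, the result could be passed as `collection.find(mongo_filter, projection)`; a complete method would also have to bind the projected fields back to SPARQL variables and handle the query constructs that MongoDB cannot express natively.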
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa... | Franck Michel
Presentation of a collective article we submitted to the First Semantic Web for Scientific History workshop (SW4SH), co-located with ESWC 2015.
Link to the article: https://hal.archives-ouvertes.fr/hal-01146638v1
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... | University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx | MAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and 1970s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans and then used as an off-the-shelf food requiring only 24 h of incubation makes them the most convenient, least labor-intensive live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poor-quality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for the cultivation of fish, crustacean and shellfish larvae. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represent another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... | Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Seminar of U.V. Spectroscopy by SAMIR PANDA | SAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
Nutraceutical market, scope and growth: Herbal drug technology | Lokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market, which includes products such as functional foods, drinks and dietary supplements that provide health benefits beyond basic nutrition, is growing significantly. With rising healthcare expenses, an ageing population and increasing demand for natural and preventive health solutions, this industry is expanding quickly. Product formulation innovations and the use of cutting-edge technology for customized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and to offer significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics and herbal supplements.
BREEDING METHODS FOR DISEASE RESISTANCE.pptx | RASHMI M G
Plant breeding for disease resistance is a strategy to reduce crop losses caused by disease. Plants have an innate immune system that allows them to recognize pathogens and provide resistance. However, breeding for long-lasting resistance often involves combining multiple resistance genes.
This presentation gives a brief overview of the structural and functional attributes of nucleotides and of the structure and function of genetic material, along with the impact of UV rays and pH on them.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
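Among the techniques listed above, uniform sampling of a variability space is easy to illustrate on a toy example. The sketch below assumes a made-up feature model (four Boolean features, two cross-tree constraints) and samples uniformly among its valid configurations by enumerating them first; that only works for tiny models, whereas real configuration spaces require dedicated samplers (for instance SAT-based or knowledge-compilation-based ones).

```python
import itertools
import random

# Hypothetical feature model: four optional Boolean features plus two
# cross-tree constraints, made up for illustration.
FEATURES = ["gzip", "zstd", "threads", "debug"]


def is_valid(cfg: dict[str, bool]) -> bool:
    if cfg["gzip"] and cfg["zstd"]:          # mutually exclusive backends
        return False
    if cfg["debug"] and not cfg["threads"]:  # debug requires threads
        return False
    return True


def valid_configurations() -> list[dict[str, bool]]:
    # Exhaustive enumeration: fine for a toy model, intractable for real ones.
    configs = []
    for values in itertools.product([False, True], repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, values))
        if is_valid(cfg):
            configs.append(cfg)
    return configs


def uniform_sample(k: int, seed: int = 0) -> list[dict[str, bool]]:
    # Draw k distinct configurations; every valid configuration is equally likely.
    rng = random.Random(seed)
    return rng.sample(valid_configurations(), k)
```

The sampled configurations would then be built and measured, which is where cost-effective measurement techniques such as incremental builds come into play.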
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk, Journées Nationales du GDR GPL 2024
Deep Software Variability and Frictionless Reproducibility
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scientists Search and Make Sense of a Scientific Archive
1. ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scientists Search and Make Sense of a Scientific Archive
Franck MICHEL* - Université Côte d’Azur, CNRS, Inria, I3S, France
* Wimmics: AI in bridging social semantics and formal semantics on the Web
2. Issue: skyrocketing pace of publications
Bibliographic search is difficult:
• Finding and making sense of relevant articles
• Searching across multiple disciplines
Open scientific archives play a central role, but the services they provide have limitations:
• String-based search fails to grasp semantic relationships
• Keywords are often too general to be helpful
Hence the need for smarter search services that exploit the knowledge contained in the publications.
Open Science
3. Propose a generic, reusable and extensible solution to optimize bibliographic search in an open scientific archive.
4. How did we do that?
• Extract rich metadata from the publications, in multiple languages
• Turn it into a semantic index published on the Web as an RDF knowledge graph
• Link it with general vocabularies as well as domain-specific vocabularies
• Provide flexible search/visualization tools able to exploit the index
5. The ISSA pipeline
6. Step 1. Retrieval of metadata records
[Diagram: OpenArchive, ISSA pipeline, user communities (DEFINE)]
7. Step 1. Retrieval of metadata records
[Diagram: (1) Retrieval (OAI-PMH) of metadata records from the OpenArchive by the ISSA pipeline; user communities: DEFINE]
What metadata?
• Title
• Authors (strings)
• Date
• Publication
• Languages
• Identifiers
• Abstract
• License
• URL of the PDF file
• …
OAI-PMH protocol:
• Supported by many open libraries and archives (70% [1])
• Harvested by aggregators, e.g. Google Scholar, OpenAIRE
[1] Ramírez-Montoya, María-Soledad & Ceballos, Hector (2017). Institutional Repositories. https://doi.org/10.1201/9781315155890-5
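The retrieval step relies on OAI-PMH's `ListRecords` verb with the Dublin Core (`oai_dc`) format, paginated through resumption tokens. The sketch below shows that loop with the standard library only; the archive base URL is hypothetical, only a few of the listed metadata fields are extracted (abstract and license would come from `dc:description` and `dc:rights` in the same way), and `fetch` is left to the caller, e.g. `lambda url: urllib.request.urlopen(url).read()`. It is not the ISSA harvester itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    # resumptionToken is an exclusive argument: follow-up requests send only it.
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption token or None) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records have a header but no metadata
            continue
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS),
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
            "date": dc.findtext("dc:date", namespaces=NS),
            "languages": [e.text for e in dc.findall("dc:language", NS)],
        })
    # An empty resumptionToken element marks the last page.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, token or None


def harvest(base_url, fetch):
    """Yield all records, following resumption tokens; `fetch` maps a URL to XML."""
    token = None
    while True:
        records, token = parse_list_records(fetch(list_records_url(base_url, token)))
        yield from records
        if token is None:
            return
```

Injecting `fetch` keeps the paging logic testable offline and leaves retries, rate limiting and selective harvesting (`from`/`until`, `set`) to the caller.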
8. Step 2. Populate the knowledge graph with metadata
[Diagram: (1) Retrieval (OAI-PMH) of metadata records from the OpenArchive; (2) Translation to RDF (Morph-xR2RML) into a Virtuoso triple store; user communities: DEFINE, QUERY]
Metadata RDF representation with standard vocabularies: Dublin Core, BIBO, FaBiO/FRBR, EPRINT, FOAF, PROV-O, Schema.org
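In ISSA this translation is driven by Morph-xR2RML mappings. As a rough idea of the kind of triples produced, the hand-written sketch below (not xR2RML, and not ISSA's actual mapping) turns a harvested record into N-Triples with Dublin Core terms, BIBO and FOAF. The document URI base is hypothetical, every record is typed as `bibo:AcademicArticle` for simplicity, and authors, being plain strings, become blank nodes carrying a `foaf:name`.

```python
import re
from urllib.parse import quote

# Hypothetical URI base for documents; real ISSA URIs are not assumed here.
BASE = "http://example.org/issa/document/"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
BIBO = "http://purl.org/ontology/bibo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: str) -> str:
    # Escape characters that cannot appear raw in an N-Triples string literal.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(doc_id: str, record: dict) -> str:
    doc = f"<{BASE}{quote(doc_id, safe='')}>"
    lines = [f"{doc} <{RDF_TYPE}> <{BIBO}AcademicArticle> ."]
    # Simple literal-valued metadata mapped to Dublin Core terms.
    for key, prop in (("title", "title"), ("date", "issued"), ("abstract", "abstract")):
        if record.get(key):
            lines.append(f"{doc} <{DCT}{prop}> {literal(record[key])} .")
    for lang in record.get("languages", []):
        lines.append(f"{doc} <{DCT}language> {literal(lang)} .")
    # Authors are plain strings (no disambiguation), so each becomes a blank node.
    # Labels embed the doc id so that records written to the same file do not
    # share authors (assuming ids stay distinct once non-word chars are replaced).
    label = re.sub(r"\W", "_", doc_id)
    for i, name in enumerate(record.get("creators", []), start=1):
        author = f"_:{label}_author{i}"
        lines.append(f"{doc} <{DCT}creator> {author} .")
        lines.append(f"{author} <{FOAF}name> {literal(name)} .")
    return "\n".join(lines) + "\n"
```

A declarative mapping language like xR2RML keeps such choices out of code, which is what makes the pipeline easier to transfer to another archive.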
9. Step 3. Full text extraction (GROBID)
[Diagram: (1) Retrieval (OAI-PMH) of metadata records from the OpenArchive; (2) Translation to RDF into a Virtuoso triple store; (3) Full-text extraction producing structured text; user communities: DEFINE, QUERY]
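GROBID converts PDFs into structured TEI XML (typically through its REST service, e.g. the `processFulltextDocument` endpoint). The sketch below reads such TEI output with the standard library to recover the title, abstract and body sections that later steps index. It assumes GROBID's usual TEI layout (`teiHeader/fileDesc/titleStmt/title`, `profileDesc/abstract`, `text/body/div` with `head` and `p`); other elements (references, figures, formulas) are ignored.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}


def text_of(elem) -> str:
    # itertext() also collects text inside inline children (<ref>, <formula>...);
    # split/join collapses the whitespace left by the XML layout.
    return " ".join("".join(elem.itertext()).split())


def parse_grobid_tei(tei_xml: str) -> dict:
    root = ET.fromstring(tei_xml)
    title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", NS)
    abstract = root.find("tei:teiHeader/tei:profileDesc/tei:abstract", NS)
    abstract_paragraphs = [] if abstract is None else abstract.iter(f"{{{TEI_NS}}}p")
    sections = []
    for div in root.iterfind("tei:text/tei:body/tei:div", NS):
        head = div.find("tei:head", NS)
        sections.append({
            "head": text_of(head) if head is not None else None,
            "paragraphs": [text_of(p) for p in div.findall("tei:p", NS)],
        })
    return {
        "title": text_of(title) if title is not None else None,
        "abstract": " ".join(text_of(p) for p in abstract_paragraphs),
        "sections": sections,
    }
```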
10. Step 4. Indexing and named entity (NE) extraction
[Diagram: (1) Retrieval (OAI-PMH) of metadata records from the OpenArchive; (2) Translation to RDF into a Virtuoso triple store; (3) Full-text extraction producing structured text; (4) Thematic & geographic indexing (Annif) and NE extraction & linking (Entity-fishing, DBpedia Spotlight, dictionary), producing linked descriptors and named entities that refer to vocabularies & datasets (Wikidata, DBpedia, GeoNames, domain thesauri); user communities: DEFINE, QUERY, ANNOTATE & VALIDATE]
11. Thematic & geographic indexing
• Find descriptors that characterize publications
• Rely on Annif, an open-source indexing platform
• AGROVOC thesaurus
• Training corpus: Agritrop subset + expert descriptors
• Evaluation of different classification models
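Evaluating classification models boils down to comparing each model's ranked descriptor suggestions with the expert descriptors of held-out documents. The sketch below computes precision, recall and F1 at k, averaged over documents; the slides do not say which metrics ISSA used, so these are an assumption (Annif's own `annif eval` command reports metrics of this family). Descriptor identifiers would be AGROVOC concept URIs in practice.

```python
def prf_at_k(suggested: list[str], gold: set[str], k: int) -> tuple[float, float, float]:
    """Precision, recall and F1 of the top-k suggested descriptors for one document."""
    top = suggested[:k]
    hits = len(set(top) & gold)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def evaluate(suggestions: dict[str, list[str]], expert: dict[str, set[str]], k: int = 5):
    """Average the per-document scores over all expert-annotated documents."""
    scores = [prf_at_k(suggestions.get(doc, []), gold, k) for doc, gold in expert.items()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

Comparing models then amounts to running `evaluate` on each model's suggestions for the same test documents and keeping, for instance, the one with the highest average F1.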
12. NEs extraction and linking
Annotate parts of the text with references to concepts from controlled vocabularies:
• Wikidata
• GeoNames (through Wikidata)
• DBpedia
• AGROVOC
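Of the three NE extraction approaches named in the pipeline, the dictionary-based one is the simplest to illustrate. The sketch below assumes a hypothetical dictionary of lowercase labels mapped to placeholder concept URIs (not real AGROVOC or Wikidata identifiers) and performs case-insensitive, whole-word, longest-match annotation. Unlike Entity-fishing or DBpedia Spotlight, it does no disambiguation.

```python
import re

# Hypothetical dictionary: lowercase surface form -> placeholder concept URI.
DICTIONARY = {
    "rice": "http://example.org/concept/rice",
    "rice blast": "http://example.org/concept/rice_blast",
    "senegal": "http://example.org/place/senegal",
}


def build_pattern(dictionary):
    # Longest labels first, so "rice blast" wins over "rice" at the same position.
    labels = sorted(dictionary, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(map(re.escape, labels)) + r")\b", re.IGNORECASE)


def annotate(text, dictionary):
    """Return one annotation (character offsets, matched text, concept URI) per match."""
    pattern = build_pattern(dictionary)
    return [
        {"start": m.start(), "end": m.end(), "exact": m.group(0),
         "uri": dictionary[m.group(0).lower()]}
        for m in pattern.finditer(text)
    ]
```

The character offsets and matched text are exactly what a Web Annotation target needs, which is how such results can be turned into RDF in the next step.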
13. Step 5. Populate the knowledge graph with descriptors and NEs
[Diagram: steps (1) to (4) as in the previous slides, plus (5) Translation to RDF of the linked descriptors and named entities, loaded into the Virtuoso triple store; user communities: DEFINE, QUERY, ANNOTATE & VALIDATE]
Translation with Morph-xR2RML, using the Web Annotation Vocabulary
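To show what "using the Web Annotation Vocabulary" means for a single named entity, the sketch below builds one annotation in the W3C Web Annotation JSON-LD form: the body is the linked concept's IRI and the target is the document, located by a text position selector and a text quote selector. All IRIs are hypothetical placeholders, and ISSA generates its RDF with Morph-xR2RML rather than with code like this; a JSON-LD processor would expand the result into the equivalent `oa:hasBody`/`oa:hasTarget` triples.

```python
import json


def ne_annotation(annotation_id: str, document_uri: str, start: int, end: int,
                  exact: str, concept_uri: str) -> dict:
    """Build a W3C Web Annotation linking a text span to a vocabulary concept."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        # The body is simply the IRI of the linked concept (e.g. a Wikidata entity).
        "body": concept_uri,
        "target": {
            "source": document_uri,
            # Two equivalent ways of locating the same span in the text.
            "selector": [
                {"type": "TextPositionSelector", "start": start, "end": end},
                {"type": "TextQuoteSelector", "exact": exact},
            ],
        },
    }


# Usage with placeholder IRIs:
print(json.dumps(ne_annotation("http://example.org/anno/1", "http://example.org/doc/1",
                               24, 31, "Senegal", "http://example.org/place/senegal"),
                 indent=2))
```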
14. Step 6. Mining and visualization
[Diagram: steps (1) to (5) as in the previous slides, plus (6) Mining & visualization: association rules mining, augmented visualization; user communities: DEFINE, QUERY, ANNOTATE & VALIDATE, DEFINE & USE]
15. Mining & Visualization
16. Explore descriptor association rules
Extract and visualize association rules between articles’ descriptors with ARViz. This is suited to the discovery of (possibly unexpected) associations.
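Association rules between descriptors are usually characterized by support, confidence and lift. The sketch below mines rules of the form {a} → {b} between pairs of descriptors that co-occur in the same articles; it is a simplified stand-in (pairs only, brute-force counting), not ARViz's actual mining algorithm, and the thresholds are arbitrary defaults.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.1, min_confidence=0.5):
    """Mine rules {a} -> {b} between descriptors of the same article.

    support    = share of articles containing both a and b
    confidence = support(a, b) / support(a)
    lift       = confidence / support(b)   (> 1: a and b co-occur more than by chance)
    """
    n = len(transactions)
    item_counts, pair_counts = Counter(), Counter()
    for descriptors in transactions:
        items = sorted(set(descriptors))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[rhs] / n)
                rules.append({"lhs": lhs, "rhs": rhs, "support": support,
                              "confidence": confidence, "lift": lift})
    return sorted(rules, key=lambda r: (-r["lift"], -r["confidence"]))
```

Rules with a high lift but low support are the kind of "possibly unexpected" associations that are worth inspecting visually.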
17. Explore/navigate networks of entities
Solve complex competency questions by visually exploring networks of descriptors, authors and articles with LDViz.
21. Explore networks of articles, descriptors…
Same tools to explore:
• Networks of articles with co-authors
• Networks of authors with co-publications
• Networks of institutions with the same research topics
• …
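A network of authors with co-publications can be derived from the article-author links stored in the knowledge graph (in ISSA these would come from SPARQL queries). The sketch below assumes those links have already been fetched into a plain mapping and builds a weighted co-authorship graph in a generic node-link JSON layout, a common convention for force-directed views, not necessarily LDViz's input format.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(articles: dict[str, list[str]]) -> dict:
    """Build a weighted co-authorship graph in a node-link layout.

    `articles` maps an article id to its author names; an edge's weight is the
    number of articles the two authors co-signed.
    """
    weights = Counter()
    authors = set()
    for names in articles.values():
        unique = sorted(set(names))
        authors.update(unique)
        weights.update(combinations(unique, 2))
    return {
        "nodes": [{"id": a} for a in sorted(authors)],
        "links": [{"source": a, "target": b, "weight": w}
                  for (a, b), w in sorted(weights.items())],
    }
```

The same function yields a network of institutions or descriptors by feeding it institution or descriptor lists instead of author names.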
22. Quick summary
• Pipeline and visualization tools successfully deployed for Agritrop:
  ◦ 100,000+ articles’ metadata and abstracts
  ◦ 12,000 OA articles with full text
• Pipeline for Agritrop ready to transfer to other archives with limited work
• Only open licenses (code, documentation…)
• Based on open-source, robust tools and technologies; Docker-based
• Extensible with new steps following simple guidelines
23. Perspectives
(Photo: https://unsplash.com/photos/ROOrGTNurYI)
24. Perspectives
CIRAD is willing to deploy the ISSA pipeline and visualization tools in production for all users of Agritrop.
25. ISSA 2 – CfP CollEx-Persée 2021-2022
Exploit and expand the results of ISSA:
◦ Extract new knowledge: relationships between NEs, author disambiguation, cross-references… Link to taxonomic registries?
◦ Broaden the service offering for researchers and documentalists: semantic search, geographical visualization, bibliometrics
◦ Unsupervised indexing + improved data quality metrics
Extend the PoC to the HAL instance of EuroMov Digital Health in Motion
26. Thank you
https://issa.cirad.fr/
https://github.com/issa-project
@ProjetISSA