The document describes a prototype that retrieves related scientific publications from different linked datasets through thesaurus alignment. It introduces several linked datasets, including Agrovoc, OpenAgris, STW and EconStor. The prototype matches concepts from a user query to concepts in the linked datasets' thesauri to identify related publications. Pseudocode is provided to illustrate the process of concept mapping and querying multiple datasets. The goal is to retrieve relevant publications from different sources through a single interface.
Scientific Publication Retrieval in Linked Data
1. Scientific Publication
Retrieval in Linked Data
Lim Ying Sean¹, Arun Anand Sadanandan¹, Dickson Lukose¹ and
Klaus Tochtermann²
¹ MIMOS Bhd, Kuala Lumpur, Malaysia
² Leibniz Information Centre for Economics (ZBW), Kiel, Germany
3. Introduction
• Linked Data allows libraries to create and deliver library
data that is sharable, extensible and easily re-usable.
• Through rich linkages with complementary data from
trusted sources, libraries can increase the value of their
library data beyond the sum of their sources taken
individually.
5. Introduction
• Our goal is to identify and retrieve related
scientific publications from different published
Linked Datasets through a single user interface.
• Scientific publications in Linked Data consist of 3
elements:
– Metadata
– Thesauri
– Name authority file
6. Linked Datasets
• Agrovoc
• OpenAgris
• Standard Thesaurus Wirtschaft (STW) - a thesaurus for
economics.
• EconStor - an open access server for free publication of academic
literature in economics.
12. Future Work
• To improve the quality of the retrieved
publications, further research is
required to:
– Measure the relevance of the related
publications.
– Enhance the user experience when searching for
related publications.
13. Conclusion
• We have illustrated our work in consuming linked
datasets in the area of publications, and in particular we
described the process followed to retrieve related
publications from different linked datasets.
• The approach we adopted depends on thesaurus
alignment to retrieve related publications.
15. Pseudocode 1
Input: User query
Output: List of publications from the local dataset
Procedure:
Identify the concepts in the user query, C = {C1, C2, …, Cn};
Initialize the current concept pointer, i = 1;
Initialize an empty array of concepts, W = {};
while (i ≤ n) do
    if Ci has skos:narrower to another concept S then
        add concept S to the array W;
    else
        add concept Ci to the array W;
    end if
    increase the current concept pointer, i = i + 1;
end while
Issue a SPARQL query to the local dataset to identify publications that are
associated with the concepts in W.
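The procedure in Pseudocode 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the prototype's actual code: the thesaurus is replaced by a hypothetical in-memory dictionary (`NARROWER`), all URIs are made up, the pseudocode's single narrower concept is generalized to a list, and `dcterms:subject` is only assumed to be the property that links a publication to its concepts. The function names `expand_concepts` and `build_publication_query` are likewise hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for the local thesaurus: maps a concept URI to the
# URIs it points to via skos:narrower. All URIs are illustrative only.
NARROWER: Dict[str, List[str]] = {
    "http://example.org/concept/cereals": [
        "http://example.org/concept/rice",
        "http://example.org/concept/wheat",
    ],
}


def expand_concepts(concepts: List[str], narrower: Dict[str, List[str]]) -> List[str]:
    """Replace each concept by its narrower concepts, or keep it if it has none."""
    expanded: List[str] = []
    for concept in concepts:
        candidates = narrower.get(concept) or [concept]
        for candidate in candidates:
            if candidate not in expanded:  # avoid duplicates in the query
                expanded.append(candidate)
    return expanded


def build_publication_query(concepts: List[str]) -> str:
    """Build one SPARQL query that matches any of the given concepts.

    Assumption: publications point to concepts via dcterms:subject.
    """
    values = " ".join(f"<{uri}>" for uri in concepts)
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT DISTINCT ?publication WHERE {\n"
        f"  VALUES ?concept {{ {values} }}\n"
        "  ?publication dcterms:subject ?concept .\n"
        "}"
    )


query_concepts = expand_concepts(
    ["http://example.org/concept/cereals", "http://example.org/concept/maize"],
    NARROWER,
)
print(build_publication_query(query_concepts))
```

The sketch only builds the query string; sending it to a SPARQL endpoint is left out because the slides do not name the endpoint or client used.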
16. Pseudocode 2
Input: List of concepts
Output: List of publications from Linked Data
Procedure:
Receive an array of concepts, A = {C1, C2, …, Cn};
Initialize a two-dimensional array S[src][j], indexed by source src and position j;
Initialize the current concept pointer, i = 1;
while (i ≤ n) do
    if Ci has skos:exactMatch to another concept SCONCEPT then
        identify the source of SCONCEPT, src;
        S[src][j] ← SCONCEPT, where j is the next free position in S[src];
    end if
    increase the current concept pointer, i = i + 1;
end while
For each source src, issue a SPARQL query to the Linked Dataset src to identify
publications that are associated with a concept Sj, Sj ∈ S[src].
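The grouping step of Pseudocode 2 can be sketched in Python as follows. Again, this is a hedged illustration rather than the prototype's implementation: `EXACT_MATCH` is a hypothetical in-memory stand-in for the skos:exactMatch alignments, all URIs are invented, the source of a concept is assumed to be identifiable from the host part of its URI, and `dcterms:subject` is an assumed linking property.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical alignment table: local concept URI -> URIs of concepts in other
# thesauri linked via skos:exactMatch. All URIs are illustrative only.
EXACT_MATCH: Dict[str, List[str]] = {
    "http://local.example.org/concept/rice": [
        "http://agrovoc.example.org/concept/c_rice",
        "http://stw.example.org/concept/rice",
    ],
}


def source_of(concept_uri: str) -> str:
    """Assumption: the host name of a concept URI identifies its dataset."""
    return urlparse(concept_uri).netloc


def group_by_source(
    concepts: List[str], exact_match: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Collect the aligned concepts per source, like S[src][j] in the pseudocode."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for concept in concepts:
        for matched in exact_match.get(concept, []):
            grouped[source_of(matched)].append(matched)
    return dict(grouped)


def build_queries(grouped: Dict[str, List[str]]) -> Dict[str, str]:
    """Build one SPARQL query per source from its aligned concepts."""
    queries: Dict[str, str] = {}
    for src, matched in grouped.items():
        values = " ".join(f"<{uri}>" for uri in matched)
        queries[src] = (
            "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
            "SELECT DISTINCT ?publication WHERE {\n"
            f"  VALUES ?concept {{ {values} }}\n"
            "  ?publication dcterms:subject ?concept .\n"
            "}"
        )
    return queries


grouped = group_by_source(
    [
        "http://local.example.org/concept/rice",
        "http://local.example.org/concept/unmapped",
    ],
    EXACT_MATCH,
)
queries = build_queries(grouped)
```

Concepts without an alignment, such as the `unmapped` example, simply contribute nothing to any source, which mirrors the `if` branch without an `else` in the pseudocode.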