Identifying the research topics that best describe the scope of a scientific publication is a crucial task for editors, in particular because the quality of these annotations determines how effectively users are able to discover the right content in online libraries. For this reason, Springer Nature, the world's largest academic book publisher, has traditionally entrusted this task to its most expert editors. These editors manually analyse all new books, which may include hundreds of chapters, and produce a list of the most relevant topics. This process has therefore been very expensive, time-consuming, and confined to a few senior editors. For these reasons, back in 2016 we developed Smart Topic Miner (STM), an ontology-driven application that assists the Springer Nature editorial team in annotating the volumes of all books covering conference proceedings in Computer Science. Since then, STM has been used regularly by editors in Germany, China, Brazil, India, and Japan, for a total of about 800 volumes per year. Over the past three years the initial prototype has evolved iteratively in response to user feedback and changing requirements. In this paper we present the most recent version of the tool and describe its evolution over the years, the key lessons learnt, and the impact on the Springer Nature workflow. In particular, our solution has drastically reduced the time needed to annotate proceedings and significantly improved their discoverability, resulting in 9.3 million additional downloads. We also present a user study involving 9 editors, which yielded excellent results in terms of usability, and report an evaluation of the new topic classifier used by STM, which outperforms previous versions in recall and F-measure.
Supporting Springer Nature Editors by means of Semantic Technologies – Francesco Osborne
The Open University and Springer Nature have been collaborating since 2015 in the development of an array of semantically-enhanced solutions supporting editors in i) classifying proceedings and other editorial products with respect to the relevant research areas and ii) taking informed decisions about their marketing strategy. These solutions include i) the Smart Topic API, which automatically maps keywords associated with published papers to semantically characterized topics drawn from a very large and automatically-generated ontology of Computer Science topics; ii) the Smart Topic Miner, which helps editors to associate scholarly metadata with books; and iii) the Smart Book Recommender, which assists editors in deciding which editorial products should be marketed at a specific venue.
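To make the first of these components concrete, the sketch below shows how a client might call a keyword-to-topic mapping service of this kind. The endpoint URL, payload, and response shape are assumptions made purely for illustration; they are not the published Smart Topic API interface.

```python
# Hypothetical client for a keyword-to-topic mapping service in the style of
# the Smart Topic API. The URL and response format are illustrative assumptions.
import requests

def map_keywords_to_topics(keywords, api_url="https://example.org/smart-topic-api/map"):
    """Send raw author keywords; receive ontology topics for each keyword."""
    response = requests.post(api_url, json={"keywords": keywords}, timeout=30)
    response.raise_for_status()
    # Assumed response shape: {"ontology matching": ["ontology matching",
    # "ontology alignment"], "linked data": ["linked data", "semantic web"]}
    return response.json()

print(map_keywords_to_topics(["ontology matching", "linked data"]))
```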
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner – Francesco Osborne
The document summarizes research on automatically classifying Springer Nature proceedings using the Smart Topic Miner (STM). STM extracts topics from publications, maps them to a computer science ontology, selects relevant topics using a greedy algorithm, and infers tags. It was tested with 8 Springer Nature editors, who found that STM accurately classified 75-90% of proceedings and improved their work. However, STM is currently limited to computer science, and occasional noisy results were observed for books with few chapters. Future work aims to expand STM to characterize topic evolution over time and directly support author tagging.
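The greedy selection step lends itself to a compact illustration. The following is a simplified sketch under stated assumptions (chapters represented as sets of ontology topics, a fixed coverage target); it is not STM's actual implementation.

```python
# Illustrative sketch of a greedy topic-selection step: repeatedly pick the
# ontology topic that covers the most not-yet-covered chapters. The data
# structures and coverage threshold are assumptions, not STM's code.

def greedy_topic_selection(chapter_topics, coverage_target=0.9):
    """chapter_topics: dict mapping chapter id -> set of ontology topics."""
    uncovered = set(chapter_topics)
    selected = []
    while uncovered and len(uncovered) > (1 - coverage_target) * len(chapter_topics):
        scores = {}  # topic -> number of uncovered chapters it would cover
        for chapter in uncovered:
            for topic in chapter_topics[chapter]:
                scores[topic] = scores.get(topic, 0) + 1
        best = max(scores, key=scores.get)
        selected.append(best)
        uncovered = {c for c in uncovered if best not in chapter_topics[c]}
    return selected

book = {
    "ch1": {"semantic web", "ontologies"},
    "ch2": {"semantic web", "linked data"},
    "ch3": {"machine learning"},
}
print(greedy_topic_selection(book))  # e.g. ['semantic web', 'machine learning']
```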
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly... – Angelo Salatino
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this paper, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles, yielding a significant improvement over alternative methods.
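The CSO Classifier is available as an open-source Python package (cso-classifier on PyPI). A usage sketch is shown below; the exact import path and parameter names reflect recent releases and may vary across versions.

```python
# Usage sketch for the cso-classifier package; parameter names may differ
# slightly between package versions.
from cso_classifier import CSOClassifier

cc = CSOClassifier(modules="both", enhancement="first")
paper = {
    "title": "The CSO Classifier: Ontology-Driven Detection of Research Topics",
    "abstract": "We present an unsupervised approach for classifying papers "
                "according to the Computer Science Ontology...",
    "keywords": "research topics, ontology, scholarly data",
}
result = cc.run(paper)
print(result["union"])     # topics found by the syntactic and semantic modules
print(result["enhanced"])  # broader topics inferred from the CSO hierarchy
```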
Linked Open Data about Springer Nature conferences. The story so far – Aliaksandr Birukou
Despite many efforts to make data about scholarly publications available on the Web of Data, much information about academic conferences is still only available (at best) in free-text format. When available in a structured format, these data would provide essential input for the decisions that researchers, libraries, publishers, and funding and evaluation bodies take every day.
This talk describes the project that made such data available as Linked Open Data (LOD) at lod.springer.com for around 10,000 computer science conferences. In addition, we take a closer look at the lessons learnt from launching this portal and cover other Linked Data projects at Springer Nature. Finally, a novel semi-automated approach for classifying conference proceedings at Springer Nature is also presented.
This presentation was provided by Paul Needham of Cranfield University and Johan Bollen of Indiana University, during the NISO webinar "Measuring Use, Assessing Success, Part Two: Count Me In: Measuring Individual Item Usage," which was held on September 15, 2010.
In the last decade, several Scientific Knowledge Graphs (SKGs) were released, representing scientific knowledge in a structured, interlinked, and semantically rich manner. But what kind of information do they describe? How have they been built? What can we do with them? In this lecture, I will first provide an overview of well-known SKGs, like Microsoft Academic Graph, Dimensions, and others. Then, I will present the Academia/Industry DynAmics (AIDA) Knowledge Graph, which describes 21M publications and 8M patents according to i) the research topics drawn from the Computer Science Ontology, ii) the type of the authors' affiliations (e.g., academia, industry), and iii) 66 industrial sectors (e.g., automotive, financial, energy, electronics) from the Industrial Sectors Ontology (INDUSO). Finally, I will showcase a number of tools and approaches using such SKGs, supporting researchers, companies, and policymakers in making sense of research dynamics.
Conference Identity: persistent identifiers for conferences – Aliaksandr Birukou
Conferences are an essential part of scholarly communication. However, like researchers and organizations, they suffer from a disambiguation problem: the same acronym or conference name can refer to very different conferences. In 2017, Crossref and DataCite started a working group on conference and project identifiers. The group includes various publishers, A&I service providers, and other interested stakeholders. The group participants have drafted a metadata specification and gathered feedback from the community.
In this talk, we update the VIVO participants on where we stand with PIDs for conferences and conference series, and with Crossmark for proceedings, and we invite the broader community to comment.
Read the CrossRef post for more info about the group:
https://www.crossref.org/working-groups/conferences-projects/
Authors: Aliaksandr Birukou and Patricia Feeney
[PhD Thesis Defense] CHAMELEON: A Deep Learning Meta-Architecture for News Re... – Gabriel Moreira
Presentation of the PhD thesis defense of Gabriel de Souza Pereira Moreira at Instituto Tecnológico de Aeronáutica (ITA), on Dec. 09, 2019, in São José dos Campos, Brazil.
Abstract:
Recommender systems have grown increasingly popular in assisting users with their choices, thus enhancing their engagement and overall satisfaction with online services. Over the last decade, recommender systems have become a topic of increasing interest among machine learning, human-computer interaction, and information retrieval researchers.
News recommender systems aim to personalize users' experiences and help them discover relevant articles from a large and dynamic search space, which makes news a challenging scenario for recommendation. Large publishers release hundreds of news stories daily, meaning they must deal with a fast-growing number of items that quickly become outdated and irrelevant to most readers. News readers exhibit more unstable consumption behavior than users in other domains such as entertainment, and external events, like breaking news, shift readers' interests. In addition, the news domain experiences extreme levels of sparsity, as most users are anonymous, with no past behavior tracked.
Since 2016, Deep Learning methods and techniques have been explored in Recommender Systems research. In general, they can be divided into methods for: Deep Collaborative Filtering, Learning Item Embeddings, Session-based Recommendations using Recurrent Neural Networks (RNN), and Feature Extraction from Items' Unstructured Data such as text, images, audio, and video.
The main contribution of this research is CHAMELEON, a meta-architecture designed to tackle the specific challenges of news recommendation. It consists of a modular reference architecture that can be instantiated using different neural building blocks.
As information about users' past interactions is scarce in the news domain, signals such as the user context (e.g., time, location, device, the sequence of clicks within the session) and static and dynamic article features, like the article's textual content, popularity, and recency, are explicitly modeled in a hybrid session-based recommendation approach using RNNs.
The recommendation task addressed in this work is next-item prediction for user sessions, i.e., "what is the next most likely article a user might read in a session?". A temporal offline evaluation protocol is used for a realistic assessment of this task, considering factors that affect global readership interests, such as popularity, recency, and seasonality.
Experiments performed with two large datasets have shown the effectiveness of CHAMELEON for news recommendation on many quality factors, such as accuracy, item coverage, novelty, and a reduced item cold-start problem, when compared to other traditional and state-of-the-art session-based algorithms.
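To make the hybrid session-based idea concrete, here is a minimal PyTorch sketch of a GRU that consumes a session's clicks plus per-click context features and scores the next article. The sizes, feature choices, and single-layer design are illustrative assumptions, not the actual CHAMELEON instantiation.

```python
# Minimal sketch of a session-based next-item predictor: a GRU over item
# embeddings concatenated with context features. Dimensions are illustrative.
import torch
import torch.nn as nn

class NextArticleGRU(nn.Module):
    def __init__(self, n_articles, emb_dim=64, ctx_dim=8, hidden=128):
        super().__init__()
        self.item_emb = nn.Embedding(n_articles, emb_dim)
        # Context (time, location, device, ...) is concatenated per click.
        self.rnn = nn.GRU(emb_dim + ctx_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_articles)  # scores for the next item

    def forward(self, item_ids, context):
        x = torch.cat([self.item_emb(item_ids), context], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h[:, -1])  # next-item logits after the last click

model = NextArticleGRU(n_articles=1000)
clicks = torch.randint(0, 1000, (4, 5))  # 4 sessions, 5 clicks each
ctx = torch.randn(4, 5, 8)               # per-click context features
logits = model(clicks, ctx)              # shape: (4, 1000)
```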
The document describes a prototype that retrieves related scientific publications from different linked datasets through thesaurus alignment. It introduces several linked datasets, including Agrovoc, OpenAgris, STW and EconStor. The prototype matches concepts from a user query to concepts in the linked datasets' thesauri to identify related publications. Pseudocode is provided to illustrate the process of concept mapping and querying multiple datasets. The goal is to retrieve relevant publications from different sources through a single interface.
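The original pseudocode is not reproduced here, but the core idea can be sketched as follows, assuming SPARQL endpoints for the aligned thesauri and skos:exactMatch alignment links; the endpoint URLs and the subject property are illustrative assumptions, not the prototype's actual configuration.

```python
# Sketch of concept mapping plus multi-dataset querying via thesaurus
# alignment. Endpoints and the dct:subject / skos:exactMatch modelling are
# illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINTS = {
    "agrovoc": "https://agrovoc.fao.org/sparql",      # assumed endpoint URLs
    "stw": "http://zbw.eu/beta/sparql/stw/query",
}

def publications_for_concept(concept_uri):
    """Find publications linked to a concept, or to concepts aligned with it."""
    results = []
    for name, url in ENDPOINTS.items():
        sparql = SPARQLWrapper(url)
        sparql.setQuery(f"""
            PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
            PREFIX dct:  <http://purl.org/dc/terms/>
            SELECT ?pub WHERE {{
                {{ ?pub dct:subject <{concept_uri}> }}
                UNION
                {{ <{concept_uri}> skos:exactMatch ?aligned .
                   ?pub dct:subject ?aligned }}
            }} LIMIT 50
        """)
        sparql.setReturnFormat(JSON)
        for row in sparql.query().convert()["results"]["bindings"]:
            results.append((name, row["pub"]["value"]))
    return results
```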
The document introduces SciVerse, a platform from Elsevier that integrates SciVerse Scopus, SciVerse ScienceDirect, and SciVerse Hub. It provides an overview of each component, including Scopus's abstract and citation database, ScienceDirect's full-text articles and books, and the Hub which provides a single search across both. It outlines new features like the author evaluator tool, enhanced citation tracker, and APIs available to developers. The goal is to empower research through integrated content and applications that improve discovery, productivity, and collaboration across ScienceDirect, Scopus, and other sources.
The document discusses how data-centric science is driving the need for new tools and technologies to support large-scale data sharing and collaboration. It provides examples of projects like the Sloan Digital Sky Survey that have pioneered new models for open data publishing and public engagement with science. Microsoft research is working on technologies to support the entire scientific research lifecycle from data acquisition and modeling to analysis, visualization, and open dissemination of research outputs.
Publishing conference proceedings internationally: how does it work – Aliaksandr Birukou
In this presentation we look into the main elements one has to consider when organizing an international conference. First, we describe the role of conference proceedings in CS and beyond. Second, we focus on the tasks of conference organizers. Third, we cover peer review aspects and announce the new working group that CrossRef and DataCite are starting in this respect. We then cover indexing and dissemination, present several tips and guidelines for organizers of international conferences, and close with a word of warning regarding predatory publishers.
This presentation, held during the SEMANTiCS conference, introduces Ontos' current achievements towards a streaming-based text mining solution using Deep Learning and Semantic Web technologies.
Since most of the world's data is unstructured, mining the required information from text was, is, and will remain essential.
Martin Voigt | CEO Ontos GmbH
Presentation at Semantics 2016 in Leipzig in the context with the results of the LEDS project
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics – semanticsconference
This document discusses Ontos' approach to streaming-based text mining using deep learning and semantics. It describes use cases for content augmentation and monitoring, and requirements around entity detection, multiple languages/sources, and domain adaptation. An overview of the text analytics market shows growth areas. Ontos' WildhornMiner uses deep learning models trained on large corpora to classify entities with supervised models. Lessons learned include the benefits of neural networks and Kafka for streaming. Next steps involve relation extraction, search interfaces, and benchmarking.
Context-oriented Knowledge Management in Production Networks @Gsom Emerging m... – CaaS EU FP7 Project
Context-oriented Knowledge Management in Production Networks
By Kurt Sandkuhl
Invited lecture on October 8 at the GSOM Emerging Markets conference in St. Petersburg
WSO2 Data Analytics Server is a comprehensive enterprise data analytics platform; it fuses batch and real-time analytics of any source of data with predictive analytics via machine learning.
Springer LOD conference portal. Demo paper - screenshots – Aliaksandr Birukou
This is a slide deck with the main features I used as a backup for the demo at the 16th International Semantic Web Conference (ISWC 2017) in Vienna next week. Many thanks to Volha Bryl and Andrey Gromyko from Net Wise for helping me prepare the demo, as well as Alfred Hofmann (Lecture Notes in Computer Science, LNCS) and Henning Schoenenberger (SN SciGraph) for continuous support. Of course, this is also based on the earlier work of Markus Kaindl and Kai Eckert from Stuttgart Media University.
If you want to read the original paper - here it is: http://birukou.eu/publications/papers/201710Birukou-ISWC2017-springer-lod.pdf
ITAC 2016: Where Open Source Meets Audit Analytics – Andrew Clark
Open source software is taking the computer science community and IT departments by storm. The breadth of options, the timeliness of updates, the price, and the sense of community are all contributing factors to the rise of open source computing. For many years audit analytics has been confined to the Computer Assisted Auditing Techniques (CAAT) software vendors ACL, IDEA, and now Arbutus. However, these software programs require extensive training to use effectively, are not very flexible, and in most cases fail to provide the outcome auditors are expecting. Moving to an open source platform based around the Python ecosystem allows for true customization of analytics and provides a common language to interact with your IT department. By using the same set of tools, an auditing department can move from rudimentary AP duplicate tests all the way to advanced classification and clustering machine learning tests. Although the barrier to entry for open source software is higher than for most CAATs, with cross-functional collaboration, a truly customized, sustainable, and highly effective analytics program can be created.
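As a flavour of the entry-level end of that spectrum, a minimal pandas sketch of an accounts-payable duplicate test is shown below; the column names and data are invented for illustration.

```python
# Minimal AP duplicate test: flag accounts-payable rows sharing vendor,
# invoice date, and amount. Column names and data are illustrative assumptions.
import pandas as pd

ap = pd.DataFrame({
    "vendor": ["Acme", "Acme", "Globex"],
    "invoice_date": ["2016-03-01", "2016-03-01", "2016-03-02"],
    "amount": [1200.00, 1200.00, 560.50],
})
dupes = ap[ap.duplicated(subset=["vendor", "invoice_date", "amount"], keep=False)]
print(dupes)  # both Acme rows are flagged as potential duplicate payments
```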
This introductory lecture for IA377 will be devoted to the topic of “Literature Review”.
What is a literature review?
Methodology, best practices, tips, tools, etc.
Practical example
Application to IA377 seminar activities.
https://ia377-feec-unicamp.github.io/classes/2023/03/09/Literature-Review.html
A METHODOLOGICAL APPROACH TO SUPPORT BUILDING LIFE CYCLE ANALYSIS – Andy McNamara
In this thesis, the hypothesis "Life cycle analysis can be further utilised and integrated into the BIM process through the use of flexible API scripting and graphical programming" is investigated and demonstrated through an experimental case study.
The CREW system provides a virtual research environment to support collaborative research events. It allows users to record events, replay and annotate recordings, and conduct faceted searches across recorded content and related resources. The system was developed through a user-centered design process involving three user groups to ensure it meets researchers' needs. Future work will focus on ongoing user requirements gathering, supported evaluation events, and further development based on user feedback.
Monitoring & evaluating the usage of your Open Access Journal – Ina Smith
This document discusses various tools for monitoring and evaluating the usage of open access journals, including general tools like Google Analytics, platform-specific tools from SciELO SA, and journal-specific tools used by the South African Journal of Science. It provides information on tracking metrics like downloads, citations, social media engagement, and collaboration with authors and institutions to increase a journal's reach and measure its impact.
IRJET – Automated Document Summarization and Classification using Deep Lear... – IRJET Journal
The document proposes a system that uses deep learning methods for automated document summarization and classification. It uses a recurrent convolutional neural network (RCNN), which combines a convolutional neural network and a recurrent neural network, to build a robust classifier model. For summarization, it employs a graph-based method inspired by PageRank to extract the top 20% of sentences from a document based on word intersections. The RCNN model achieved over 97% accuracy on classifying documents from various domains using their summaries. The system aims to speed up classification and make it more intuitive by combining automated summarization techniques with deep learning.
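A deliberately simplified stand-in for the described extractive step, scoring sentences by pairwise word overlap (a one-pass, PageRank-flavoured centrality) and keeping the top 20%, might look as follows; it is a sketch of the general technique, not the paper's implementation.

```python
# Graph-flavoured extractive summarization sketch: each sentence's score is
# the sum of its word-overlap similarities with every other sentence.
import re
from itertools import combinations

def summarize(text, keep_ratio=0.2):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [set(s.lower().split()) for s in sentences]
    scores = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        overlap = len(words[i] & words[j]) / (1 + len(words[i] | words[j]))
        scores[i] += overlap  # each shared edge raises both sentences' centrality
        scores[j] += overlap
    k = max(1, int(len(sentences) * keep_ratio))
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return " ".join(sentences[i] for i in top)  # keep original sentence order
```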
This document summarizes research on automatically classifying and extracting non-functional requirements (NFRs) from text files using supervised machine learning. The researchers created a dataset of NFR keywords by analyzing NFR catalogs identified through a systematic mapping study. The keywords were categorized into security, performance, and usability. They then tested a supervised learning approach on an existing dataset containing 625 software specifications. The approach achieved accuracy rates between 85% and 98% for classifying NFRs into the security, performance, and usability categories. The research thus provides a way to generate labeled datasets for training machine learning models to automatically classify NFRs mentioned in text documents.
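A minimal sketch of such a supervised pipeline, using TF-IDF features and a Naive Bayes classifier from scikit-learn, is shown below; the inline dataset and the choice of model are illustrative assumptions, not the study's setup.

```python
# Supervised NFR classification sketch: bag-of-words features plus a simple
# classifier. The three training sentences and labels are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

requirements = [
    "The system shall encrypt all stored passwords.",
    "Pages must load in under two seconds.",
    "Users should complete checkout in three clicks or fewer.",
]
labels = ["security", "performance", "usability"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(requirements, labels)
print(clf.predict(["Encrypt passwords at rest."]))  # -> ['security']
```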
Research in Intelligent Systems and Data Science at the Knowledge Media Insti... – Enrico Motta
The document discusses research directions in intelligent systems and data science. It describes work on making sense of scholarly data through techniques like data mining, semantic technologies, and machine learning. It also discusses mapping and classifying computer science research areas using an automatically generated ontology with over 14,000 topics. Other topics discussed include predicting emerging research areas, applications in smart cities like the MK:Smart project, and potential roles for robots in smart cities like an autonomous health and safety inspector.
Applied Optimization and Swarm Intelligence (Springer Tracts in Nature-Inspir... – FajarMaulana962405
This book provides a review of recent research on applied optimization and swarm intelligence. It covers topics such as ensemble methods and their applications to optimization problems, swarm intelligence algorithms for numerical association rule mining and time series forecasting, soccer-inspired metaheuristics, cognitive modeling of swarm intelligence for decision making, nature-inspired algorithms for mobile robot path planning and control, hardware architectures for swarm robotics, and multi-objective optimization frameworks. The book serves as a reference for researchers interested in swarm intelligence, optimization, machine learning, and industrial applications.
Forecasting the Spreading of Technologies in Research Communities @ K-CAP 2017 – Francesco Osborne
Technologies such as algorithms, applications and formats are an important part of the knowledge produced and reused in the research process. Typically, a technology is expected to originate in the context of a research area and then spread and contribute to several other fields. For example, Semantic Web technologies have been successfully adopted by a variety of fields, e.g., Information Retrieval, Human Computer Interaction, Biology, and many others. Unfortunately, the spreading of technologies across research areas may be a slow and inefficient process, since it is easy for researchers to be unaware of potentially relevant solutions produced by other research communities. In this paper, we hypothesise that it is possible to learn typical technology propagation patterns from historical data and to exploit this knowledge i) to anticipate where a technology may be adopted next and ii) to alert relevant stakeholders about emerging and relevant technologies in other fields. To do so, we propose the Technology-Topic Framework, a novel approach which uses a semantically enhanced technology-topic model to forecast the propagation of technologies to research areas. A formal evaluation of the approach on a set of technologies in the Semantic Web and Artificial Intelligence areas has produced excellent results, confirming the validity of our solution.
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications – Francesco Osborne
TechMiner is a new approach that combines natural language processing, machine learning, and semantic technologies to extract information about technologies (such as applications, systems, languages, and formats) from research publications. It generates an ontology describing technologies and their relationships to other research entities. The approach was evaluated on a gold standard of manually annotated publications and found to improve precision and recall over alternative natural language processing approaches. Future work includes enriching the approach to identify additional scientific objects and applying it to other research fields.
The ontology engineering research community has focused for many years on supporting the creation, development and evolution of ontologies. Ontology forecasting, which aims at predicting semantic changes in an ontology, represents instead a new challenge. In this paper, we contribute to this novel endeavour by focusing on the task of forecasting semantic concepts in the research domain. Indeed, ontologies representing scientific disciplines contain only research topics that are already popular enough to be selected by human experts or automatic algorithms. They are thus unfit to support tasks which require the ability to describe and explore the forefront of research, such as trend detection and horizon scanning. We address this issue by introducing the Semantic Innovation Forecast (SIF) model, which predicts new concepts of an ontology at time t+1, using only data available at time t. Our approach relies on lexical innovation and adoption information extracted from historical data. We evaluated the SIF model on a very large dataset consisting of over one million scientific papers belonging to the Computer Science domain: the outcomes show that the proposed approach offers a competitive boost in mean average precision-at-ten compared to the baselines when forecasting over 5 years.
Klink-2: integrating multiple web sources to generate semantic topic networks – Francesco Osborne
ISWC 2015 research paper: http://oro.open.ac.uk/43793/1/ISWC2015_CR.pdf
Abstract:
The amount of scholarly data available on the web is steadily increasing, enabling different types of analytics which can provide important insights into the research activity. In order to make sense of and explore this large-scale body of knowledge we need an accurate, comprehensive and up-to-date ontology of research topics. Unfortunately, human crafted classifications do not satisfy these criteria, as they evolve too slowly and tend to be too coarse-grained. Current automated methods for generating ontologies of research areas also present a number of limitations, such as: i) they do not consider the rich amount of indirect statistical and semantic relationships, which can help to understand the relation between two topics – e.g., the fact that two research areas are associated with a similar set of venues or technologies; ii) they do not distinguish between different kinds of hierarchical relationships; and iii) they are not able to handle effectively ambiguous topics characterized by a noisy set of relationships. In this paper we present Klink-2, a novel approach which improves on our earlier work on automatic generation of semantic topic networks and addresses the aforementioned limitations by taking advantage of a variety of knowledge sources available on the web. In particular, Klink-2 analyses networks of research entities (including papers, authors, venues, and technologies) to infer three kinds of semantic relationships between topics. It also identifies ambiguous keywords (e.g., “ontology”) and separates them into the appropriate distinct topics – e.g., “ontology/philosophy” vs. “ontology/semantic web”. Our experimental evaluation shows that the ability of Klink-2 to integrate a high number of data sources and to generate topics with accurate contextual meaning yields significant improvements over other algorithms in terms of both precision and recall.
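One of the statistical signals this family of approaches exploits can be sketched compactly: topic x is a candidate super-area of topic y when papers about y usually also mention x, but not vice versa. The thresholds and data layout below are illustrative assumptions, not Klink-2's actual parameters or implementation.

```python
# Simplified co-occurrence-based subsumption test in the spirit of
# Klink-2-style topic-network generation. Thresholds are illustrative.
from collections import Counter

def candidate_subsumptions(paper_topics, alpha=0.6, beta=0.3):
    """paper_topics: list of sets of topics, one set per paper."""
    count = Counter(t for topics in paper_topics for t in topics)
    pair = Counter()
    for topics in paper_topics:
        for x in topics:
            for y in topics:
                if x != y:
                    pair[(x, y)] += 1  # co-occurrence, counted per direction
    links = []
    for (x, y), n in pair.items():
        p_x_given_y = n / count[y]  # how often y's papers also mention x
        p_y_given_x = n / count[x]
        if p_x_given_y >= alpha and p_y_given_x <= beta:
            links.append((x, "broader_than", y))
    return links
```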
EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research C... – Francesco Osborne
A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities
by F. Osborne, G. Scavo, E. Motta
URL: http://oro.open.ac.uk/41083/
In earlier papers we characterised the notion of diachronic topic-based communities – i.e., communities of people who work on semantically related topics at the same time. These communities are important to enable topic-centred analyses of the dynamics of the research world. In this paper we present an innovative algorithm, called Research Communities Map Builder (RCMB), which is able to automatically link diachronic topic-based communities over subsequent time intervals to identify significant events. These include topic shifts within a research community; the appearance and fading of a community; communities splitting, merging, spawning other communities; and others. The output of our algorithm is a map of research communities, annotated with the detected events, which provides a concise visual representation of the dynamics of a research area. In contrast with existing approaches, RCMB enables a much more fine-grained understanding of the evolution of research communities, with respect to both the granularity of the events and the granularity of the topics. This improved understanding can, for example, inform the research strategies of funders and researchers alike. We illustrate our approach with two case studies, highlighting the main communities and events that characterized the World Wide Web and Semantic Web areas in the 2000–2010 decade.
EKAW2014 - Inferring Semantic Relations by User Feedback – Francesco Osborne
Inferring Semantic Relations by User Feedback
by F. Osborne, E. Motta
URL: http://oro.open.ac.uk/41162/
In the last ten years, ontology-based recommender systems have been shown to be effective tools for predicting user preferences and suggesting items. There are, however, some issues associated with the ontologies adopted by these approaches: 1) their crafting is not a cheap process, being time-consuming and calling for specialist expertise; 2) they may not accurately represent the viewpoint of the targeted user community; 3) they tend to provide rather static models, which fail to keep track of evolving user perspectives. To address these issues, we propose Klink UM, an approach for extracting emergent semantics from user feedback, with the aim of tailoring the ontology to the users and improving recommendation accuracy. Klink UM uses statistical and machine learning techniques for finding hierarchical and similarity relationships between keywords associated with rated items and can be used for: 1) building a conceptual taxonomy from scratch, 2) enriching and correcting an existing ontology, 3) providing a numerical estimate of the intensity of semantic relationships according to the users. The evaluation shows that Klink UM performs well with respect to handcrafted ontologies and can significantly increase the accuracy of suggestions in content-based recommender systems.
This document presents a method for clustering citation distributions of authors to categorize them semantically and predict future citations. It uses hierarchical clustering with normalized Euclidean distance on citation distributions. Clusters are evaluated based on homogeneity of citation patterns over time. Semantic features of author bibliometric data are represented using the BiDO ontology to link numeric and categorical data over time. The method was evaluated on a dataset of 20,000 computer scientists from 1990-2010. Future work involves augmenting features, applying it to groups, extending the ontology, and creating a linked bibliometric triplestore.
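A compact sketch of the clustering step, using SciPy's hierarchical clustering on z-scored citation time series (one way to realise a normalised Euclidean distance), is given below; the toy data and the choice of average linkage are illustrative assumptions.

```python
# Hierarchical clustering of per-author citation time series, normalised so
# that curve shape rather than citation volume drives the grouping.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# Rows: authors; columns: citations received per year (toy numbers).
citations = np.array([
    [1, 2, 4, 8, 16, 30],    # steadily rising profile
    [2, 3, 5, 9, 15, 28],
    [20, 18, 15, 10, 6, 3],  # declining profile
])
normalized = zscore(citations, axis=1)  # compare shapes, not volumes
tree = linkage(normalized, method="average", metric="euclidean")
print(fcluster(tree, t=2, criterion="maxclust"))  # e.g. [1 1 2]
```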
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub... – Sérgio Sacani
Context. The observation of several L-band emission sources in the S cluster has led to a rich discussion of their nature. However, a definitive answer to the classification of the dusty objects requires an explanation for the detection of compact Doppler-shifted Brγ emission. The ionized hydrogen in combination with the observation of mid-infrared L-band continuum emission suggests that most of these sources are embedded in a dusty envelope. These embedded sources are part of the S-cluster, and their relationship to the S-stars is still under debate. To date, the question of the origin of these two populations has been vague, although all explanations favor migration processes for the individual cluster members. Aims. This work revisits the S-cluster and its dusty members orbiting the supermassive black hole SgrA* on bound Keplerian orbits from a kinematic perspective. The aim is to explore the Keplerian parameters for patterns that might imply a nonrandom distribution of the sample. Additionally, various analytical aspects are considered to address the nature of the dusty sources. Methods. Based on the photometric analysis, we estimated the individual H−K and K−L colors for the source sample and compared the results to known cluster members. The classification revealed a noticeable contrast between the S-stars and the dusty sources. To fit the flux-density distribution, we utilized the radiative transfer code HYPERION and implemented a young stellar object Class I model. We obtained the position angle from the Keplerian fit results; additionally, we analyzed the distribution of the inclinations and the longitudes of the ascending node. Results. The colors of the dusty sources suggest a stellar nature consistent with the spectral energy distribution in the near- and mid-infrared domains. Furthermore, the evaporation timescales of dusty and gaseous clumps in the vicinity of SgrA* are much shorter (≲ 2 yr) than the epochs covered by the observations (≈ 15 yr). In addition to the strong evidence for the stellar classification of the D-sources, we also find a clear disk-like pattern following the arrangements of S-stars proposed in the literature. Furthermore, we find a global intrinsic inclination for all dusty sources of 60 ± 20°, implying a common formation process. Conclusions. The pattern of the dusty sources manifested in the distribution of the position angles, inclinations, and longitudes of the ascending node strongly suggests two different scenarios: the main-sequence stars and the dusty stellar S-cluster sources share a common formation history, or they migrated via a similar formation channel in the vicinity of SgrA*. Alternatively, the gravitational influence of SgrA* in combination with a massive perturber, such as a putative intermediate-mass black hole in the IRS 13 cluster, forces the dusty objects and S-stars to follow a particular orbital arrangement. Key words: stars: black holes – stars: formation – Galaxy: center – galaxies: star formation
Compositions of iron-meteorite parent bodies constrain the structure of the pr... – Sérgio Sacani
Magmatic iron-meteorite parent bodies are the earliest planetesimals in the Solar System, and they preserve information about conditions and planet-forming processes in the solar nebula. In this study, we include comprehensive elemental compositions and fractional-crystallization modeling for iron meteorites from the cores of five differentiated asteroids from the inner Solar System. Together with previous results on metallic cores from the outer Solar System, we conclude that asteroidal cores from the outer Solar System have smaller sizes, elevated siderophile-element abundances, and simpler crystallization processes than those from the inner Solar System. These differences are related to the formation locations of the parent asteroids because the solar protoplanetary disk varied in redox conditions, elemental distributions, and dynamics at different heliocentric distances. Using highly siderophile-element data from iron meteorites, we reconstruct the distribution of calcium-aluminum-rich inclusions (CAIs) across the protoplanetary disk within the first million years of Solar-System history. CAIs, the first solids to condense in the Solar System, formed close to the Sun. They were, however, concentrated within the outer disk and depleted within the inner disk. Future models of the structure and evolution of the protoplanetary disk should account for this distribution pattern of CAIs.
PPT on Alternate Wetting and Drying presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxgoluk9330
Ahota Beel, nestled in Sootea Biswanath Assam , is celebrated for its extraordinary diversity of bird species. This wetland sanctuary supports a myriad of avian residents and migrants alike. Visitors can admire the elegant flights of migratory species such as the Northern Pintail and Eurasian Wigeon, alongside resident birds including the Asian Openbill and Pheasant-tailed Jacana. With its tranquil scenery and varied habitats, Ahota Beel offers a perfect haven for birdwatchers to appreciate and study the vibrant birdlife that thrives in this natural refuge.
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...Sérgio Sacani
We present the JWST discovery of SN 2023adsy, a transient object located in a host galaxy JADES-GS
+
53.13485
−
27.82088
with a host spectroscopic redshift of
2.903
±
0.007
. The transient was identified in deep James Webb Space Telescope (JWST)/NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) program. Photometric and spectroscopic followup with NIRCam and NIRSpec, respectively, confirm the redshift and yield UV-NIR light-curve, NIR color, and spectroscopic information all consistent with a Type Ia classification. Despite its classification as a likely SN Ia, SN 2023adsy is both fairly red (
�
(
�
−
�
)
∼
0.9
) despite a host galaxy with low-extinction and has a high Ca II velocity (
19
,
000
±
2
,
000
km/s) compared to the general population of SNe Ia. While these characteristics are consistent with some Ca-rich SNe Ia, particularly SN 2016hnk, SN 2023adsy is intrinsically brighter than the low-
�
Ca-rich population. Although such an object is too red for any low-
�
cosmological sample, we apply a fiducial standardization approach to SN 2023adsy and find that the SN 2023adsy luminosity distance measurement is in excellent agreement (
≲
1
�
) with
Λ
CDM. Therefore unlike low-
�
Ca-rich SNe Ia, SN 2023adsy is standardizable and gives no indication that SN Ia standardized luminosities change significantly with redshift. A larger sample of distant SNe Ia is required to determine if SN Ia population characteristics at high-
�
truly diverge from their low-
�
counterparts, and to confirm that standardized luminosities nevertheless remain constant with redshift.
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
PPT on Sustainable Land Management presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptxshubhijain836
Centrifugation is a powerful technique used in laboratories to separate components of a heterogeneous mixture based on their density. This process utilizes centrifugal force to rapidly spin samples, causing denser particles to migrate outward more quickly than lighter ones. As a result, distinct layers form within the sample tube, allowing for easy isolation and purification of target substances.
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Nature
1. Improving Editorial Workflow and Metadata Quality at Springer Nature
Angelo Salatino¹, Francesco Osborne¹, Aliaksandr Birukou², Enrico Motta¹
¹ Knowledge Media Institute, The Open University, United Kingdom
² Springer Nature, Heidelberg, Germany
ISWC 2019
2. Open University and Springer Nature Collaboration
The Open University and Springer Nature have been collaborating since 2014 in
the development of an array of semantically-enhanced solutions for:
• Semi-automatic classification of proceedings and other editorial products.
• Automatic selection of the most appropriate books, journals, and proceedings to market at a scientific event.
• Analysis of SN codes, with the aim of evolving these codes and detecting fields that deserve further attention.
• Joint release of the Computer Science Ontology.
Osborne et al. (2017) Supporting Springer Nature Editors by means of Semantic Technologies. ISWC 2017. Vienna, Austria.
3. Generation of Metadata
Generating good metadata is a crucial task to enable scholars, students, companies, and other stakeholders to discover and access scientific knowledge.
Traditionally, editors choose a list of related keywords and categories in relevant taxonomies according to:
• their own experience of similar conferences;
• a visual exploration of titles and abstracts;
• a list of terms given by the curators or derived from calls for papers.
4. Classification of Publications – A Complex Problem
Classifying publications manually presents a number of issues for a large publisher such as Springer Nature.
• It is a complex process that requires expert editors.
• It is a time-consuming process that can hardly scale.
• It is easy to miss the emergence of new topics.
• It is easy to assume that some traditional topics are still popular when this is no longer the case.
• The keywords used in the call for papers are often a reflection of what a venue aspires to be, rather than the real contents of the proceedings.
6. Smart Topic Miner 1.0 - 2016
Presented at ISWC 2016
Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. ISWC 2016.
7. A success story
• Since 2016 STM has been regularly used by editors in Germany, China, Brazil, India, and Japan.
• It is used to classify more than 800 conference proceedings volumes per year, including the Lecture Notes in Computer Science (LNCS) as well as LNBIP, CCIS, IFIP-AICT, and LNICST.
• It completely changed SN's internal workflow: the task is now semi-automatic and monitored by junior editors.
• It is constantly evolving, with new functionalities added in response to feedback from the editorial team.
11. Business Value
• STM halves the time needed for classifying proceedings, from 30 to 15 minutes.
• It also allows junior editors to work on the classification of proceedings, distributing the load and reducing costs.
• The adoption of a controlled vocabulary makes the process more robust and facilitates the identification of related editorial products.
12. Retrievability
About 9 million additional downloads thanks to STM.
[Chart: average number of yearly downloads for books in SpringerLink, 2009–2018, with separate series for CS proceedings (actual, expected, and with STM), other books in CS, and the overall catalogue.]
14. Smart Topic Miner 2.0 - 2019
• New GUI.
• New knowledge base (CSO).
• New topic detection engine (the CSO Classifier).
• Ability to compare with previous editions.
• Integrated with the SN system and the CSO Portal.
http://stm-demo.kmi.open.ac.uk
15. STM 2.0 - architecture
[Architecture diagram: SN editors interact with an HTML GUI; a parser and a visualization generator connect the GUI to the STM engine, which draws on CSO, SN classification codes (SNCs), historical data, and a word2vec model to perform (i) CSO classification, (ii) topic explanation, (iii) taxonomy generation, (iv) SN tags inference, and (v) comparison with previous classifications.]
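To make the data flow in the diagram easier to follow, here is a minimal, self-contained sketch of the five engine steps in Python. Every function name, dictionary layout, and the "SNC-dummy-code" tag are hypothetical stand-ins rather than the actual STM implementation, and the toy keyword matching merely stands in for the real CSO Classifier.

# Hypothetical, self-contained sketch of the five engine steps listed above.
# All function names and the toy logic are illustrative, not the actual STM code.

def classify_with_cso(chapter, cso):
    """Toy stand-in for the CSO Classifier: match CSO topic labels in the chapter text."""
    text = " ".join([chapter["title"], chapter["abstract"]] + chapter["keywords"]).lower()
    return {topic for topic in cso["topics"] if topic in text}

def annotate_volume(chapters, cso, snc_mapping, previous_edition):
    chapter_topics = [classify_with_cso(ch, cso) for ch in chapters]      # i) CSO Classifier
    explanations = {t: [ch["title"] for ch, ts in zip(chapters, chapter_topics) if t in ts]
                    for ts in chapter_topics for t in ts}                  # ii) topic explanation
    taxonomy = {t: cso["super_topics"].get(t, []) for t in explanations}   # iii) taxonomy generation
    sn_tags = {snc_mapping[t] for t in explanations if t in snc_mapping}   # iv) SN tags inference
    new_topics = set(explanations) - set(previous_edition)                 # v) diff vs previous edition
    return {"taxonomy": taxonomy, "sn_tags": sn_tags,
            "explanations": explanations, "new_topics": new_topics}

# Toy usage:
cso = {"topics": {"semantic web", "linked data"},
       "super_topics": {"linked data": ["semantic web"]}}
chapters = [{"title": "Publishing Linked Data on the Web", "abstract": "...",
             "keywords": ["linked data", "rdf"]}]
print(annotate_volume(chapters, cso, {"linked data": "SNC-dummy-code"},
                      previous_edition={"semantic web"}))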
16. A new knowledge base - The Computer Science Ontology
The Computer Science Ontology (CSO) is a large-scale, automatically generated ontology of research areas. It is the largest ontology in the field of Computer Science, including about 14K topics and 162K semantic relationships.
Salatino et al (2019) The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas. Data Intelligence.
http://cso.kmi.open.ac.uk/
17. A new topic detection engine - The CSO Classifier
The CSO Classifier is an unsupervised approach for automatically classifying documents according to CSO.
Salatino et al. (2019) The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles.
Demo: https://cso.kmi.open.ac.uk/classify/
Download: https://github.com/angelosalatino/cso-classifier
pip install cso-classifier
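For readers who want to try the classifier, the sketch below follows the usage documented in the package README for recent releases; the exact import path, constructor arguments, and result keys may differ across versions and should be checked against the repository above.

# Minimal usage sketch for the cso-classifier pip package (recent releases).
# The import path, arguments, and result keys follow the project README and may
# differ in older versions of the package.
from cso_classifier import CSOClassifier

paper = {
    "title": "De-anonymizing Social Networks",
    "abstract": "We present a framework for analyzing privacy and anonymity "
                "in online social networks...",
    "keywords": "data mining, social networks, privacy, anonymity",
}

classifier = CSOClassifier(modules="both", enhancement="first", explanation=True)
result = classifier.run(paper)

print(result["union"])     # topics returned by the syntactic and semantic modules
print(result["enhanced"])  # broader CSO topics inferred through the ontology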
19. Evaluation - Performance
Classifier | Description | Prec. | Rec. | F1
TF-IDF | TF-IDF | 16.7% | 24.0% | 19.7%
TF-IDF-M | TF-IDF mapped to CSO concepts | 40.4% | 24.1% | 30.1%
LDA100 | LDA with 100 topics | 5.9% | 11.9% | 7.9%
LDA500 | LDA with 500 topics | 4.2% | 12.5% | 6.3%
LDA1000 | LDA with 1000 topics | 3.8% | 5.0% | 4.3%
LDA100-M | LDA with 100 topics mapped to CSO | 9.4% | 19.3% | 12.6%
LDA500-M | LDA with 500 topics mapped to CSO | 9.6% | 21.2% | 13.2%
LDA1000-M | LDA with 1000 topics mapped to CSO | 12.0% | 11.5% | 11.7%
W2V-W | W2V on windows of words | 41.2% | 16.7% | 23.8%
STM 2016 | Classifier used by STM 1.0 | 80.8% | 58.2% | 67.6%
STM 2017 (CSO-SYN) | CSO Classifier, syntactic module | 78.3% | 63.8% | 70.3%
CSO-SEM | CSO Classifier, semantic module | 70.8% | 72.2% | 71.5%
STM 2019 (CSO-C) | The CSO Classifier | 73.0% | 75.3% | 74.1%
Computed on a gold standard of 70 publications, each annotated by 3 researchers.
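For reference, the F1 values in the table are the harmonic mean of precision and recall; for example, for the STM 2019 (CSO-C) row:

\[
F_1 = \frac{2PR}{P + R} = \frac{2 \cdot 0.730 \cdot 0.753}{0.730 + 0.753} \approx 0.741
\]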
20. Evaluation - Usability
System | SUS score | Grade | Percentile
STM 2016 | 76.6 | B | 80%
STM 2019 | 82.8 | A | 93%
[Bar charts: SUS scores (0–100) for each of the nine editors, and per-editor ratings (0–5) on the SUS categories: want to use frequently, easy to use, easy to learn, too complex.]
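For context, the scores above come from the standard 10-item System Usability Scale questionnaire; the sketch below implements the conventional SUS scoring rule (the generic formula, not code from this study).

# Standard SUS scoring: 10 items answered on a 1-5 Likert scale.
# Odd-numbered items are positively worded, even-numbered items negatively worded.
def sus_score(responses):
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,5,7,9 vs 2,4,6,8,10 (0-indexed i)
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5  # rescale the 0-40 raw sum to 0-100

# Example: a fairly positive questionnaire gives a score in the mid-80s.
print(sus_score([5, 2, 4, 1, 4, 2, 5, 1, 4, 2]))  # 85.0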
21. Conclusion and Future Work
• “A little semantic goes a long way”
• Semantic explainability is crucial in this domain
• We are working on an application that will support authors in annotating their own papers.
• Typing of scientific entities: approaches, tasks, domains, resources.
• Automatic extraction of scientific knowledge graphs.
Smart Topic Miner (STM) is the system that we created for assisting the Springer Nature editorial team in classifying scholarly publications in the field of Computer Science. It takes as input one or more books and returns a representation of their research topics, a description of each chapter, and an explanation for each inferred topic.
STM has been used by Springer Nature since January 2017 to annotate several book series in Computer Science (e.g., LNCS), for a total of about 800 volumes each year. During this period, the adoption of STM has halved the time needed for classifying proceedings and allowed a more robust and comprehensive representation of the research areas in the Springer Nature catalogue.
In the scholarly domain, ontologies are often used to facilitate the integration of large datasets of research data, the exploration of the academic landscape, information extraction from scientific articles, and so on.
In January 2019, KMi released, in conjunction with Springer Nature, the Computer Science Ontology (CSO), which is the largest taxonomy of research areas in the field. This resource was automatically generated by mining a dataset of 16M publications and using a combination of machine learning and semantic technologies to extract 14K research topics and 162K semantic relationships. CSO includes a much larger number of research topics than the alternatives (e.g., the ACM classification), enabling a very granular characterisation of the content of research papers, and it can easily be updated by running our ontology learning approach on recent corpora of publications. It has attracted the attention of several institutions and companies, such as Digital Science, Elsevier, and ACM, which are interested in adopting CSO for characterizing their datasets of research publications.
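To give a concrete sense of the relationships CSO encodes, the fragment below is an illustrative Python sketch, not the ontology's actual distribution format (which can be downloaded from the CSO portal); the relationship names mirror the CSO schema's superTopicOf and relatedEquivalent properties, while the specific topic lists are only examples.

# Illustrative fragment of the kind of structure CSO encodes; the topic labels and
# sub-topic lists are examples, not an extract of the released ontology.
cso_fragment = {
    "semantic web": {
        "superTopicOf": ["linked data", "ontology matching", "sparql"],
        "relatedEquivalent": ["semantic web technologies"],
    },
    "machine learning": {
        "superTopicOf": ["neural networks", "support vector machines"],
    },
}

def descendants(topic, ontology):
    """Collect all (transitive) sub-topics of a topic."""
    children = ontology.get(topic, {}).get("superTopicOf", [])
    return set(children) | {d for c in children for d in descendants(c, ontology)}

print(descendants("semantic web", cso_fragment))
# {'linked data', 'ontology matching', 'sparql'}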
We are currently developing a similar ontology in the field of Engineering, and we plan to apply our approach to several other fields (e.g., Biomedicine, Economics).
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment.
The CSO Classifier is an application for automatically classifying research papers according to CSO. We are currently using it to enrich the description of 150K publications in the Springer Nature online library. We have also started a collaboration with Digital Science, the creators of Dimensions, with the aim of automatically annotating their dataset of scholarly data.
The resulting characterization of research papers can be used for supporting tasks such as identifying research communities, forecasting research trends, detecting relevant reviewers, and so on.