Improving Editorial Workflow and Metadata Quality at Springer Nature
Angelo Salatino (1), Francesco Osborne (1), Aliaksandr Birukou (2), Enrico Motta (1)
(1) Knowledge Media Institute, The Open University, United Kingdom
(2) Springer Nature, Heidelberg, Germany
ISWC 2019
Open University and Springer Nature Collaboration
The Open University and Springer Nature have been collaborating since 2014 on the development of an array of semantically enhanced solutions for:
• Semi-automatic classification of proceedings and other editorial products.
• Automatic selection of the most appropriate books, journals, and proceedings to market at a scientific event.
• Analysis of SN codes, with the aim of evolving marked codes and detecting fields that deserve further attention.
• Joint release of the Computer Science Ontology.
Osborne et al. (2017) Supporting Springer Nature Editors by means of Semantic Technologies. ISWC 2017. Vienna, Austria.
Generation of Metadata
Generating metadata is a crucial task: it enables scholars, students, companies and other stakeholders to discover and access the published knowledge.
Traditionally, editors choose a list of related
keywords and categories in relevant taxonomies
according to:
• their own experience of similar conferences;
• a visual exploration of titles and abstracts;
• a list of terms given by the curators or derived from calls for papers.
Classification of Publications – A Complex Problem
Classifying publications manually presents a number of issues for a large publisher such as Springer Nature.
• It is a complex process that requires expert editors.
• It is a time-consuming process that can hardly scale.
• It is easy to miss the emergence of new topics.
• It is easy to assume that some traditional topics are still popular when this is no longer the case.
• The keywords used in the call for papers are often a reflection of what a venue aspires to be, rather than the real contents of the proceedings.
Smart Topic Miner 1.0 - 2016
Presented at ISWC 2016
Osborne, F., Salatino, A., Birukou, A. and Motta,
E.: Automatic Classification of Springer Nature
Proceedings with Smart Topic Miner. ISWC 2016
A success story
• Since 2016, STM has been regularly used by editors in Germany, China, Brazil, India, and Japan.
• It is used to classify more than 800 conference proceedings volumes per year, including the Lecture Notes in Computer Science (LNCS) as well as LNBIP, CCIS, IFIP-AICT, and LNICST.
• It completely changed SN's internal workflow: the task is now semi-automatic and monitored by junior editors.
• It is constantly evolving and gaining new functionality, following feedback from the editorial team.
Smart Topic Miner 1.0 - 2016
Smart Topic Miner 1.2 - 2017
Smart Topic Miner 2.0 - 2019
Business Value
• STM halves the time needed for classifying proceedings from
30 to 15 minutes.
• It also allows junior editors to work on the classification of proceedings, distributing the load and reducing costs.
• The adoption of a controlled vocabulary makes the process
more robust and facilitates the identification of related
editorial products.
Retrievability
About 9M additional downloads thanks to STM.
[Figure: average number of yearly downloads for books in SpringerLink, 2009 to 2018 (vertical axis 0 to 20000). Series: downloads (CS Proceedings); expected downloads (CS Proceedings); downloads (CS Proceedings) with STM; downloads (other books in CS); downloads (overall).]
Smart Topic Miner 2.0 - 2019
http://stm-demo.kmi.open.ac.uk
Demo 462
Smart Topic Miner 2.0 - 2019
• New GUI.
• New Knowledge Base (CSO).
• New Topic Detection Engine
(CSO Classifier).
• Ability to compare with
previous editions.
• Integrated with SN system
and CSO Portal.
STM 2.0 - architecture
[Architecture diagram. Labelled components: SN Editors; HTML - GUI; Parser; Generate Visualizations; STM Engine. Labelled resources: CSO; SNCs; Historical Data; word2vec model.]
Steps listed in the diagram:
i) CSO Classifier
ii) Topic Explanation
iii) Taxonomy Generation
iv) SN Tags Inference
v) Previous Classification
A new knowledge base - The Computer Science Ontology
The Computer Science Ontology (CSO) is a large-scale, automatically generated
ontology of research areas. It is the largest ontology in the field of Computer Science,
including about 14K topics and 162K semantic relationships.
Salatino et al. (2019) The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas. Data Intelligence.
http://cso.kmi.open.ac.uk/
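To make the idea of topics linked by semantic relationships more concrete, the sketch below models a tiny ontology fragment as a mapping from each topic to its broader topics and collects every ancestor of a given topic. The topic names, the `BROADER` mapping, and the function name are illustrative assumptions; they are not data or code taken from CSO, which is far larger and uses more relation types.

```python
from collections import deque

# Hypothetical fragment: each topic maps to its direct broader topics.
# These entries are illustrative only and are not extracted from CSO.
BROADER = {
    "neural networks": ["machine learning"],
    "machine learning": ["artificial intelligence"],
    "ontology matching": ["semantic web"],
    "semantic web": ["world wide web", "artificial intelligence"],
    "artificial intelligence": ["computer science"],
    "world wide web": ["computer science"],
}


def broader_topics(topic, broader=BROADER):
    """Return all ancestors of `topic`, found by breadth-first traversal."""
    seen = []
    queue = deque(broader.get(topic, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reached through another path
        seen.append(current)
        queue.extend(broader.get(current, []))
    return seen
```

Calling `broader_topics("neural networks")` walks up the hierarchy one level at a time; a topic that is missing from the mapping simply yields an empty list.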
A new topic detection engine - The CSO Classifier
The CSO Classifier is an unsupervised approach for automatically classifying documents according to CSO.
Salatino et al. (2019) The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles.
Demo: https://cso.kmi.open.ac.uk/classify/
Download: https://github.com/angelosalatino/cso-classifier
pip install cso-classifier
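As a rough illustration of what classifying a document against a controlled vocabulary can look like, the sketch below extracts word n-grams from a text and keeps those that exactly match a topic label. This is a deliberately simplified stand-in, not the CSO Classifier's algorithm or API; the `TOPICS` set and both function names are hypothetical.

```python
import re

# Hypothetical topic labels standing in for an ontology's vocabulary.
TOPICS = {"semantic web", "machine learning", "ontology", "topic detection"}


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens joined by spaces."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def match_topics(text, topics=TOPICS, max_n=3):
    """Return the sorted topic labels that appear verbatim as n-grams of `text`."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = set()
    for n in range(1, max_n + 1):
        found.update(g for g in ngrams(tokens, n) if g in topics)
    return sorted(found)
```

Exact matching misses spelling variants and synonyms, which is one reason a real system would need something richer than this toy lookup.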
The CSO Classifier - Architecture
Evaluation - Performance
Classifier | Description | Prec. | Rec. | F1
TF-IDF | TF-IDF. | 16.7% | 24.0% | 19.7%
TF-IDF-M | TF-IDF mapped to CSO concepts. | 40.4% | 24.1% | 30.1%
LDA100 | LDA with 100 topics. | 5.9% | 11.9% | 7.9%
LDA500 | LDA with 500 topics. | 4.2% | 12.5% | 6.3%
LDA1000 | LDA with 1000 topics. | 3.8% | 5.0% | 4.3%
LDA100-M | LDA with 100 topics mapped to CSO. | 9.4% | 19.3% | 12.6%
LDA500-M | LDA with 500 topics mapped to CSO. | 9.6% | 21.2% | 13.2%
LDA1000-M | LDA with 1000 topics mapped to CSO. | 12.0% | 11.5% | 11.7%
W2V-W | W2V on windows of words. | 41.2% | 16.7% | 23.8%
STM - 2016 | Classifier used by STM 1.0. | 80.8% | 58.2% | 67.6%
STM - 2017 (CSO-SYN) | CSO Classifier - Syntactic module. | 78.3% | 63.8% | 70.3%
CSO-SEM | CSO Classifier - Semantic module. | 70.8% | 72.2% | 71.5%
STM - 2019 (CSO-C) | The CSO Classifier. | 73.0% | 75.3% | 74.1%
Computed on a gold standard (GS) of 70 publications, each annotated by 3 researchers.
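The F1 column is the harmonic mean of precision and recall, F1 = 2 * P * R / (P + R). A minimal sketch recomputing it for a few rows of the table follows; the function name and the selection of rows are illustrative choices. Because the inputs here are the already-rounded percentages, a recomputed value can differ from the reported one by 0.1 in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the evaluation table.
rows = {
    "TF-IDF": (16.7, 24.0),
    "STM - 2016": (80.8, 58.2),
    "STM - 2019 (CSO-C)": (73.0, 75.3),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.1f}%")
```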
Evaluation - Usability
System | SUS score | Grade | Percentile
STM 2016 | 76.6 | B | 80%
STM 2019 | 82.8 | A | 93%
[Figure: two bar charts covering Editors 1 to 9. First chart: SUS score per editor (scale 0 to 100). Second chart: SUS categories per editor (scale 0 to 5): Want to use frequently, Easy to use, Easy to learn, Too complex.]
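For context, a System Usability Scale (SUS) score is conventionally derived from ten questionnaire items rated 1 to 5: odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5 to give a value between 0 and 100. The sketch below applies this standard scoring. The function name is an illustrative choice and the sample responses are made up; they are not the editors' answers.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten ratings on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, rating in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if index % 2 == 1 else (5 - rating)
    return total * 2.5


# Made-up responses for one hypothetical participant.
example = [5, 2, 4, 1, 4, 2, 5, 1, 4, 2]
print(sus_score(example))
```

A per-system score such as those in the table would then be the average of the individual participants' scores.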
Conclusion and Future Work
• “A little semantics goes a long way”
• Semantic explainability is crucial in this domain
• We are working on an application that will support authors in
annotating their own papers.
• Typing of scientific entities: approaches, tasks, domains,
resources.
• Automatic extraction of Scientific Knowledge Graph.
Francesco Osborne, Angelo Salatino, Aliaksandr Birukou, Enrico Motta
Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. In ISWC 2016. Available at http://rdcu.be/wEHY
Email: francesco.osborne@open.ac.uk
Twitter: FraOsborne
Site: people.kmi.open.ac.uk/francesco
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Nature

  • 1. Improving Editorial Workflow and Metadata Quality at Springer Nature Angelo Salatino1, Francesco Osborne1, Aliaksandr Birukou2, Enrico Motta1 1 Knowledge Media Institute, The Open University, United Kingdom 2 Springer Nature, Heidelberg, Germany ISWC 2019
  • 2. Open University and Springer Nature Collaboration The Open University and Springer Nature have been collaborating since 2014 in the development of an array of semantically-enhanced solutions for: Osborne et al. (2017) Supporting Springer Nature Editors by means of Semantic Technologies. ISWC 2017. Vienna, Austria. • Semi-automatic classification of proceedings and other editorial products. • Automatic selection of the most appropriate books, journals, and proceedings to market at a scientific event. • Analysis of SN codes, with the aim of evolving marked codes and detecting fields that deserve further attention. • Joint release of the Computer Science Ontology.
  • 3. Generation of Metadata It is a crucial task to enable scholars, students, companies and other stakeholders to discover and access this knowledge. Traditionally, editors choose a list of related keywords and categories in relevant taxonomies according to: • their own experience of similar conferences; • a visual exploration of titles and abstracts; • a list of terms given by the curators or derived by calls for papers.
  • 4. Classification of Publications – A Complex Problem Classify publications manually presents a number of issues for a large editor such as Springer Nature. • It a complex process that require expert editors • It is time-consuming process which can hardly scale • It is easy to miss the emergence of new topics • It is easy to assume that some traditional topics are still popular when this is no longer the case • The keywords used in the call of papers are often a reflection of what a venue aspires to be, rather than the real contents of the proceedings.
  • 5. Smart Topic Miner 1.0 - 2016
  • 6. Smart Topic Miner 1.0 - 2016 Presented at ISWC 2016 Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. ISWC 2016
  • 7. A success story • Since 2016 STM has been regularly used by editors in Germany, China, Brazil, India, and Japan. • It is used to classify more than 800 conference proceedings volumes per year, including the Lecture Notes in Computer Science (LNCS) as well as LNBIP, CCIS, IFIP-AICT, and LNICST. • It completely changed the SN internal workflow: the task is now semi-automatic and monitored by junior editors. • It is constantly evolving and gaining new functionalities, following the feedback from the editorial team.
  • 8. Smart Topic Miner 1.0 - 2016
  • 9. Smart Topic Miner 1.2 - 2017
  • 10. Smart Topic Miner 2.0 - 2019
  • 11. Business Value • STM halves the time needed for classifying proceedings, from 30 to 15 minutes. • It also allows junior editors to work on the classification of proceedings, distributing the load and reducing costs. • The adoption of a controlled vocabulary makes the process more robust and facilitates the identification of related editorial products.
  • 12. Retrievability About 9M additional downloads thanks to STM. [Chart: average number of yearly downloads for books in SpringerLink, 2009-2018. Series: downloads (CS Proceedings); expected downloads (CS Proceedings); downloads (CS Proceedings) with STM; downloads (other books in CS); downloads (overall).]
  • 13. Smart Topic Miner 2.0 - 2019 http://stm-demo.kmi.open.ac.uk Demo 462
  • 14. Smart Topic Miner 2.0 - 2019 • New GUI. • New Knowledge Base (CSO). • New Topic Detection Engine (CSO Classifier). • Ability to compare with previous editions. • Integrated with SN system and CSO Portal. http://stm-demo.kmi.open.ac.uk
  • 15. STM 2.0 - architecture [Diagram labels: SN Editors; HTML GUI; Parser; Generate Visualizations; STM Engine; CSO; SNCs; Historical Data; word2vec model. Processing steps: i) CSO Classifier, ii) Topic Explanation, iii) Taxonomy Generation, iv) SN Tags Inference, v) Previous Classification.]
  • 16. A new knowledge base - The Computer Science Ontology The Computer Science Ontology (CSO) is a large-scale, automatically generated ontology of research areas. It is the largest ontology in the field of Computer Science, including about 14K topics and 162K semantic relationships. Salatino et al. (2019) The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas. Data Intelligence. http://cso.kmi.open.ac.uk/
  • 17. A new topic detection engine - The CSO Classifier The CSO Classifier is an unsupervised approach for automatically classifying documents according to CSO. Salatino et al. (2019) The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles. Demo: https://cso.kmi.open.ac.uk/classify/ Download: https://github.com/angelosalatino/cso-classifier Install: pip install cso-classifier
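To make the idea of classifying a document against a controlled vocabulary concrete, here is a minimal sketch of n-gram lookup against a topic list. It is not the CSO Classifier's implementation: the four-topic vocabulary `TOPICS`, the function names, and the sample abstract are all assumptions made for illustration, and exact string matching stands in for whatever matching the real syntactic and semantic modules perform.

```python
import re

# Hypothetical mini-vocabulary for illustration; the real CSO has about 14K topics.
TOPICS = {"semantic web", "ontology", "machine learning", "topic detection"}


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def match_topics(text, topics=TOPICS, max_n=3):
    """Return the sorted vocabulary topics that occur verbatim in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = set()
    for n in range(1, max_n + 1):  # unigrams, bigrams, trigrams
        for gram in ngrams(tokens, n):
            if gram in topics:
                found.add(gram)
    return sorted(found)


abstract = ("We apply machine learning and an ontology "
            "to topic detection on the Semantic Web.")
print(match_topics(abstract))
```

Exact matching misses plurals, spelling variants, and topics that are implied but never named, which is presumably why a production classifier needs more than a lookup of this kind.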
  • 18. The CSO Classifier - Architecture
  • 19. Evaluation - Performance
    Classifier | Description | Prec. | Rec. | F1
    TF-IDF | TF-IDF. | 16.7% | 24.0% | 19.7%
    TF-IDF-M | TF-IDF mapped to CSO concepts. | 40.4% | 24.1% | 30.1%
    LDA100 | LDA with 100 topics. | 5.9% | 11.9% | 7.9%
    LDA500 | LDA with 500 topics. | 4.2% | 12.5% | 6.3%
    LDA1000 | LDA with 1000 topics. | 3.8% | 5.0% | 4.3%
    LDA100-M | LDA with 100 topics mapped to CSO. | 9.4% | 19.3% | 12.6%
    LDA500-M | LDA with 500 topics mapped to CSO. | 9.6% | 21.2% | 13.2%
    LDA1000-M | LDA with 1000 topics mapped to CSO. | 12.0% | 11.5% | 11.7%
    W2V-W | W2V on windows of words. | 41.2% | 16.7% | 23.8%
    STM - 2016 | Classifier used by STM 1.0. | 80.8% | 58.2% | 67.6%
    STM - 2017 (CSO-SYN) | CSO Classifier - Syntactic module. | 78.3% | 63.8% | 70.3%
    CSO-SEM | CSO Classifier - Semantic module. | 70.8% | 72.2% | 71.5%
    STM - 2019 (CSO-C) | The CSO Classifier. | 73.0% | 75.3% | 74.1%
    Computed on a gold standard (GS) of 70 publications, each annotated by 3 researchers.
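The F1 column above is consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it for a few rows; the function name and the choice of rows are ours, and because the published precision and recall are already rounded, a recomputed value may differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the evaluation table.
rows = {
    "TF-IDF": (16.7, 24.0, 19.7),
    "CSO-SYN": (78.3, 63.8, 70.3),
    "CSO-SEM": (70.8, 72.2, 71.5),
    "CSO-C": (73.0, 75.3, 74.1),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.1f}%, reported = {reported}%")
```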
  • 20. Evaluation - Usability
    System | SUS score | Grade | Percentile
    STM 2016 | 76.6 | B | 80%
    STM 2019 | 82.8 | A | 93%
    [Charts: SUS score per editor (Editors 1-9, scale 0-100); SUS categories per editor (scale 0-5): Want to use frequently, Easy to use, Easy to learn, Too complex.]
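The slide does not say how the SUS scores were derived. Assuming the standard ten-item System Usability Scale questionnaire, each answer is on a 1-5 scale, odd-numbered items contribute (answer - 1), even-numbered items contribute (5 - answer), and the sum is multiplied by 2.5 to give a 0-100 score. The sketch below applies that rule; the function name and the sample responses are hypothetical and are not data from the study.

```python
def sus_score(responses):
    """Score one respondent's ten SUS answers (each 1-5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item, answer in enumerate(responses, start=1):
        if not 1 <= answer <= 5:
            raise ValueError("each response must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (answer - 1) if item % 2 == 1 else (5 - answer)
    return total * 2.5


# Hypothetical answers from a single editor, for illustration only.
example = [4, 2, 5, 1, 4, 2, 5, 2, 4, 1]
print(sus_score(example))
```

A system-level score such as those in the table would then be the mean of the per-respondent scores.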
  • 21. Conclusion and Future Work • “A little semantics goes a long way” • Semantic explainability is crucial in this domain. • We are working on an application that will support authors in annotating their own papers. • Typing of scientific entities: approaches, tasks, domains, resources. • Automatic extraction of a Scientific Knowledge Graph.
  • 22. Francesco Osborne, Angelo Salatino, Aliaksandr Birukou, Enrico Motta. See also: Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. In ISWC 2016. Available at http://rdcu.be/wEHY Email: francesco.osborne@open.ac.uk Twitter: FraOsborne Site: people.kmi.open.ac.uk/francesco

Editor's Notes

  1-7. Smart Topic Miner (STM) is the system that we created to assist the Springer Nature editorial team in classifying scholarly publications in the field of Computer Science. It takes one or more books as input and returns a representation of their research topics, a description of each chapter, and an explanation for each inferred topic. STM has been used by Springer Nature since January 2017 to annotate several book series in Computer Science (e.g., LNCS), for a total of about 800 volumes each year. During this period, the adoption of STM has halved the time needed for classifying proceedings and allowed a more robust and comprehensive representation of the research areas in the Springer Nature catalogue.
  8. In the scholarly domain, ontologies are often used to facilitate the integration of large datasets of research data, the exploration of the academic landscape, information extraction from scientific articles, and so on. In January 2019, KMi released, in conjunction with Springer Nature, the Computer Science Ontology (CSO), which is the largest taxonomy of research areas in the field. This resource was automatically generated by mining a dataset of 16M publications and using a combination of machine learning and semantic technologies to extract 14K research topics and 162K semantic relationships. CSO includes a much larger number of research topics than the alternatives (e.g., ACM Classification), enabling a very granular characterisation of the content of research papers, and it can be easily updated by running our ontology learning approach on recent corpora of publications. It has attracted the attention of several institutions and companies, such as Digital Science, Elsevier, and ACM, interested in adopting CSO for characterizing their datasets of research publications. We are currently developing a similar ontology in the field of Engineering and we plan to apply our technology to several other fields (Biomedical, Economics).
  9. Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. The CSO Classifier is an application for automatically classifying research papers according to CSO. We are currently using it to enrich the description of 150K publications in the Springer Nature online library. We have also started a collaboration with Digital Science, the creators of Dimensions, with the aim of automatically annotating their dataset of scholarly data. The resulting characterization of research papers can be used for supporting tasks such as identifying research communities, forecasting research trends, detecting relevant reviewers, and so on.