SlideShare a Scribd company logo

Automatic Classification of Springer Nature Proceedings with Smart Topic Miner

The process of classifying scholarly outputs is crucial to ensure timely access to knowledge. However, this process is typically carried out manually by expert editors, leading to high costs and slow throughput. In this paper we present Smart Topic Miner (STM), a novel solution which uses semantic web technologies to classify scholarly publications on the basis of a very large automatically generated ontology of research areas. STM was developed to support the Springer Nature Computer Science editorial team in classifying proceedings in the LNCS family. It analyses in real time a set of publications provided by an editor and produces a structured set of topics and a number of Springer Nature classification tags, which best characterise the given input. In this paper we present the architecture of the system and report on an evaluation study conducted with a team of Springer Nature editors. The results of the evaluation, which showed that STM classifies publications with a high degree of accuracy, are very encouraging and as a result we are currently discussing the required next steps to ensure large-scale deployment within the company.

1 of 17
Download to read offline
Francesco Osborne1, Angelo Salatino1,
Aliaksandr Birukou2, Enrico Motta1
1 KMi, The Open University, United Kingdom
2 Springer Nature
ISWC 2016
Automatic Classification of Springer Nature
Proceedings with Smart Topic Miner
Classifying Scholarly publications
It is a crucial task to enable scholars, students, companies and
other stakeholders to discover and access this knowledge.
2
• their own experience of
similar conferences;
• a visual exploration of titles
and abstracts;
• a list of terms given by the
curators or derived by calls for
papers.
Traditionally, editors choose a list of related keywords and
categories in relevant taxonomies according to:
Classifying Scholarly publications
Classify publication manually presents a number of issue for a
big editor such as Springer Nature.
• It a complex process that require expert editors
• It is time-consuming process which can hardly scale (1.5M
papers/year)
• It is easy to miss the emergence of a new topic
• It is easy to assume that some traditional topics are still
popular when this is no longer the case
• The keywords used in the call of papers are often a reflection
of what a venue aspires to be, rather than the real contents of
the proceedings.
3
44
Osborne, F., Motta, E. and Mulholland, P.: Exploring scholarly data with Rexplore.
In International semantic web conference (pp. 460-477). (2013)
technologies.kmi.open.ac.uk/rexplore/
The Smart Topic Miner
The Smart Topic Miner (STM) is a semantic application designed
to support the Springer Nature Computer Science editorial
team in classifying scholarly publications.
5
http://rexplore.kmi.open.ac.uk/STM_demo
STM Architecture
6

Recommended

EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsEKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsFrancesco Osborne
 
Linked science presentation 25
Linked science presentation 25Linked science presentation 25
Linked science presentation 25Francesco Osborne
 
Supporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic TechnologiesSupporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic TechnologiesFrancesco Osborne
 
EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...
EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...
EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...Francesco Osborne
 
Early Detection and Forecasting of Research Trends
Early Detection and Forecasting of Research TrendsEarly Detection and Forecasting of Research Trends
Early Detection and Forecasting of Research TrendsAngelo Salatino
 
Detection of Embryonic Research Topics by Analysing Semantic Topic Networks
Detection of Embryonic Research Topics by Analysing Semantic Topic NetworksDetection of Embryonic Research Topics by Analysing Semantic Topic Networks
Detection of Embryonic Research Topics by Analysing Semantic Topic NetworksAngelo Salatino
 
From TREC to Watson: is open domain question answering a solved problem?
From TREC to Watson: is open domain question answering a solved problem?From TREC to Watson: is open domain question answering a solved problem?
From TREC to Watson: is open domain question answering a solved problem?Constantin Orasan
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Saeedeh Shekarpour
 

More Related Content

What's hot

Algorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasetsAlgorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasetsaneeshabakharia
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
Semantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSemantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSaeedeh Shekarpour
 
Social Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIASocial Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIAInsight_Altmetrics
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
SelQA: A New Benchmark for Selection-based Question Answering
SelQA: A New Benchmark for Selection-based Question AnsweringSelQA: A New Benchmark for Selection-based Question Answering
SelQA: A New Benchmark for Selection-based Question AnsweringJinho Choi
 
Data Science Education at JHSPH
Data Science Education at JHSPHData Science Education at JHSPH
Data Science Education at JHSPHjtleek
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...Angelo Salatino
 
Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics Angelo Salatino
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataAndre Freitas
 
QALL-ME: Ontology and Semantic Web
QALL-ME: Ontology and Semantic WebQALL-ME: Ontology and Semantic Web
QALL-ME: Ontology and Semantic WebConstantin Orasan
 
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataSSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataPolytechnic University of Bari
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Andre Freitas
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationGong Cheng
 
Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Salam Shah
 
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Polytechnic University of Bari
 
Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval GESIS
 

What's hot (20)

Sybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal PresentationSybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal Presentation
 
Algorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasetsAlgorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasets
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Semantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSemantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked Data
 
Social Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIASocial Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIA
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
SelQA: A New Benchmark for Selection-based Question Answering
SelQA: A New Benchmark for Selection-based Question AnsweringSelQA: A New Benchmark for Selection-based Question Answering
SelQA: A New Benchmark for Selection-based Question Answering
 
Data Science Education at JHSPH
Data Science Education at JHSPHData Science Education at JHSPH
Data Science Education at JHSPH
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
 
Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big data
 
QALL-ME: Ontology and Semantic Web
QALL-ME: Ontology and Semantic WebQALL-ME: Ontology and Semantic Web
QALL-ME: Ontology and Semantic Web
 
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataSSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and Summarization
 
Recommender Systems and Linked Open Data
Recommender Systems and Linked Open DataRecommender Systems and Linked Open Data
Recommender Systems and Linked Open Data
 
Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...
 
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
 
Research Statement
Research StatementResearch Statement
Research Statement
 
Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval
 

Similar to Automatic Classification of Springer Nature Proceedings with Smart Topic Miner

Linked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farAliaksandr Birukou
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...IRJET Journal
 
Cse 8th sem syllabus
Cse 8th sem syllabusCse 8th sem syllabus
Cse 8th sem syllabusAkshatha Nair
 
Data science syllabus
Data science syllabusData science syllabus
Data science syllabusanoop bk
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Enrico Motta
 
Diary of a Wimpy Model Manager
Diary of a Wimpy Model ManagerDiary of a Wimpy Model Manager
Diary of a Wimpy Model ManagerEric Stephan
 
Thirteen Years of SysML: A Systematic Mapping Study
Thirteen Years of SysML: A Systematic Mapping StudyThirteen Years of SysML: A Systematic Mapping Study
Thirteen Years of SysML: A Systematic Mapping Studyswolny
 
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Na...
ISWC 2019 -  Improving Editorial Workflow and Metadata Quality at Springer Na...ISWC 2019 -  Improving Editorial Workflow and Metadata Quality at Springer Na...
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Na...Francesco Osborne
 
Session 2.1 ontological representation of the telecom domain for advanced a...
Session 2.1   ontological representation of the telecom domain for advanced a...Session 2.1   ontological representation of the telecom domain for advanced a...
Session 2.1 ontological representation of the telecom domain for advanced a...semanticsconference
 
Topic map for Topic Maps case examples
Topic map for Topic Maps case examplesTopic map for Topic Maps case examples
Topic map for Topic Maps case examplestmra
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the WebRinke Hoekstra
 
2016 05-20-clariah-wp4
2016 05-20-clariah-wp42016 05-20-clariah-wp4
2016 05-20-clariah-wp4CLARIAH
 
Futuristic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaFuturistic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaBabasab Patil
 

Similar to Automatic Classification of Springer Nature Proceedings with Smart Topic Miner (20)

Linked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so far
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...
 
JCDL 2013 DOCTORAL CONSORTIUM
JCDL 2013 DOCTORAL CONSORTIUMJCDL 2013 DOCTORAL CONSORTIUM
JCDL 2013 DOCTORAL CONSORTIUM
 
Cse 8th sem syllabus
Cse 8th sem syllabusCse 8th sem syllabus
Cse 8th sem syllabus
 
Data science syllabus
Data science syllabusData science syllabus
Data science syllabus
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
 
Diary of a Wimpy Model Manager
Diary of a Wimpy Model ManagerDiary of a Wimpy Model Manager
Diary of a Wimpy Model Manager
 
Thirteen Years of SysML: A Systematic Mapping Study
Thirteen Years of SysML: A Systematic Mapping StudyThirteen Years of SysML: A Systematic Mapping Study
Thirteen Years of SysML: A Systematic Mapping Study
 
0 computers and social sciences pmy 2330 lectures notes 2017
0 computers and social sciences pmy 2330 lectures notes 20170 computers and social sciences pmy 2330 lectures notes 2017
0 computers and social sciences pmy 2330 lectures notes 2017
 
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Na...
ISWC 2019 -  Improving Editorial Workflow and Metadata Quality at Springer Na...ISWC 2019 -  Improving Editorial Workflow and Metadata Quality at Springer Na...
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Na...
 
Integrating Semantic Systems
Integrating Semantic SystemsIntegrating Semantic Systems
Integrating Semantic Systems
 
Session 2.1 ontological representation of the telecom domain for advanced a...
Session 2.1   ontological representation of the telecom domain for advanced a...Session 2.1   ontological representation of the telecom domain for advanced a...
Session 2.1 ontological representation of the telecom domain for advanced a...
 
NUS PhD e-open day 2020
NUS PhD e-open day 2020NUS PhD e-open day 2020
NUS PhD e-open day 2020
 
Be computer-engineering-2012
Be computer-engineering-2012Be computer-engineering-2012
Be computer-engineering-2012
 
Topic map for Topic Maps case examples
Topic map for Topic Maps case examplesTopic map for Topic Maps case examples
Topic map for Topic Maps case examples
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
2016 05-20-clariah-wp4
2016 05-20-clariah-wp42016 05-20-clariah-wp4
2016 05-20-clariah-wp4
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 
Futuristic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaFuturistic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mba
 

Recently uploaded

Volatile Oils-Introduction for pharmacy students and graduates
Volatile Oils-Introduction for pharmacy students and graduatesVolatile Oils-Introduction for pharmacy students and graduates
Volatile Oils-Introduction for pharmacy students and graduatesAhmed Metwaly
 
commercial production of cellulase enzyme and its uses
commercial production of cellulase enzyme and its usescommercial production of cellulase enzyme and its uses
commercial production of cellulase enzyme and its usesSilpa Selvaraj
 
Rootstock scion and Interstock Relationship Selection of Elite Mother Plants
Rootstock scion and Interstock Relationship Selection of Elite Mother PlantsRootstock scion and Interstock Relationship Selection of Elite Mother Plants
Rootstock scion and Interstock Relationship Selection of Elite Mother PlantsAmanDohre
 
REJUVENATION THROUGH PROGENY ORCHAD AND SCION BANK
REJUVENATION THROUGH PROGENY ORCHAD AND SCION BANKREJUVENATION THROUGH PROGENY ORCHAD AND SCION BANK
REJUVENATION THROUGH PROGENY ORCHAD AND SCION BANKAmanDohre
 
Elbow joint - Anatomy of the Elbow joint
Elbow joint - Anatomy of the Elbow jointElbow joint - Anatomy of the Elbow joint
Elbow joint - Anatomy of the Elbow jointTELISHA2
 
Open Access Publishing in Astrophysics and the Open Journal of Astrophysics
Open Access Publishing in Astrophysics and the Open Journal of AstrophysicsOpen Access Publishing in Astrophysics and the Open Journal of Astrophysics
Open Access Publishing in Astrophysics and the Open Journal of AstrophysicsPeter Coles
 
2024 Insilicogen Company English Brochure
2024 Insilicogen Company English Brochure2024 Insilicogen Company English Brochure
2024 Insilicogen Company English BrochureInsilico Gen
 
Exploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptx
Exploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptxExploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptx
Exploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptxSamrat Tayade
 
Hydro-Thermal Liquefaction Of Lignocellulosic biomass to produce Bio-Crude oil
Hydro-Thermal Liquefaction Of Lignocellulosic biomass to produce Bio-Crude oilHydro-Thermal Liquefaction Of Lignocellulosic biomass to produce Bio-Crude oil
Hydro-Thermal Liquefaction Of Lignocellulosic biomass to produce Bio-Crude oilZeeshan Nazir
 
PROSTHETIC FEET description and its types
PROSTHETIC FEET description and its typesPROSTHETIC FEET description and its types
PROSTHETIC FEET description and its typeseshasmalik27
 
conceptofatomic#number Physical Science second sem week 1.pptx
conceptofatomic#number Physical Science second sem week 1.pptxconceptofatomic#number Physical Science second sem week 1.pptx
conceptofatomic#number Physical Science second sem week 1.pptxAnggeComeso
 
Duchenne Muscular Dystrophy or DMD .pptx
Duchenne Muscular Dystrophy or DMD .pptxDuchenne Muscular Dystrophy or DMD .pptx
Duchenne Muscular Dystrophy or DMD .pptxNavanidhan.M
 
Thornyissue testing of slideshow for website
Thornyissue testing of slideshow for websiteThornyissue testing of slideshow for website
Thornyissue testing of slideshow for websitesuelcarter1
 
The ExoGRAVITY project - observations of exoplanets from the ground with opti...
The ExoGRAVITY project - observations of exoplanets from the ground with opti...The ExoGRAVITY project - observations of exoplanets from the ground with opti...
The ExoGRAVITY project - observations of exoplanets from the ground with opti...Advanced-Concepts-Team
 
Planeta 9 - A Pan-STARRS1 Search for Planet Nine
Planeta 9 - A Pan-STARRS1 Search for Planet NinePlaneta 9 - A Pan-STARRS1 Search for Planet Nine
Planeta 9 - A Pan-STARRS1 Search for Planet NineSérgio Sacani
 
REARING EQUIPMENT IN SERICULTURE . pptx
REARING EQUIPMENT IN SERICULTURE . pptxREARING EQUIPMENT IN SERICULTURE . pptx
REARING EQUIPMENT IN SERICULTURE . pptxVISHALI SELVAM
 
Open Access Publishing and the Open Journal of Astrophysics
Open Access Publishing and the Open Journal of AstrophysicsOpen Access Publishing and the Open Journal of Astrophysics
Open Access Publishing and the Open Journal of AstrophysicsPeter Coles
 
Introduction to Chromatography (Column chromatography)
Introduction to Chromatography (Column chromatography)Introduction to Chromatography (Column chromatography)
Introduction to Chromatography (Column chromatography)Ahmed Metwaly
 
green chemistry, clean sustainable environment.ppt
green chemistry, clean sustainable environment.pptgreen chemistry, clean sustainable environment.ppt
green chemistry, clean sustainable environment.pptRashmiSanghi1
 

Recently uploaded (20)

Volatile Oils-Introduction for pharmacy students and graduates
Volatile Oils-Introduction for pharmacy students and graduatesVolatile Oils-Introduction for pharmacy students and graduates
Volatile Oils-Introduction for pharmacy students and graduates
 
Research methods in ethnobotany- Exploring Traditional Wisdom
Research methods in ethnobotany- Exploring Traditional WisdomResearch methods in ethnobotany- Exploring Traditional Wisdom
Research methods in ethnobotany- Exploring Traditional Wisdom
 
commercial production of cellulase enzyme and its uses
commercial production of cellulase enzyme and its usescommercial production of cellulase enzyme and its uses
commercial production of cellulase enzyme and its uses
 
Rootstock scion and Interstock Relationship Selection of Elite Mother Plants
Rootstock scion and Interstock Relationship Selection of Elite Mother PlantsRootstock scion and Interstock Relationship Selection of Elite Mother Plants
Rootstock scion and Interstock Relationship Selection of Elite Mother Plants
 
REJUVENATION THROUGH PROGENY ORCHAD AND SCION BANK
REJUVENATION THROUGH PROGENY ORCHAD AND SCION BANKREJUVENATION THROUGH PROGENY ORCHAD AND SCION BANK
REJUVENATION THROUGH PROGENY ORCHAD AND SCION BANK
 
Elbow joint - Anatomy of the Elbow joint
Elbow joint - Anatomy of the Elbow jointElbow joint - Anatomy of the Elbow joint
Elbow joint - Anatomy of the Elbow joint
 
Open Access Publishing in Astrophysics and the Open Journal of Astrophysics
Open Access Publishing in Astrophysics and the Open Journal of AstrophysicsOpen Access Publishing in Astrophysics and the Open Journal of Astrophysics
Open Access Publishing in Astrophysics and the Open Journal of Astrophysics
 
2024 Insilicogen Company English Brochure
2024 Insilicogen Company English Brochure2024 Insilicogen Company English Brochure
2024 Insilicogen Company English Brochure
 
Exploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptx
Exploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptxExploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptx
Exploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptx
 
Hydro-Thermal Liquefaction Of Lignocellulosic biomass to produce Bio-Crude oil
Hydro-Thermal Liquefaction Of Lignocellulosic biomass to produce Bio-Crude oilHydro-Thermal Liquefaction Of Lignocellulosic biomass to produce Bio-Crude oil
Hydro-Thermal Liquefaction Of Lignocellulosic biomass to produce Bio-Crude oil
 
PROSTHETIC FEET description and its types
PROSTHETIC FEET description and its typesPROSTHETIC FEET description and its types
PROSTHETIC FEET description and its types
 
conceptofatomic#number Physical Science second sem week 1.pptx
conceptofatomic#number Physical Science second sem week 1.pptxconceptofatomic#number Physical Science second sem week 1.pptx
conceptofatomic#number Physical Science second sem week 1.pptx
 
Duchenne Muscular Dystrophy or DMD .pptx
Duchenne Muscular Dystrophy or DMD .pptxDuchenne Muscular Dystrophy or DMD .pptx
Duchenne Muscular Dystrophy or DMD .pptx
 
Thornyissue testing of slideshow for website
Thornyissue testing of slideshow for websiteThornyissue testing of slideshow for website
Thornyissue testing of slideshow for website
 
The ExoGRAVITY project - observations of exoplanets from the ground with opti...
The ExoGRAVITY project - observations of exoplanets from the ground with opti...The ExoGRAVITY project - observations of exoplanets from the ground with opti...
The ExoGRAVITY project - observations of exoplanets from the ground with opti...
 
Planeta 9 - A Pan-STARRS1 Search for Planet Nine
Planeta 9 - A Pan-STARRS1 Search for Planet NinePlaneta 9 - A Pan-STARRS1 Search for Planet Nine
Planeta 9 - A Pan-STARRS1 Search for Planet Nine
 
REARING EQUIPMENT IN SERICULTURE . pptx
REARING EQUIPMENT IN SERICULTURE . pptxREARING EQUIPMENT IN SERICULTURE . pptx
REARING EQUIPMENT IN SERICULTURE . pptx
 
Open Access Publishing and the Open Journal of Astrophysics
Open Access Publishing and the Open Journal of AstrophysicsOpen Access Publishing and the Open Journal of Astrophysics
Open Access Publishing and the Open Journal of Astrophysics
 
Introduction to Chromatography (Column chromatography)
Introduction to Chromatography (Column chromatography)Introduction to Chromatography (Column chromatography)
Introduction to Chromatography (Column chromatography)
 
green chemistry, clean sustainable environment.ppt
green chemistry, clean sustainable environment.pptgreen chemistry, clean sustainable environment.ppt
green chemistry, clean sustainable environment.ppt
 

Automatic Classification of Springer Nature Proceedings with Smart Topic Miner

  • 1. Francesco Osborne1, Angelo Salatino1, Aliaksandr Birukou2, Enrico Motta1 1 KMi, The Open University, United Kingdom 2 Springer Nature ISWC 2016 Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
  • 2. Classifying Scholarly publications It is a crucial task to enable scholars, students, companies and other stakeholders to discover and access this knowledge. 2 • their own experience of similar conferences; • a visual exploration of titles and abstracts; • a list of terms given by the curators or derived by calls for papers. Traditionally, editors choose a list of related keywords and categories in relevant taxonomies according to:
  • 3. Classifying Scholarly publications Classify publication manually presents a number of issue for a big editor such as Springer Nature. • It a complex process that require expert editors • It is time-consuming process which can hardly scale (1.5M papers/year) • It is easy to miss the emergence of a new topic • It is easy to assume that some traditional topics are still popular when this is no longer the case • The keywords used in the call of papers are often a reflection of what a venue aspires to be, rather than the real contents of the proceedings. 3
  • 4. 44 Osborne, F., Motta, E. and Mulholland, P.: Exploring scholarly data with Rexplore. In International semantic web conference (pp. 460-477). (2013) technologies.kmi.open.ac.uk/rexplore/
  • 5. The Smart Topic Miner The Smart Topic Miner (STM) is a semantic application designed to support the Springer Nature Computer Science editorial team in classifying scholarly publications. 5 http://rexplore.kmi.open.ac.uk/STM_demo
  • 7. Background Data - The Computer Science Ontology 1 • Not fine-grained enough. – E.g., only 2 topics are classified under Semantic Web • Static, manually defined, hence prone to get obsolete very quickly. 7 Standard research areas taxonomies/classifications/ontologies such as ACM are not apt to the task. ACM 2012
  • 8. The Computer Science Ontology was automatically created and updated by applying the Klink-2 algorithm. Osborne, F. and Motta, E.: Klink-2: integrating multiple web sources to generate semantic topic networks. In ISWC 2015. (2015) Background Data - The Computer Science Ontology 2
  • 9. • We automatically generated a large-scale ontology consist of about 15,000 topics linked by about 70,000 semantic relationships. • It included very granular and low level research areas, e.g., Linked open data, Probabilistic packet marking, Synthetic aperture radar imaging • It can be regularly updated by running Klink-2 on a new set of publications. • It allows for a research topic to have multiple super-areas – i.e., the taxonomic structure is a graph rather than a tree, e.g., Inductive Logic Programming is a sub-area of both Machine Learning and Logic Programming. 9 Background Data - The Computer Science Ontology 3
  • 10. The initial keywords are enriched with terms extracted from the publications and then mapped to a list of research areas in the CSO ontology; Initial Keywords (from authors and editors) (1) Computer Science [21] --- (2) Internet [18] -------- (3) World wide web [16] ------------- (4) Semantic web [16] ------------------ (5) Rdf [7] ------------------ (5) Linked data [5] ---------- (3) NLP systems [3] --------------- (4) Question answering [2] ---------- (3) Recommender systems [2] --- (2) Artificial intelligence [12] -------- (3) Knowledge based systems [8] ------------- (4) Knowledge representation [4] ------------------ (5) Description logic [3] -------- (3) Machine learning [4] (1) Semantics [24] --- (2) Ontology [10] --- (2) Metadata [7] -------- (3) Rdf [7] --- (2) Semantic web [16] (1) Language [5] --- (2) Vocabulary [2] […] semantic:24, rdf:7, applications:5, semantic web:5, knowledge base:4, linked data:4, ontology:4, ontologies:4, language:3, knowledge bases:3, algorithms:2, integration:2, architecture:2, semantics:2, knowledge management:2, query answering:2, recommendation:2, question answering system:2, semantic similarity:2, question answering:2, vocabulary:2, svm:1, graph traversal:1, information needs:1, path ranking:1, baidu encyclopedia:1, non- aggregation questions:1, support vector machine:1, implicit information:1, construction:1, knowledge base completion:1, relational constraints:1, semantical regularizations:1, support vector machine (svm):1, machine learning:1, support vector:1, facts:1, logic programming:1, multi-strategy learning:1, distant supervision:1, competitor mining:1, lossy compression:1, comprehensive evaluation:1, relation reasoning:1, websites:1, competition:1, decision support:1, learning algorithm:1 […] linked data:3, relational constraints:1, semantical regularizations:1, question answering:1, graph traversal:1, non- aggregation questions:1, implicit information:1, knowledge base completion:1, dbpedia:1, recommender system:1, relation extraction:1, weakly supervised:1, baidu encyclopedia:1, svm:1, path ranking:1, medical events:1, competitor mining:1, description logics:1, multi-strategy learning:1, distant supervision:1, relation reasoning:1, non-standard reasoning services:1, concept similarity measures:1, semantic data:1, medical guidelines:1, rdf:1, prolog:1, preference profile:1, similarity measure:1, ontology development:1, knowledge representation:1, graph simplification:1, rdf visualization:1, triple ranking:1, sparql-rank:1, rank-join operator:1, “shaowei” (稍微 ‘a little’):1, minimal degree adverb:1, a little:1, rdf native storage:1, news analysis:1, meta-data extraction:1, database integration:1, elderly nursing care:1 […] Enriched Keywords (extracted from abstract, titles, etc) CSO Ontology topics STM Approach – 1 Topic extraction
  • 11. A greedy set-covering algorithm is used to reduce the topics to a user- friendly number. • We run the algorithm separately on the set of topics at each level of the ontology, to preserve both high level and granular research areas. • The standard version of the greedy set-covering algorithm did not work well in this domain: multiple high level topics cover a similar set of papers. • It assigns an initial weight to each paper and at each iteration it selects the topic which covered the publications with the highest weight and reduces the weight of every covered paper. 11 STM Approach – 2 Topic Selection
  • 12. The selected topics are used to infer a number of SNC tags, using the mapping between CSO ontology and SNC. I00001 : computer science, general I23001 : computer applications I23050 : computational biology/bioinformatics I13006 : computer systems organization an communication networks I13014 : processor architectures I13022 : computer comm. networks I21009 : computing methodologies I21017 : artificial intelligence I1200X : computer hardware I12050 : logic design I14002 : software engineering/programming and operating systems I22005 : computer imaging, vision, pattern recognition and graphics I22021 : image processing I18008 : information sys. and comm. servic I18030 : data mining, knowledge discove (1) Computer Science [69] (2) Bioinformatics [69] (2) Artificial intelligence [16] (3) Machine learning [9] (4) Support vector machines [7] (2) Computer architecture [13] (3) Program processors [13] (4) Graphics Processing Unit (GPU) [7] (5) Cuda [3] (2) Image processing [12] (3) Image reconstruction [6] (2) Data mining [9] […] (3) Telecommunication networks [5] STM Approach – 3 Tag Selection
  • 13. User Trial 1 We conducted individual sessions with 8 experienced SN editors. We introduced STM for about 15 minutes and then asked them to classify a number of proceedings in their fields of expertize for about 45 minutes. The expertise of the editors included: Theoretical Computer Science, Computer Networks, Software Engineering, HCI, AI, Bioinformatics, and Security. After the hands-on session the editors filled a three-parts survey: • Background and expertise • Five questions about the strengths and weaknesses of STM and three about the quality of the results • SUS questionnaire 13
  • 14. User Trial 2 Background and expertise • On average 13 years of experience (7 out of 8 having at least 5 years) • All of them stated to have extensive knowledge of the main topic classifications in their fields • Four of them considered themselves also experts at working with digital proceedings. Open questions about STM strengths and weaknesses • STM had a positive effect on their work. • They estimated the accuracy of the results between 75% and 90%. • Limitation: the scope limited to the Computer Science field and occasional noisy results when examining books with very few chapters. • Suggested features: produce analytics about the evolution of a venue or a journal in terms; allowing users to find the most significant proceedings for a topic. 14
  • 15. User Trial 3 Quality of results and usability SUS: 77/100, 80% percentile rank 15
  • 16. Conclusions Key Lessons • Allow users to know the rationale behind a suggestion. • Value of Semantic Technologies for helping users in addressing noisy data. Future work • Discussing a project to further integrate STM into Springer Nature workflows. • Extending STM to characterize the evolution of conferences and venues in time. – e.g. highlighting new emerging topics, as well as the fact that some traditional topics are fading out • Using STM for directly supporting authors in defining the set of topics which best describe their paper. 16
  • 17. Francesco Osborne Angelo Salatino Aliaksandr Birukou Enrico Motta Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. In International Semantic Web Conference (pp. 383-399). Springer International Publishing. (2016) Email: francesco.osborne@open.ac.uk Twitter: FraOsborne Site: people.kmi.open.ac.uk/francesco