Francesco Osborne1, Angelo Salatino1,
Aliaksandr Birukou2, Enrico Motta1
1 KMi, The Open University, United Kingdom
2 Springer Nature
ISWC 2016
Automatic Classification of Springer Nature
Proceedings with Smart Topic Miner
Classifying Scholarly publications
Classifying scholarly publications is a crucial task: it enables scholars, students, companies and other stakeholders to discover and access the knowledge they contain.
Traditionally, editors choose a list of related keywords and categories in relevant taxonomies according to:
• their own experience of similar conferences;
• a visual exploration of titles and abstracts;
• a list of terms given by the curators or derived from calls for papers.
Classifying publications manually presents a number of issues for a large publisher such as Springer Nature.
• It is a complex process that requires expert editors
• It is a time-consuming process which can hardly scale (1.5M papers/year)
• It is easy to miss the emergence of a new topic
• It is easy to assume that some traditional topics are still popular when this is no longer the case
• The keywords used in calls for papers often reflect what a venue aspires to be, rather than the real contents of the proceedings.
Osborne, F., Motta, E. and Mulholland, P.: Exploring scholarly data with Rexplore. In International Semantic Web Conference (pp. 460-477). (2013)
technologies.kmi.open.ac.uk/rexplore/
The Smart Topic Miner
The Smart Topic Miner (STM) is a semantic application designed
to support the Springer Nature Computer Science editorial
team in classifying scholarly publications.
http://rexplore.kmi.open.ac.uk/STM_demo
STM Architecture
Background Data - The Computer Science Ontology 1
Standard research area taxonomies, classifications and ontologies, such as ACM, are not suited to the task:
• Not fine-grained enough.
– E.g., only 2 topics are classified under Semantic Web
• Static and manually defined, hence prone to becoming obsolete very quickly.
[Figure: ACM 2012]
The Computer Science Ontology was automatically created and updated by applying the Klink-2 algorithm.
Osborne, F. and Motta, E.: Klink-2: integrating multiple web sources to generate semantic topic networks. In ISWC 2015. (2015)
Background Data - The Computer Science Ontology 2
• We automatically generated a large-scale ontology consisting of about 15,000 topics linked by about 70,000 semantic relationships.
• It includes very granular, low-level research areas, e.g., Linked open data, Probabilistic packet marking, Synthetic aperture radar imaging.
• It can be regularly updated by running Klink-2 on a new set of publications.
• It allows a research topic to have multiple super-areas, i.e., the taxonomic structure is a graph rather than a tree; e.g., Inductive Logic Programming is a sub-area of both Machine Learning and Logic Programming.
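Because a topic can have several super-areas, the ontology behaves like a directed acyclic graph rather than a tree. The following is a minimal Python sketch, assuming a plain dictionary from each topic to its direct super-areas; apart from the two parents of Inductive Logic Programming taken from the slide, the topic fragment is hypothetical, and this is not the actual CSO data format or Klink-2 output.

```python
from collections import deque

# Hypothetical fragment of a topic graph: topic -> list of direct super-areas.
# A topic may have more than one parent, so the structure is a DAG, not a tree.
SUPER_AREAS: dict[str, list[str]] = {
    "inductive logic programming": ["machine learning", "logic programming"],
    "machine learning": ["artificial intelligence"],
    "logic programming": ["programming languages"],
    "artificial intelligence": ["computer science"],
    "programming languages": ["computer science"],
    "computer science": [],
}


def ancestors(topic: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every super-area reachable from `topic`, visiting each once (BFS)."""
    seen: set[str] = set()
    queue = deque(graph.get(topic, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(graph.get(parent, []))
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("inductive logic programming", SUPER_AREAS)))
```

Collecting every reachable super-area, rather than following a single parent chain, is what lets a paper on Inductive Logic Programming count towards both Machine Learning and Logic Programming.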
Background Data - The Computer Science Ontology 3
STM Approach – 1 Topic extraction
The initial keywords are enriched with terms extracted from the publications and then mapped to a list of research areas in the CSO ontology.
[Figure: topic extraction example with three panels: "Initial Keywords (from authors and editors)", "Enriched Keywords (extracted from abstracts, titles, etc.)" and "CSO Ontology topics". The keyword panels list terms with their frequencies (e.g., semantic:24, rdf:7, linked data:3, dbpedia:1); the topic panel shows the resulting CSO hierarchy with paper counts:]
(1) Computer Science [21]
   (2) Internet [18]
      (3) World wide web [16]
         (4) Semantic web [16]
            (5) Rdf [7]
            (5) Linked data [5]
      (3) NLP systems [3]
         (4) Question answering [2]
      (3) Recommender systems [2]
   (2) Artificial intelligence [12]
      (3) Knowledge based systems [8]
         (4) Knowledge representation [4]
            (5) Description logic [3]
      (3) Machine learning [4]
(1) Semantics [24]
   (2) Ontology [10]
   (2) Metadata [7]
      (3) Rdf [7]
   (2) Semantic web [16]
(1) Language [5]
   (2) Vocabulary [2] […]
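The mapping step can be sketched as below, under loudly stated assumptions: keywords are matched to topics through a hypothetical exact-match lookup table (the slide does not describe how STM matches terms), and each paper is also credited to every super-area of its matched topics, which is one plausible reading of the per-topic paper counts in the hierarchy above. The `KEYWORD_TO_TOPIC` and `SUPER_AREAS` tables are illustrative only.

```python
from collections import Counter

# Hypothetical lookup from lower-cased keyword variants to topic labels.
KEYWORD_TO_TOPIC = {
    "rdf": "rdf",
    "linked data": "linked data",
    "semantic web": "semantic web",
    "ontology": "ontology",
    "ontologies": "ontology",
}

# Hypothetical fragment of the topic graph: topic -> direct super-areas.
SUPER_AREAS = {
    "rdf": ["semantic web"],
    "linked data": ["semantic web"],
    "semantic web": ["world wide web"],
    "world wide web": [],
    "ontology": [],
}


def topics_for_paper(keywords, lookup, graph):
    """Map a paper's keywords to topics, then add every super-area of those topics."""
    direct = {lookup[k.lower()] for k in keywords if k.lower() in lookup}
    found = set(direct)
    stack = list(direct)
    while stack:
        for parent in graph.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def count_topics(papers, lookup, graph):
    """Count, for each topic, how many papers are associated with it."""
    counts = Counter()
    for keywords in papers:
        # A set is iterated once per element, so each paper counts once per topic.
        counts.update(topics_for_paper(keywords, lookup, graph))
    return counts


if __name__ == "__main__":
    papers = [["RDF", "Ontologies"], ["Linked data", "SPARQL"], ["semantic web"]]
    for topic, n in count_topics(papers, KEYWORD_TO_TOPIC, SUPER_AREAS).most_common():
        print(f"{topic} [{n}]")
```

Keywords with no entry in the lookup (here "SPARQL") are simply ignored in this sketch.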
STM Approach – 2 Topic Selection
A greedy set-covering algorithm is used to reduce the topics to a user-friendly number.
• We run the algorithm separately on the set of topics at each level of the ontology, to preserve both high-level and granular research areas.
• The standard version of the greedy set-covering algorithm did not work well in this domain, because multiple high-level topics cover a similar set of papers.
• The version we use assigns an initial weight to each paper; at each iteration it selects the topic that covers the publications with the highest weight and then reduces the weight of every covered paper.
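The weighted greedy selection described on this slide could look like the following sketch. The slide does not say by how much a covered paper's weight is reduced or when the loop stops, so the multiplicative `decay` factor and the `max_topics` budget are assumptions; following the slide, the function would be called once per ontology level.

```python
def weighted_greedy_cover(coverage, max_topics, decay=0.5):
    """Select up to `max_topics` topics from `coverage` (topic -> set of paper ids).

    Each paper starts with weight 1.0. At every iteration the topic whose covered
    papers have the largest total weight is selected, and the weight of each paper
    it covers is multiplied by `decay` (assumed 0 < decay < 1).
    """
    weight = {paper: 1.0 for papers in coverage.values() for paper in papers}
    remaining = dict(coverage)
    selected = []
    while remaining and len(selected) < max_topics:
        # sorted() makes ties deterministic: max() keeps the first maximal topic.
        best = max(sorted(remaining), key=lambda t: sum(weight[p] for p in remaining[t]))
        for paper in remaining.pop(best):
            weight[paper] *= decay
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Hypothetical topics at one level, each mapped to the ids of the papers it covers.
    coverage = {
        "world wide web": {1, 2, 3, 4, 5},
        "semantic web": {1, 2, 3, 4},
        "rdf": {1, 2},
        "linked data": {6, 7, 8},
    }
    print(weighted_greedy_cover(coverage, max_topics=3))
```

Because covered papers keep a reduced weight instead of being removed, a topic that overlaps with earlier picks can still be selected later, while topics that cover papers not yet covered are favoured (here "linked data" is picked before "semantic web").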
STM Approach – 3 Tag Selection
The selected topics are used to infer a number of SNC (Springer Nature Classification) tags, using the mapping between the CSO ontology and SNC.
[Figure: example SNC tags and selected CSO topics with number of papers]
I00001 : computer science, general
I23001 : computer applications
I23050 : computational biology/bioinformatics
I13006 : computer systems organization and communication networks
I13014 : processor architectures
I13022 : computer comm. networks
I21009 : computing methodologies
I21017 : artificial intelligence
I1200X : computer hardware
I12050 : logic design
I14002 : software engineering/programming and operating systems
I22005 : computer imaging, vision, pattern recognition and graphics
I22021 : image processing
I18008 : information sys. and comm. servic
I18030 : data mining, knowledge discove

(1) Computer Science [69]
   (2) Bioinformatics [69]
   (2) Artificial intelligence [16]
      (3) Machine learning [9]
         (4) Support vector machines [7]
   (2) Computer architecture [13]
      (3) Program processors [13]
         (4) Graphics Processing Unit (GPU) [7]
            (5) Cuda [3]
   (2) Image processing [12]
      (3) Image reconstruction [6]
   (2) Data mining [9]
   […]
      (3) Telecommunication networks [5]
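A minimal sketch of the tag-inference step, assuming the CSO-to-SNC mapping is a simple table from topic to SNC codes and that each tag is ranked by the largest paper count among the topics that map to it. The `CSO_TO_SNC` entries are hypothetical pairings of topics and codes taken from the example above, not the actual Springer Nature mapping, and the ranking rule is an assumption.

```python
# Hypothetical CSO-topic -> SNC-code table; pairings are illustrative only.
CSO_TO_SNC = {
    "bioinformatics": ["I23050"],
    "artificial intelligence": ["I21017"],
    "machine learning": ["I21017"],
    "image processing": ["I22021"],
    "data mining": ["I18030"],
    "telecommunication networks": ["I13022"],
}


def infer_snc_tags(selected_topics, mapping):
    """Return SNC codes for the selected topics, strongest support first.

    selected_topics: topic -> number of papers covering it.
    A tag's support is the largest paper count among the topics mapping to it.
    """
    support = {}
    for topic, n_papers in selected_topics.items():
        for tag in mapping.get(topic, []):
            support[tag] = max(support.get(tag, 0), n_papers)
    # Sort by descending support, then by code for a stable order.
    return sorted(support, key=lambda tag: (-support[tag], tag))


if __name__ == "__main__":
    # Paper counts taken from the example topics above.
    selected = {
        "bioinformatics": 69,
        "artificial intelligence": 16,
        "machine learning": 9,
        "image processing": 12,
        "data mining": 9,
        "telecommunication networks": 5,
        "cuda": 3,  # no mapping entry, so it contributes no tag
    }
    print(infer_snc_tags(selected, CSO_TO_SNC))
```

Topics without a mapping entry are skipped; a real mapping would likely also let several topics share a tag, which the `max` rule above handles.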
User Trial 1
We conducted individual sessions with 8 experienced SN editors.
We introduced STM for about 15 minutes and then asked them to classify a number of proceedings in their fields of expertise for about 45 minutes.
The expertise of the editors included: Theoretical Computer Science, Computer Networks, Software Engineering, HCI, AI, Bioinformatics, and Security.
After the hands-on session the editors filled in a three-part survey:
• Background and expertise
• Five questions about the strengths and weaknesses of STM and three about the quality of the results
• SUS (System Usability Scale) questionnaire
User Trial 2
Background and expertise
• On average 13 years of experience (7 out of 8 with at least 5 years)
• All of them stated that they had extensive knowledge of the main topic classifications in their fields
• Four of them also considered themselves experts at working with digital proceedings.
Open questions about STM strengths and weaknesses
• STM had a positive effect on their work.
• They estimated the accuracy of the results at between 75% and 90%.
• Limitations: the scope is limited to the Computer Science field, and results are occasionally noisy when examining books with very few chapters.
• Suggested features: producing analytics about the evolution of a venue or a journal in terms of topics; allowing users to find the most significant proceedings for a topic.
User Trial 3
Quality of results and usability
SUS: 77/100, 80th percentile rank
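For reference, a System Usability Scale score is computed from ten 1-5 Likert items: odd-numbered (positively worded) items contribute (response - 1), even-numbered (negatively worded) items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0-100 score; a study's score is usually the mean over participants. The sketch below implements that standard rule; the responses in the usage example are made up and are not the trial data, and the percentile rank, which comes from published SUS benchmarks, is not computed here.

```python
def sus_score(responses):
    """Standard SUS score (0-100) for one participant's ten 1-5 responses."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS expects ten responses on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def mean_sus(all_responses):
    """Average SUS score over participants."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up responses, not data from the STM trial.
    example = [[4, 2, 4, 2, 4, 2, 4, 2, 4, 2], [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]]
    print(mean_sus(example))
```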
Conclusions
Key Lessons
• Allow users to know the rationale behind a suggestion.
• Semantic technologies are valuable for helping users deal with noisy data.
Future work
• Discussing a project to further integrate STM into Springer Nature workflows.
• Extending STM to characterize the evolution of conferences and venues over time
– e.g., highlighting newly emerging topics, as well as traditional topics that are fading out
• Using STM to directly support authors in defining the set of topics which best describe their paper.
Francesco Osborne Angelo Salatino Aliaksandr Birukou Enrico Motta
Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic
Classification of Springer Nature Proceedings with Smart Topic
Miner. In International Semantic Web Conference (pp. 383-399).
Springer International Publishing. (2016)
Email: francesco.osborne@open.ac.uk
Twitter: FraOsborne
Site: people.kmi.open.ac.uk/francesco

Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 

Classifying Scholarly Publications with Smart Topic Miner

  • 1. Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. Francesco Osborne (1), Angelo Salatino (1), Aliaksandr Birukou (2), Enrico Motta (1). (1) KMi, The Open University, United Kingdom; (2) Springer Nature. ISWC 2016.
  • 2. Classifying Scholarly Publications. Classifying scholarly publications is a crucial task: it enables scholars, students, companies, and other stakeholders to discover and access this knowledge. Traditionally, editors choose a list of related keywords and categories in relevant taxonomies according to: • their own experience of similar conferences; • a visual exploration of titles and abstracts; • a list of terms given by the curators or derived from calls for papers.
  • 3. Classifying Scholarly Publications. Classifying publications manually presents a number of issues for a large publisher such as Springer Nature: • It is a complex process that requires expert editors. • It is a time-consuming process that can hardly scale (1.5M papers/year). • It is easy to miss the emergence of a new topic. • It is easy to assume that some traditional topics are still popular when this is no longer the case. • The keywords used in calls for papers often reflect what a venue aspires to be, rather than the actual contents of the proceedings.
  • 4. Osborne, F., Motta, E. and Mulholland, P.: Exploring scholarly data with Rexplore. In International Semantic Web Conference (pp. 460-477). (2013). technologies.kmi.open.ac.uk/rexplore/
  • 5. The Smart Topic Miner. The Smart Topic Miner (STM) is a semantic application designed to support the Springer Nature Computer Science editorial team in classifying scholarly publications. http://rexplore.kmi.open.ac.uk/STM_demo
  • 7. Background Data - The Computer Science Ontology 1. Standard research-area taxonomies, classifications, and ontologies such as the ACM 2012 classification are not suited to the task: • They are not fine-grained enough, e.g., only 2 topics are classified under Semantic Web. • They are static and manually defined, hence prone to becoming obsolete very quickly.
  • 8. Background Data - The Computer Science Ontology 2. The Computer Science Ontology (CSO) was automatically created and updated by applying the Klink-2 algorithm. Osborne, F. and Motta, E.: Klink-2: integrating multiple web sources to generate semantic topic networks. In ISWC 2015. (2015)
  • 9. Background Data - The Computer Science Ontology 3. • We automatically generated a large-scale ontology consisting of about 15,000 topics linked by about 70,000 semantic relationships. • It includes very granular, low-level research areas, e.g., Linked open data, Probabilistic packet marking, Synthetic aperture radar imaging. • It can be regularly updated by running Klink-2 on a new set of publications. • It allows a research topic to have multiple super-areas, i.e., the taxonomic structure is a graph rather than a tree; e.g., Inductive Logic Programming is a sub-area of both Machine Learning and Logic Programming.
  • 10. STM Approach – 1 Topic extraction. The initial keywords are enriched with terms extracted from the publications and then mapped to a list of research areas in the CSO ontology. [Figure: three panels, "Initial Keywords (from authors and editors)", "Enriched Keywords (extracted from abstracts, titles, etc.)" and "CSO Ontology topics"; the CSO topics are shown as a hierarchy with counts, e.g., (1) Computer Science [21] > (2) Internet [18] > (3) World wide web [16] > (4) Semantic web [16] > (5) Rdf [7], (5) Linked data [5].]
  • 11. STM Approach – 2 Topic Selection. A greedy set-covering algorithm is used to reduce the topics to a user-friendly number. • We run the algorithm separately on the set of topics at each level of the ontology, to preserve both high-level and granular research areas. • The standard version of the greedy set-covering algorithm did not work well in this domain, because multiple high-level topics cover a similar set of papers. • The version used in STM assigns an initial weight to each paper; at each iteration, it selects the topic that covers the publications with the highest weight and reduces the weight of every covered paper.
  • 12. STM Approach – 3 Tag Selection. The selected topics are used to infer a number of SNC tags, using the mapping between the CSO ontology and SNC. Example SNC tags: I00001: computer science, general; I23001: computer applications; I23050: computational biology/bioinformatics; I13006: computer systems organization and communication networks; I13014: processor architectures; I13022: computer comm. networks; I21009: computing methodologies; I21017: artificial intelligence; I1200X: computer hardware; I12050: logic design; I14002: software engineering/programming and operating systems; I22005: computer imaging, vision, pattern recognition and graphics; I22021: image processing; I18008: information sys. and comm. servic…; I18030: data mining, knowledge discove… Example CSO topics, with ontology level in parentheses and counts in brackets: (1) Computer Science [69]; (2) Bioinformatics [69]; (2) Artificial intelligence [16]; (3) Machine learning [9]; (4) Support vector machines [7]; (2) Computer architecture [13]; (3) Program processors [13]; (4) Graphics Processing Unit (GPU) [7]; (5) Cuda [3]; (2) Image processing [12]; (3) Image reconstruction [6]; (2) Data mining [9]; […] (3) Telecommunication networks [5].
  • 13. User Trial 1. We conducted individual sessions with 8 experienced SN editors. We introduced STM for about 15 minutes and then asked them to classify a number of proceedings in their fields of expertise for about 45 minutes. The editors' expertise included Theoretical Computer Science, Computer Networks, Software Engineering, HCI, AI, Bioinformatics, and Security. After the hands-on session, the editors filled in a three-part survey: • background and expertise; • five questions about the strengths and weaknesses of STM and three about the quality of the results; • the SUS questionnaire.
  • 14. User Trial 2. Background and expertise: • On average, 13 years of experience (7 out of 8 had at least 5 years). • All of them stated that they have extensive knowledge of the main topic classifications in their fields. • Four of them also considered themselves experts at working with digital proceedings. Open questions about STM strengths and weaknesses: • STM had a positive effect on their work. • They estimated the accuracy of the results at between 75% and 90%. • Limitations: the scope is restricted to the Computer Science field, and results are occasionally noisy when examining books with very few chapters. • Suggested features: producing analytics about the evolution of a venue or a journal in terms of topics; allowing users to find the most significant proceedings for a topic.
  • 15. User Trial 3. Quality of results and usability. SUS score: 77/100 (80th percentile rank).
  • 16. Conclusions. Key lessons: • Allow users to understand the rationale behind a suggestion. • Semantic technologies are valuable for helping users deal with noisy data. Future work: • Discussing a project to further integrate STM into Springer Nature workflows. • Extending STM to characterize the evolution of conferences and venues over time, e.g., highlighting new emerging topics as well as traditional topics that are fading out. • Using STM to directly support authors in defining the set of topics that best describe their paper.
  • 17. Francesco Osborne Angelo Salatino Aliaksandr Birukou Enrico Motta Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. In International Semantic Web Conference (pp. 383-399). Springer International Publishing. (2016) Email: francesco.osborne@open.ac.uk Twitter: FraOsborne Site: people.kmi.open.ac.uk/francesco
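Slide 9 notes that CSO is a graph rather than a tree, because a topic can have several super-areas. The minimal Python sketch below (not STM's code) stores each topic's direct super-areas and collects every ancestor of a topic. Only the two super-areas of Inductive Logic Programming come from the slide; the other edges and the `ancestors` helper are hypothetical.

```python
from collections import deque

# Toy excerpt of a topic graph: topic -> list of its direct super-areas.
# The two super-areas of "inductive logic programming" come from the slide;
# the remaining edges are hypothetical.
SUPER_AREAS: dict[str, list[str]] = {
    "inductive logic programming": ["machine learning", "logic programming"],
    "machine learning": ["artificial intelligence"],
    "logic programming": ["artificial intelligence"],
    "artificial intelligence": ["computer science"],
    "computer science": [],
}


def ancestors(topic: str, graph: dict[str, list[str]]) -> set[str]:
    """Return all direct and indirect super-areas of `topic` (breadth-first)."""
    seen: set[str] = set()
    queue = deque(graph.get(topic, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # shared ancestors are visited only once
            seen.add(parent)
            queue.extend(graph.get(parent, []))
    return seen


print(sorted(ancestors("inductive logic programming", SUPER_AREAS)))
# ['artificial intelligence', 'computer science', 'logic programming', 'machine learning']
```

Tracking visited topics matters precisely because the structure is not a tree: "artificial intelligence" is reachable through two different paths but is reported once.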
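The topic-extraction step on slide 10 (keywords mapped to CSO topics, which are then counted along the hierarchy) can be sketched as follows. This assumes a hypothetical exact-match lookup from keywords to topics and a toy excerpt of the hierarchy based on the slides' examples; `KEYWORD_TO_TOPIC`, `SUPER_AREAS`, `with_super_areas`, and `extract_topics` are illustrative names, not STM's API, and real keyword matching would need to handle synonyms and variants.

```python
from collections import Counter

# Hypothetical exact-match lookup from lower-cased keywords to CSO topics.
KEYWORD_TO_TOPIC = {
    "rdf": "rdf",
    "linked data": "linked data",
    "semantic web": "semantic web",
    "svm": "support vector machines",
    "support vector machine": "support vector machines",
}

# Toy excerpt of the hierarchy (topic -> direct super-areas), loosely
# following the example topics shown on slides 10 and 12.
SUPER_AREAS = {
    "rdf": ["semantic web"],
    "linked data": ["semantic web"],
    "semantic web": ["world wide web"],
    "world wide web": ["internet"],
    "internet": ["computer science"],
    "support vector machines": ["machine learning"],
    "machine learning": ["artificial intelligence"],
    "artificial intelligence": ["computer science"],
    "computer science": [],
}


def with_super_areas(topic, graph):
    """Return `topic` together with all of its direct and indirect super-areas."""
    found, stack = {topic}, [topic]
    while stack:
        for parent in graph.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def extract_topics(papers):
    """Map each paper's keywords to CSO topics and count papers per topic."""
    counts = Counter()
    for keywords in papers:
        topics = set()
        for keyword in keywords:
            topic = KEYWORD_TO_TOPIC.get(keyword.lower())
            if topic is not None:  # unmatched keywords are simply ignored
                topics |= with_super_areas(topic, SUPER_AREAS)
        counts.update(topics)  # a paper adds at most 1 to each topic
    return counts


papers = [["RDF", "Linked data"], ["SVM", "Semantic web"], ["graph traversal"]]
counts = extract_topics(papers)
print(counts["semantic web"], counts["rdf"], counts["computer science"])  # 2 1 2
```

Collecting each paper's topics into a set before updating the counter keeps a paper with several matching keywords from inflating a shared super-area's count.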
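The weighted greedy set covering described on slide 11 can be sketched as below. The multiplicative `decay` factor, the stopping rule (at most `k` topics, or stop once no topic covers any remaining weight), and the alphabetical tie-break are assumptions; the slide only states that every paper starts with a weight and that covered papers have their weight reduced after each selection. With `decay=0.0` the sketch reduces to the standard greedy heuristic, which ignores already-covered papers entirely. `select_per_level` mirrors the per-level runs mentioned on the slide.

```python
def weighted_greedy_cover(coverage, k, decay=0.5):
    """Select up to `k` topics from `coverage` (topic -> set of paper ids).

    Every paper starts with weight 1.0. Each iteration picks the topic whose
    covered papers have the highest total weight, then multiplies the weight
    of each of those papers by `decay` (an assumed value, not STM's setting).
    """
    weight = {paper: 1.0 for papers in coverage.values() for paper in papers}
    remaining = dict(coverage)
    selected = []

    def score(topic):
        return sum(weight[paper] for paper in remaining[topic])

    while remaining and len(selected) < k:
        # sorted() makes ties resolve alphabetically, so results are repeatable
        best = max(sorted(remaining), key=score)
        if score(best) == 0:
            break  # nothing left to cover
        selected.append(best)
        for paper in remaining.pop(best):
            weight[paper] *= decay
    return selected


def select_per_level(coverage_by_level, k):
    """Run the selection independently on the topics of each ontology level."""
    return {level: weighted_greedy_cover(cov, k)
            for level, cov in coverage_by_level.items()}


# Hypothetical topics at one ontology level, each covering a set of paper ids.
level_2 = {
    "computer networks": {1, 2, 3, 4},
    "internet": {1, 2, 3},
    "bioinformatics": {5},
}
print(weighted_greedy_cover(level_2, 2))             # ['computer networks', 'internet']
print(weighted_greedy_cover(level_2, 2, decay=0.0))  # ['computer networks', 'bioinformatics']
```

In this toy example, the overlapping topic `internet` survives with `decay=0.5` because its papers still carry half their weight, whereas the standard heuristic (`decay=0.0`) replaces it with `bioinformatics`; which behaviour suits a given collection is a tuning question the sketch does not settle.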
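The tag-selection step on slide 12 infers SNC tags from the selected CSO topics through a topic-to-tag mapping. The sketch below assumes a hypothetical mapping: the SNC codes are taken from the slide, but the particular topic-code pairs are illustrative guesses, not the actual STM mapping. Ranking tags by the summed counts of their supporting topics is likewise an assumed heuristic.

```python
# Hypothetical mapping from CSO topics to SNC codes. The codes appear on
# slide 12, but these particular topic-code pairs are illustrative guesses,
# not the actual STM mapping.
CSO_TO_SNC = {
    "bioinformatics": "I23050",
    "artificial intelligence": "I21017",
    "program processors": "I13014",
    "image processing": "I22021",
    "data mining": "I18030",
    "telecommunication networks": "I13022",
}


def infer_snc_tags(selected_topics, topic_counts):
    """Return SNC codes for the selected topics, most supported first.

    A code's score is the sum of the counts of the selected topics that map
    to it (an assumed ranking heuristic); unmapped topics are skipped.
    """
    scores = {}
    for topic in selected_topics:
        code = CSO_TO_SNC.get(topic)
        if code is not None:
            scores[code] = scores.get(code, 0) + topic_counts.get(topic, 0)
    # Highest score first; ties broken by code so the order is deterministic.
    return sorted(scores, key=lambda code: (-scores[code], code))


counts = {"bioinformatics": 69, "artificial intelligence": 16,
          "machine learning": 9, "program processors": 13, "image processing": 12}
print(infer_snc_tags(list(counts), counts))
# ['I23050', 'I21017', 'I13014', 'I22021']
```

Topics without a mapping (here "machine learning") simply contribute no tag; a fuller version might fall back to a mapped super-area instead.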
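For reference, the SUS figure on slide 15 is on the standard System Usability Scale: each of the 10 items is rated from 1 to 5, odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the raw sum is multiplied by 2.5 to give a score between 0 and 100. The sketch below applies that rule; the responses in the example are made up and are not data from the trial.

```python
def sus_score(responses):
    """Score one completed SUS questionnaire (10 items, each rated 1-5)."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly 10 responses in the range 1-5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even-numbered negatively.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # rescale the 0-40 raw sum to 0-100


def mean_sus(all_responses):
    """Average the SUS scores of several participants."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


# Hypothetical responses from two participants.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
print(mean_sus([[4, 2] * 5, [5, 1] * 5]))         # 87.5
```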