Identifying the research topics that best describe the scope of a scientific publication is a crucial task for editors, in particular because the quality of these annotations determines how effectively users are able to discover the right content in online libraries. For this reason, Springer Nature, the world's largest academic book publisher, has traditionally entrusted this task to its most expert editors. These editors manually analyse all new books, which may include hundreds of chapters, and produce a list of the most relevant topics. This process has therefore been very expensive, time-consuming, and confined to a few senior editors. For these reasons, back in 2016 we developed Smart Topic Miner (STM), an ontology-driven application that assists the Springer Nature editorial team in annotating the volumes of all books covering conference proceedings in Computer Science. Since then, STM has been used regularly by editors in Germany, China, Brazil, India, and Japan, for a total of about 800 volumes per year. Over the past three years the initial prototype has evolved iteratively in response to user feedback and changing requirements. In this paper we present the most recent version of the tool and describe its evolution over the years, the key lessons learnt, and the impact on the Springer Nature workflow. In particular, our solution has drastically reduced the time needed to annotate proceedings and significantly improved their discoverability, resulting in 9.3 million additional downloads. We also present a user study involving 9 editors, which yielded excellent results in terms of usability, and report an evaluation of the new topic classifier used by STM, which outperforms previous versions in recall and F-measure.
Supporting Springer Nature Editors by means of Semantic Technologies – Francesco Osborne
The Open University and Springer Nature have been collaborating since 2015 in the development of an array of semantically-enhanced solutions supporting editors in i) classifying proceedings and other editorial products with respect to the relevant research areas and ii) taking informed decisions about their marketing strategy. These solutions include i) the Smart Topic API, which automatically maps keywords associated with published papers to semantically characterized topics drawn from a very large and automatically-generated ontology of Computer Science topics; ii) the Smart Topic Miner, which helps editors to associate scholarly metadata with books; and iii) the Smart Book Recommender, which assists editors in deciding which editorial products should be marketed at a specific venue.
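To make the first of these components concrete, the sketch below shows how a client might call a keyword-to-topic mapping service of this kind. The endpoint URL, payload, and response shape are assumptions made purely for illustration; they are not the published Smart Topic API interface.

```python
# Hypothetical client for a keyword-to-topic mapping service in the style of
# the Smart Topic API. The URL and response format are illustrative assumptions.
import requests

def map_keywords_to_topics(keywords, api_url="https://example.org/smart-topic-api/map"):
    """Send raw author keywords; receive ontology topics for each keyword."""
    response = requests.post(api_url, json={"keywords": keywords}, timeout=30)
    response.raise_for_status()
    # Assumed response shape: {"ontology matching": ["ontology matching",
    # "ontology alignment"], "linked data": ["linked data", "semantic web"]}
    return response.json()

print(map_keywords_to_topics(["ontology matching", "linked data"]))
```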
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner – Francesco Osborne
The document summarizes research on automatically classifying Springer Nature proceedings using the Smart Topic Miner (STM). STM extracts topics from publications, maps them to a computer science ontology, selects relevant topics using a greedy algorithm, and infers tags. It was tested with 8 Springer Nature editors, who found that STM accurately classified 75-90% of proceedings and improved their work. However, STM is currently limited to computer science, and occasional noisy results were observed for books with few chapters. Future work aims to expand STM to characterize topic evolution over time and directly support author tagging.
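The greedy selection step lends itself to a compact illustration. The following is a simplified sketch under stated assumptions (chapters represented as sets of ontology topics, a fixed coverage target); it is not STM's actual implementation.

```python
# Illustrative sketch of a greedy topic-selection step: repeatedly pick the
# ontology topic that covers the most not-yet-covered chapters. The data
# structures and coverage threshold are assumptions, not STM's code.

def greedy_topic_selection(chapter_topics, coverage_target=0.9):
    """chapter_topics: dict mapping chapter id -> set of ontology topics."""
    uncovered = set(chapter_topics)
    selected = []
    while uncovered and len(uncovered) > (1 - coverage_target) * len(chapter_topics):
        scores = {}  # topic -> number of uncovered chapters it would cover
        for chapter in uncovered:
            for topic in chapter_topics[chapter]:
                scores[topic] = scores.get(topic, 0) + 1
        best = max(scores, key=scores.get)
        selected.append(best)
        uncovered = {c for c in uncovered if best not in chapter_topics[c]}
    return selected

book = {
    "ch1": {"semantic web", "ontologies"},
    "ch2": {"semantic web", "linked data"},
    "ch3": {"machine learning"},
}
print(greedy_topic_selection(book))  # e.g. ['semantic web', 'machine learning']
```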
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly... – Angelo Salatino
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this paper, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles, yielding a significant improvement over alternative methods.
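The CSO Classifier is available as an open-source Python package (cso-classifier on PyPI). A usage sketch is shown below; the exact import path and parameter names reflect recent releases and may vary across versions.

```python
# Usage sketch for the cso-classifier package; parameter names may differ
# slightly between package versions.
from cso_classifier import CSOClassifier

cc = CSOClassifier(modules="both", enhancement="first")
paper = {
    "title": "The CSO Classifier: Ontology-Driven Detection of Research Topics",
    "abstract": "We present an unsupervised approach for classifying papers "
                "according to the Computer Science Ontology...",
    "keywords": "research topics, ontology, scholarly data",
}
result = cc.run(paper)
print(result["union"])     # topics found by the syntactic and semantic modules
print(result["enhanced"])  # broader topics inferred from the CSO hierarchy
```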
Linked Open Data about Springer Nature conferences. The story so far – Aliaksandr Birukou
Despite many efforts to make data about scholarly publications available on the Web of Data, much information about academic conferences is still only available (at best) in free-text format. When available in a structured format, these data would provide essential input for the decisions that researchers, libraries, publishers, and funding and evaluation bodies take every day.
This talk describes the project that made such data available as Linked Open Data (LOD) at lod.springer.com for around 10,000 computer science conferences. In addition, we take a closer look at the lessons learnt from launching this portal and cover other Linked Data projects at Springer Nature. Finally, a novel semi-automated approach for classifying conference proceedings at Springer Nature is also presented.
This presentation was provided by Paul Needham of Cranfield University and Johan Bollen of Indiana University, during the NISO webinar "Measuring Use, Assessing Success, Part Two: Count Me In: Measuring Individual Item Usage," which was held on September 15, 2010.
In the last decade, several Scientific Knowledge Graphs (SKGs) were released, representing scientific knowledge in a structured, interlinked, and semantically rich manner. But what kind of information do they describe? How have they been built? What can we do with them? In this lecture, I will first provide an overview of well-known SKGs, like Microsoft Academic Graph, Dimensions, and others. Then, I will present the Academia/Industry DynAmics (AIDA) Knowledge Graph, which describes 21M publications and 8M patents according to i) the research topics drawn from the Computer Science Ontology, ii) the type of the authors' affiliations (e.g., academia, industry), and iii) 66 industrial sectors (e.g., automotive, financial, energy, electronics) from the Industrial Sectors Ontology (INDUSO). Finally, I will showcase a number of tools and approaches using such SKGs, supporting researchers, companies, and policymakers in making sense of research dynamics.
Conference Identity: persistent identifiers for conferences – Aliaksandr Birukou
Conferences are an essential part of scholarly communication. However, like researchers and organizations, they suffer from a disambiguation problem: the same acronym or conference name can refer to very different conferences. In 2017, Crossref and DataCite started a working group on conference and project identifiers. The group includes various publishers, A&I service providers, and other interested stakeholders. The group participants have drafted a metadata specification and gathered feedback from the community.
In this talk, we update the VIVO participants on where we stand with PIDs for conferences and conference series, and with Crossmark for proceedings, and we invite the broader community to comment.
Read the CrossRef post for more info about the group:
https://www.crossref.org/working-groups/conferences-projects/
Authors: Aliaksandr Birukou and Patricia Feeney
[PhD Thesis Defense] CHAMELEON: A Deep Learning Meta-Architecture for News Re... – Gabriel Moreira
Presentation of the PhD thesis defense of Gabriel de Souza Pereira Moreira at Instituto Tecnológico de Aeronáutica (ITA), on Dec. 09, 2019, in São José dos Campos, Brazil.
Abstract:
Recommender systems have grown increasingly popular in assisting users with their choices, thus enhancing their engagement and overall satisfaction with online services. Over the last decade, recommender systems have become a topic of increasing interest among machine learning, human-computer interaction, and information retrieval researchers.
News recommender systems aim to personalize users' experiences and help them discover relevant articles from a large and dynamic search space, which makes news a challenging scenario for recommendation. Large publishers release hundreds of news stories daily, meaning they must deal with a fast-growing number of items that quickly become outdated and irrelevant to most readers. News readers exhibit more unstable consumption behavior than users in other domains such as entertainment, and external events, like breaking news, shift readers' interests. In addition, the news domain experiences extreme levels of sparsity, as most users are anonymous, with no past behavior tracked.
Since 2016, Deep Learning methods and techniques have been explored in Recommender Systems research. In general, they can be divided into methods for: Deep Collaborative Filtering, Learning Item Embeddings, Session-based Recommendations using Recurrent Neural Networks (RNN), and Feature Extraction from Items' Unstructured Data such as text, images, audio, and video.
The main contribution of this research is CHAMELEON, a meta-architecture designed to tackle the specific challenges of news recommendation. It consists of a modular reference architecture that can be instantiated using different neural building blocks.
As information about users' past interactions is scarce in the news domain, signals such as the user context (e.g., time, location, device, the sequence of clicks within the session) and static and dynamic article features, like the article's textual content, popularity, and recency, are explicitly modeled in a hybrid session-based recommendation approach using RNNs.
The recommendation task addressed in this work is next-item prediction for user sessions, i.e., "what is the next most likely article a user might read in a session?". A temporal offline evaluation protocol is used for a realistic assessment of this task, considering factors that affect global readership interests, such as popularity, recency, and seasonality.
Experiments performed with two large datasets have shown the effectiveness of CHAMELEON for news recommendation on many quality factors, such as accuracy, item coverage, novelty, and a reduced item cold-start problem, when compared to other traditional and state-of-the-art session-based algorithms.
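To make the hybrid session-based idea concrete, here is a minimal PyTorch sketch of a GRU that consumes a session's clicks plus per-click context features and scores the next article. The sizes, feature choices, and single-layer design are illustrative assumptions, not the actual CHAMELEON instantiation.

```python
# Minimal sketch of a session-based next-item predictor: a GRU over item
# embeddings concatenated with context features. Dimensions are illustrative.
import torch
import torch.nn as nn

class NextArticleGRU(nn.Module):
    def __init__(self, n_articles, emb_dim=64, ctx_dim=8, hidden=128):
        super().__init__()
        self.item_emb = nn.Embedding(n_articles, emb_dim)
        # Context (time, location, device, ...) is concatenated per click.
        self.rnn = nn.GRU(emb_dim + ctx_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_articles)  # scores for the next item

    def forward(self, item_ids, context):
        x = torch.cat([self.item_emb(item_ids), context], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h[:, -1])  # next-item logits after the last click

model = NextArticleGRU(n_articles=1000)
clicks = torch.randint(0, 1000, (4, 5))  # 4 sessions, 5 clicks each
ctx = torch.randn(4, 5, 8)               # per-click context features
logits = model(clicks, ctx)              # shape: (4, 1000)
```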
The document describes a prototype that retrieves related scientific publications from different linked datasets through thesaurus alignment. It introduces several linked datasets, including Agrovoc, OpenAgris, STW and EconStor. The prototype matches concepts from a user query to concepts in the linked datasets' thesauri to identify related publications. Pseudocode is provided to illustrate the process of concept mapping and querying multiple datasets. The goal is to retrieve relevant publications from different sources through a single interface.
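The original pseudocode is not reproduced here, but the core idea can be sketched as follows, assuming SPARQL endpoints for the aligned thesauri and skos:exactMatch alignment links; the endpoint URLs and the subject property are illustrative assumptions, not the prototype's actual configuration.

```python
# Sketch of concept mapping plus multi-dataset querying via thesaurus
# alignment. Endpoints and the dct:subject / skos:exactMatch modelling are
# illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINTS = {
    "agrovoc": "https://agrovoc.fao.org/sparql",      # assumed endpoint URLs
    "stw": "http://zbw.eu/beta/sparql/stw/query",
}

def publications_for_concept(concept_uri):
    """Find publications linked to a concept, or to concepts aligned with it."""
    results = []
    for name, url in ENDPOINTS.items():
        sparql = SPARQLWrapper(url)
        sparql.setQuery(f"""
            PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
            PREFIX dct:  <http://purl.org/dc/terms/>
            SELECT ?pub WHERE {{
                {{ ?pub dct:subject <{concept_uri}> }}
                UNION
                {{ <{concept_uri}> skos:exactMatch ?aligned .
                   ?pub dct:subject ?aligned }}
            }} LIMIT 50
        """)
        sparql.setReturnFormat(JSON)
        for row in sparql.query().convert()["results"]["bindings"]:
            results.append((name, row["pub"]["value"]))
    return results
```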
The document introduces SciVerse, a platform from Elsevier that integrates SciVerse Scopus, SciVerse ScienceDirect, and SciVerse Hub. It provides an overview of each component, including Scopus's abstract and citation database, ScienceDirect's full-text articles and books, and the Hub which provides a single search across both. It outlines new features like the author evaluator tool, enhanced citation tracker, and APIs available to developers. The goal is to empower research through integrated content and applications that improve discovery, productivity, and collaboration across ScienceDirect, Scopus, and other sources.
The document discusses how data-centric science is driving the need for new tools and technologies to support large-scale data sharing and collaboration. It provides examples of projects like the Sloan Digital Sky Survey that have pioneered new models for open data publishing and public engagement with science. Microsoft research is working on technologies to support the entire scientific research lifecycle from data acquisition and modeling to analysis, visualization, and open dissemination of research outputs.
Publishing conference proceedings internationally: how does it work – Aliaksandr Birukou
In this presentation we look into the main elements one has to consider when organizing an international conference. First, we describe the role of conference proceedings in CS and beyond. Second, we focus on the tasks of conference organizers. Third, we cover peer review aspects and announce the new working group that CrossRef and DataCite are starting in this respect. We then cover indexing and dissemination, present several tips and guidelines for organizers of international conferences, and close with a word of warning regarding predatory publishers.
This presentation, held during the SEMANTiCS conference, introduces Ontos' current achievements towards a streaming-based text mining solution using Deep Learning and Semantic Web technologies.
Since most of the world's data is unstructured, mining the required information from text was, is, and will remain essential.
Martin Voigt | CEO Ontos GmbH
Presentation at Semantics 2016 in Leipzig in the context with the results of the LEDS project
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics – semanticsconference
This document discusses Ontos' approach to streaming-based text mining using deep learning and semantics. It describes use cases for content augmentation and monitoring, and requirements around entity detection, multiple languages/sources, and domain adaptation. An overview of the text analytics market shows growth areas. Ontos' WildhornMiner uses deep learning models trained on large corpora to classify entities with supervised models. Lessons learned include the benefits of neural networks and Kafka for streaming. Next steps involve relation extraction, search interfaces, and benchmarking.
Context-oriented Knowledge Management in Production Networks @Gsom Emerging m... – CaaS EU FP7 Project
Context-oriented Knowledge Management in Production Networks
By Kurt Sandkuhl
Invited lecture on October 8 at the GSOM Emerging Markets conference in St. Petersburg
WSO2 Data Analytics Server is a comprehensive enterprise data analytics platform; it fuses batch and real-time analytics of any source of data with predictive analytics via machine learning.
Springer LOD conference portal. Demo paper - screenshots – Aliaksandr Birukou
This is a slide deck with the main features I used as a backup for the demo at the 16th International Semantic Web Conference (ISWC 2017) in Vienna next week. Many thanks to Volha Bryl and Andrey Gromyko from Net Wise for helping me prepare the demo, as well as Alfred Hofmann (Lecture Notes in Computer Science, LNCS) and Henning Schoenenberger (SN SciGraph) for continuous support. Of course, this is also based on the earlier work of Markus Kaindl and Kai Eckert from Stuttgart Media University.
If you want to read the original paper - here it is: http://birukou.eu/publications/papers/201710Birukou-ISWC2017-springer-lod.pdf
ITAC 2016: Where Open Source Meets Audit Analytics – Andrew Clark
Open source software is taking the computer science community and IT departments by storm. The breadth of options, the timeliness of updates, the price, and the sense of community are all contributing factors to the rise of open source computing. For many years audit analytics has been confined to the Computer Assisted Auditing Techniques (CAAT) software vendors ACL, IDEA, and now Arbutus. However, these software programs require extensive training to use effectively, are not very flexible, and in most cases fail to provide the outcome auditors are expecting. Moving to an open source platform based around the Python ecosystem allows for true customization of analytics and provides a common language to interact with your IT department. By using the same set of tools, an auditing department can move from rudimentary AP duplicate tests all the way to advanced classification and clustering machine learning tests. Although the barrier to entry for open source software is higher than for most CAATs, with cross-functional collaboration, a truly customized, sustainable, and highly effective analytics program can be created.
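As a flavour of the entry-level end of that spectrum, a minimal pandas sketch of an accounts-payable duplicate test is shown below; the column names and data are invented for illustration.

```python
# Minimal AP duplicate test: flag accounts-payable rows sharing vendor,
# invoice date, and amount. Column names and data are illustrative assumptions.
import pandas as pd

ap = pd.DataFrame({
    "vendor": ["Acme", "Acme", "Globex"],
    "invoice_date": ["2016-03-01", "2016-03-01", "2016-03-02"],
    "amount": [1200.00, 1200.00, 560.50],
})
dupes = ap[ap.duplicated(subset=["vendor", "invoice_date", "amount"], keep=False)]
print(dupes)  # both Acme rows are flagged as potential duplicate payments
```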
This introductory lecture for IA377 will be devoted to the topic of “Literature Review”.
What is a literature review?
Methodology, best practices, tips, tools, etc.
Practical example
Application to IA377 seminar activities.
https://ia377-feec-unicamp.github.io/classes/2023/03/09/Literature-Review.html
A METHODOLOGICAL APPROACH TO SUPPORT BUILDING LIFE CYCLE ANALYSIS – Andy McNamara
In this thesis, the hypothesis "Life cycle analysis can be further utilised and integrated into the BIM process through the use of flexible API scripting and graphical programming" is investigated and demonstrated through an experimental case study.
The CREW system provides a virtual research environment to support collaborative research events. It allows users to record events, replay and annotate recordings, and conduct faceted searches across recorded content and related resources. The system was developed through a user-centered design process involving three user groups to ensure it meets researchers' needs. Future work will focus on ongoing user requirements gathering, supported evaluation events, and further development based on user feedback.
Monitoring & evaluating the usage of your Open Access Journal – Ina Smith
This document discusses various tools for monitoring and evaluating the usage of open access journals, including general tools like Google Analytics, platform-specific tools from SciELO SA, and journal-specific tools used by the South African Journal of Science. It provides information on tracking metrics like downloads, citations, social media engagement, and collaboration with authors and institutions to increase a journal's reach and measure its impact.
IRJET – Automated Document Summarization and Classification using Deep Lear... – IRJET Journal
The document proposes a system that uses deep learning methods for automated document summarization and classification. It uses a recurrent convolutional neural network (RCNN), which combines a convolutional neural network and a recurrent neural network, to build a robust classifier model. For summarization, it employs a graph-based method inspired by PageRank to extract the top 20% of sentences from a document based on word intersections. The RCNN model achieved over 97% accuracy on classifying documents from various domains using their summaries. The system aims to speed up classification and make it more intuitive by combining automated summarization techniques with deep learning.
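A deliberately simplified stand-in for the described extractive step, scoring sentences by pairwise word overlap (a one-pass, PageRank-flavoured centrality) and keeping the top 20%, might look as follows; it is a sketch of the general technique, not the paper's implementation.

```python
# Graph-flavoured extractive summarization sketch: each sentence's score is
# the sum of its word-overlap similarities with every other sentence.
import re
from itertools import combinations

def summarize(text, keep_ratio=0.2):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [set(s.lower().split()) for s in sentences]
    scores = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        overlap = len(words[i] & words[j]) / (1 + len(words[i] | words[j]))
        scores[i] += overlap  # each shared edge raises both sentences' centrality
        scores[j] += overlap
    k = max(1, int(len(sentences) * keep_ratio))
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return " ".join(sentences[i] for i in top)  # keep original sentence order
```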
This document summarizes research on automatically classifying and extracting non-functional requirements (NFRs) from text files using supervised machine learning. The researchers created a dataset of NFR keywords by analyzing NFR catalogs identified through a systematic mapping study. The keywords were categorized into security, performance, and usability. They then tested a supervised learning approach on an existing dataset containing 625 software specifications. The approach achieved accuracy rates between 85% and 98% for classifying NFRs into the security, performance, and usability categories. The research thus provides a way to generate labeled datasets for training machine learning models to automatically classify NFRs mentioned in text documents.
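A minimal sketch of such a supervised pipeline, using TF-IDF features and a Naive Bayes classifier from scikit-learn, is shown below; the inline dataset and the choice of model are illustrative assumptions, not the study's setup.

```python
# Supervised NFR classification sketch: bag-of-words features plus a simple
# classifier. The three training sentences and labels are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

requirements = [
    "The system shall encrypt all stored passwords.",
    "Pages must load in under two seconds.",
    "Users should complete checkout in three clicks or fewer.",
]
labels = ["security", "performance", "usability"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(requirements, labels)
print(clf.predict(["Encrypt passwords at rest."]))  # -> ['security']
```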
Research in Intelligent Systems and Data Science at the Knowledge Media Insti... – Enrico Motta
The document discusses research directions in intelligent systems and data science. It describes work on making sense of scholarly data through techniques like data mining, semantic technologies, and machine learning. It also discusses mapping and classifying computer science research areas using an automatically generated ontology with over 14,000 topics. Other topics discussed include predicting emerging research areas, applications in smart cities like the MK:Smart project, and potential roles for robots in smart cities like an autonomous health and safety inspector.
Applied Optimization and Swarm Intelligence (Springer Tracts in Nature-Inspir... – FajarMaulana962405
This book provides a review of recent research on applied optimization and swarm intelligence. It covers topics such as ensemble methods and their applications to optimization problems, swarm intelligence algorithms for numerical association rule mining and time series forecasting, soccer-inspired metaheuristics, cognitive modeling of swarm intelligence for decision making, nature-inspired algorithms for mobile robot path planning and control, hardware architectures for swarm robotics, and multi-objective optimization frameworks. The book serves as a reference for researchers interested in swarm intelligence, optimization, machine learning, and industrial applications.
Forecasting the Spreading of Technologies in Research Communities @ K-CAP 2017 – Francesco Osborne
Technologies such as algorithms, applications and formats are an important part of the knowledge produced and reused in the research process. Typically, a technology is expected to originate in the context of a research area and then spread and contribute to several other fields. For example, Semantic Web technologies have been successfully adopted by a variety of fields, e.g., Information Retrieval, Human Computer Interaction, Biology, and many others. Unfortunately, the spreading of technologies across research areas may be a slow and inefficient process, since it is easy for researchers to be unaware of potentially relevant solutions produced by other research communities. In this paper, we hypothesise that it is possible to learn typical technology propagation patterns from historical data and to exploit this knowledge i) to anticipate where a technology may be adopted next and ii) to alert relevant stakeholders about emerging and relevant technologies in other fields. To do so, we propose the Technology-Topic Framework, a novel approach which uses a semantically enhanced technology-topic model to forecast the propagation of technologies to research areas. A formal evaluation of the approach on a set of technologies in the Semantic Web and Artificial Intelligence areas has produced excellent results, confirming the validity of our solution.
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications – Francesco Osborne
TechMiner is a new approach that combines natural language processing, machine learning, and semantic technologies to extract information about technologies (such as applications, systems, languages, and formats) from research publications. It generates an ontology describing technologies and their relationships to other research entities. The approach was evaluated on a gold standard of manually annotated publications and found to improve precision and recall over alternative natural language processing approaches. Future work includes enriching the approach to identify additional scientific objects and applying it to other research fields.
The ontology engineering research community has focused for many years on supporting the creation, development and evolution of ontologies. Ontology forecasting, which aims at predicting semantic changes in an ontology, represents instead a new challenge. In this paper, we contribute to this novel endeavour by focusing on the task of forecasting semantic concepts in the research domain. Indeed, ontologies representing scientific disciplines contain only research topics that are already popular enough to be selected by human experts or automatic algorithms. They are thus unfit to support tasks which require the ability to describe and explore the forefront of research, such as trend detection and horizon scanning. We address this issue by introducing the Semantic Innovation Forecast (SIF) model, which predicts new concepts of an ontology at time t+1, using only data available at time t. Our approach relies on lexical innovation and adoption information extracted from historical data. We evaluated the SIF model on a very large dataset consisting of over one million scientific papers belonging to the Computer Science domain: the outcomes show that the proposed approach offers a competitive boost in mean average precision-at-ten compared to the baselines when forecasting over 5 years.
Klink-2: integrating multiple web sources to generate semantic topic networks – Francesco Osborne
ISWC 2015 research paper: http://oro.open.ac.uk/43793/1/ISWC2015_CR.pdf
Abstract:
The amount of scholarly data available on the web is steadily increasing, enabling different types of analytics which can provide important insights into the research activity. In order to make sense of and explore this large-scale body of knowledge we need an accurate, comprehensive and up-to-date ontology of research topics. Unfortunately, human crafted classifications do not satisfy these criteria, as they evolve too slowly and tend to be too coarse-grained. Current automated methods for generating ontologies of research areas also present a number of limitations, such as: i) they do not consider the rich amount of indirect statistical and semantic relationships, which can help to understand the relation between two topics – e.g., the fact that two research areas are associated with a similar set of venues or technologies; ii) they do not distinguish between different kinds of hierarchical relationships; and iii) they are not able to handle effectively ambiguous topics characterized by a noisy set of relationships. In this paper we present Klink-2, a novel approach which improves on our earlier work on automatic generation of semantic topic networks and addresses the aforementioned limitations by taking advantage of a variety of knowledge sources available on the web. In particular, Klink-2 analyses networks of research entities (including papers, authors, venues, and technologies) to infer three kinds of semantic relationships between topics. It also identifies ambiguous keywords (e.g., “ontology”) and separates them into the appropriate distinct topics – e.g., “ontology/philosophy” vs. “ontology/semantic web”. Our experimental evaluation shows that the ability of Klink-2 to integrate a high number of data sources and to generate topics with accurate contextual meaning yields significant improvements over other algorithms in terms of both precision and recall.
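One of the statistical signals this family of approaches exploits can be sketched compactly: topic x is a candidate super-area of topic y when papers about y usually also mention x, but not vice versa. The thresholds and data layout below are illustrative assumptions, not Klink-2's actual parameters or implementation.

```python
# Simplified co-occurrence-based subsumption test in the spirit of
# Klink-2-style topic-network generation. Thresholds are illustrative.
from collections import Counter

def candidate_subsumptions(paper_topics, alpha=0.6, beta=0.3):
    """paper_topics: list of sets of topics, one set per paper."""
    count = Counter(t for topics in paper_topics for t in topics)
    pair = Counter()
    for topics in paper_topics:
        for x in topics:
            for y in topics:
                if x != y:
                    pair[(x, y)] += 1  # co-occurrence, counted per direction
    links = []
    for (x, y), n in pair.items():
        p_x_given_y = n / count[y]  # how often y's papers also mention x
        p_y_given_x = n / count[x]
        if p_x_given_y >= alpha and p_y_given_x <= beta:
            links.append((x, "broader_than", y))
    return links
```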
EKAW2014 - A Hybrid Semantic Approach to Building Dynamic Maps of Research C... – Francesco Osborne
A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities
by F. Osborne, G. Scavo, E. Motta
URL: http://oro.open.ac.uk/41083/
In earlier papers we characterised the notion of diachronic topic-based communities – i.e., communities of people who work on semantically related topics at the same time. These communities are important to enable topic-centred analyses of the dynamics of the research world. In this paper we present an innovative algorithm, called Research Communities Map Builder (RCMB), which is able to automatically link diachronic topic-based communities over subsequent time intervals to identify significant events. These include topic shifts within a research community; the appearance and fading of a community; communities splitting, merging, spawning other communities; and others. The output of our algorithm is a map of research communities, annotated with the detected events, which provides a concise visual representation of the dynamics of a research area. In contrast with existing approaches, RCMB enables a much more fine-grained understanding of the evolution of research communities, with respect to both the granularity of the events and the granularity of the topics. This improved understanding can, for example, inform the research strategies of funders and researchers alike. We illustrate our approach with two case studies, highlighting the main communities and events that characterized the World Wide Web and Semantic Web areas in the 2000–2010 decade.
EKAW2014 - Inferring Semantic Relations by User Feedback – Francesco Osborne
Inferring Semantic Relations by User Feedback
by F. Osborne, E. Motta
URL: http://oro.open.ac.uk/41162/
In the last ten years, ontology-based recommender systems have been shown to be effective tools for predicting user preferences and suggesting items. There are, however, some issues associated with the ontologies adopted by these approaches: 1) their crafting is not a cheap process, being time-consuming and calling for specialist expertise; 2) they may not accurately represent the viewpoint of the targeted user community; 3) they tend to provide rather static models, which fail to keep track of evolving user perspectives. To address these issues, we propose Klink UM, an approach for extracting emergent semantics from user feedback, with the aim of tailoring the ontology to the users and improving recommendation accuracy. Klink UM uses statistical and machine learning techniques for finding hierarchical and similarity relationships between keywords associated with rated items and can be used for: 1) building a conceptual taxonomy from scratch, 2) enriching and correcting an existing ontology, 3) providing a numerical estimate of the intensity of semantic relationships according to the users. The evaluation shows that Klink UM performs well with respect to handcrafted ontologies and can significantly increase the accuracy of suggestions in content-based recommender systems.
This document presents a method for clustering citation distributions of authors to categorize them semantically and predict future citations. It uses hierarchical clustering with normalized Euclidean distance on citation distributions. Clusters are evaluated based on homogeneity of citation patterns over time. Semantic features of author bibliometric data are represented using the BiDO ontology to link numeric and categorical data over time. The method was evaluated on a dataset of 20,000 computer scientists from 1990-2010. Future work involves augmenting features, applying it to groups, extending the ontology, and creating a linked bibliometric triplestore.
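A compact sketch of the clustering step, using SciPy's hierarchical clustering on z-scored citation time series (one way to realise a normalised Euclidean distance), is given below; the toy data and the choice of average linkage are illustrative assumptions.

```python
# Hierarchical clustering of per-author citation time series, normalised so
# that curve shape rather than citation volume drives the grouping.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# Rows: authors; columns: citations received per year (toy numbers).
citations = np.array([
    [1, 2, 4, 8, 16, 30],    # steadily rising profile
    [2, 3, 5, 9, 15, 28],
    [20, 18, 15, 10, 6, 3],  # declining profile
])
normalized = zscore(citations, axis=1)  # compare shapes, not volumes
tree = linkage(normalized, method="average", metric="euclidean")
print(fcluster(tree, t=2, criterion="maxclust"))  # e.g. [1 1 2]
```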
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub... – Sérgio Sacani
Context. The observation of several L-band emission sources in the S cluster has led to a rich discussion of their nature. However, a definitive answer to the classification of the dusty objects requires an explanation for the detection of compact Doppler-shifted Brγ emission. The ionized hydrogen in combination with the observation of mid-infrared L-band continuum emission suggests that most of these sources are embedded in a dusty envelope. These embedded sources are part of the S-cluster, and their relationship to the S-stars is still under debate. To date, the question of the origin of these two populations has been vague, although all explanations favor migration processes for the individual cluster members. Aims. This work revisits the S-cluster and its dusty members orbiting the supermassive black hole SgrA* on bound Keplerian orbits from a kinematic perspective. The aim is to explore the Keplerian parameters for patterns that might imply a nonrandom distribution of the sample. Additionally, various analytical aspects are considered to address the nature of the dusty sources. Methods. Based on the photometric analysis, we estimated the individual H−K and K−L colors for the source sample and compared the results to known cluster members. The classification revealed a noticeable contrast between the S-stars and the dusty sources. To fit the flux-density distribution, we utilized the radiative transfer code HYPERION and implemented a young stellar object Class I model. We obtained the position angle from the Keplerian fit results; additionally, we analyzed the distribution of the inclinations and the longitudes of the ascending node. Results. The colors of the dusty sources suggest a stellar nature consistent with the spectral energy distribution in the near- and mid-infrared domains. Furthermore, the evaporation timescales of dusty and gaseous clumps in the vicinity of SgrA* are much shorter (≲ 2 yr) than the epochs covered by the observations (≈ 15 yr). In addition to the strong evidence for the stellar classification of the D-sources, we also find a clear disk-like pattern following the arrangements of S-stars proposed in the literature. Furthermore, we find a global intrinsic inclination for all dusty sources of 60 ± 20°, implying a common formation process. Conclusions. The pattern of the dusty sources manifested in the distribution of the position angles, inclinations, and longitudes of the ascending node strongly suggests two different scenarios: the main-sequence stars and the dusty stellar S-cluster sources share a common formation history, or they migrated via a similar formation channel in the vicinity of SgrA*. Alternatively, the gravitational influence of SgrA* in combination with a massive perturber, such as a putative intermediate-mass black hole in the IRS 13 cluster, forces the dusty objects and S-stars to follow a particular orbital arrangement. Key words: stars: black holes – stars: formation – Galaxy: center – galaxies: star formation
Compositions of iron-meteorite parent bodies constrain the structure of the pr... – Sérgio Sacani
Magmatic iron-meteorite parent bodies are the earliest planetesimals in the Solar System, and they preserve information about conditions and planet-forming processes in the solar nebula. In this study, we include comprehensive elemental compositions and fractional-crystallization modeling for iron meteorites from the cores of five differentiated asteroids from the inner Solar System. Together with previous results on metallic cores from the outer Solar System, we conclude that asteroidal cores from the outer Solar System have smaller sizes, elevated siderophile-element abundances, and simpler crystallization processes than those from the inner Solar System. These differences are related to the formation locations of the parent asteroids because the solar protoplanetary disk varied in redox conditions, elemental distributions, and dynamics at different heliocentric distances. Using highly siderophile-element data from iron meteorites, we reconstruct the distribution of calcium-aluminum-rich inclusions (CAIs) across the protoplanetary disk within the first million years of Solar-System history. CAIs, the first solids to condense in the Solar System, formed close to the Sun. They were, however, concentrated within the outer disk and depleted within the inner disk. Future models of the structure and evolution of the protoplanetary disk should account for this distribution pattern of CAIs.
PPT on Alternate Wetting and Drying presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxgoluk9330
Ahota Beel, nestled in Sootea Biswanath Assam , is celebrated for its extraordinary diversity of bird species. This wetland sanctuary supports a myriad of avian residents and migrants alike. Visitors can admire the elegant flights of migratory species such as the Northern Pintail and Eurasian Wigeon, alongside resident birds including the Asian Openbill and Pheasant-tailed Jacana. With its tranquil scenery and varied habitats, Ahota Beel offers a perfect haven for birdwatchers to appreciate and study the vibrant birdlife that thrives in this natural refuge.
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...Sérgio Sacani
We present the JWST discovery of SN 2023adsy, a transient object located in a host galaxy JADES-GS
+
53.13485
−
27.82088
with a host spectroscopic redshift of
2.903
±
0.007
. The transient was identified in deep James Webb Space Telescope (JWST)/NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) program. Photometric and spectroscopic followup with NIRCam and NIRSpec, respectively, confirm the redshift and yield UV-NIR light-curve, NIR color, and spectroscopic information all consistent with a Type Ia classification. Despite its classification as a likely SN Ia, SN 2023adsy is both fairly red (
�
(
�
−
�
)
∼
0.9
) despite a host galaxy with low-extinction and has a high Ca II velocity (
19
,
000
±
2
,
000
km/s) compared to the general population of SNe Ia. While these characteristics are consistent with some Ca-rich SNe Ia, particularly SN 2016hnk, SN 2023adsy is intrinsically brighter than the low-
�
Ca-rich population. Although such an object is too red for any low-
�
cosmological sample, we apply a fiducial standardization approach to SN 2023adsy and find that the SN 2023adsy luminosity distance measurement is in excellent agreement (
≲
1
�
) with
Λ
CDM. Therefore unlike low-
�
Ca-rich SNe Ia, SN 2023adsy is standardizable and gives no indication that SN Ia standardized luminosities change significantly with redshift. A larger sample of distant SNe Ia is required to determine if SN Ia population characteristics at high-
�
truly diverge from their low-
�
counterparts, and to confirm that standardized luminosities nevertheless remain constant with redshift.
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
PPT on Sustainable Land Management presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptxshubhijain836
Centrifugation is a powerful technique used in laboratories to separate components of a heterogeneous mixture based on their density. This process utilizes centrifugal force to rapidly spin samples, causing denser particles to migrate outward more quickly than lighter ones. As a result, distinct layers form within the sample tube, allowing for easy isolation and purification of target substances.
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Nature
1. Improving Editorial Workflow and Metadata Quality at Springer Nature
Angelo Salatino¹, Francesco Osborne¹, Aliaksandr Birukou², Enrico Motta¹
¹ Knowledge Media Institute, The Open University, United Kingdom
² Springer Nature, Heidelberg, Germany
ISWC 2019
2. Open University and Springer Nature Collaboration
The Open University and Springer Nature have been collaborating since 2014 in
the development of an array of semantically-enhanced solutions for:
• Semi-automatic classification of proceedings and other editorial products.
• Automatic selection of the most appropriate books, journals, and proceedings to market at a scientific event.
• Analysis of SN codes, with the aim of evolving these codes and detecting fields that deserve further attention.
• Joint release of the Computer Science Ontology.
Osborne et al. (2017) Supporting Springer Nature Editors by means of Semantic Technologies. ISWC 2017. Vienna, Austria.
3. Generation of Metadata
Generating good metadata is a crucial task to enable scholars, students, companies, and other stakeholders to discover and access scientific knowledge.
Traditionally, editors choose a list of related keywords and categories in relevant taxonomies according to:
• their own experience of similar conferences;
• a visual exploration of titles and abstracts;
• a list of terms given by the curators or derived from calls for papers.
4. Classification of Publications – A Complex Problem
Classifying publications manually presents a number of issues for a large publisher such as Springer Nature.
• It is a complex process that requires expert editors.
• It is a time-consuming process that can hardly scale.
• It is easy to miss the emergence of new topics.
• It is easy to assume that some traditional topics are still popular when this is no longer the case.
• The keywords used in the call for papers are often a reflection of what a venue aspires to be, rather than the real contents of the proceedings.
6. Smart Topic Miner 1.0 - 2016
Presented at ISWC 2016
Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. ISWC 2016.
7. A success story
• Since 2016 STM has been regularly used by editors in Germany, China, Brazil, India, and Japan.
• It is used to classify more than 800 conference proceedings volumes per year, including the Lecture Notes in Computer Science (LNCS) as well as LNBIP, CCIS, IFIP-AICT, and LNICST.
• It completely changed SN's internal workflow: the task is now semi-automatic and monitored by junior editors.
• It is constantly evolving, with new functionalities added in response to feedback from the editorial team.
11. Business Value
• STM halves the time needed for classifying proceedings, from 30 to 15 minutes.
• It also allows junior editors to work on the classification of proceedings, distributing the load and reducing costs.
• The adoption of a controlled vocabulary makes the process more robust and facilitates the identification of related editorial products.
12. Retrievability
About 9 million additional downloads thanks to STM.
[Chart: average number of yearly downloads for books in SpringerLink, 2009–2018, with separate series for CS proceedings (actual, expected, and with STM), other books in CS, and the overall catalogue.]
14. Smart Topic Miner 2.0 - 2019
• New GUI.
• New knowledge base (CSO).
• New topic detection engine (the CSO Classifier).
• Ability to compare with previous editions.
• Integrated with the SN system and the CSO Portal.
http://stm-demo.kmi.open.ac.uk
15. STM 2.0 - architecture
[Architecture diagram: SN editors interact with an HTML GUI; a parser and a visualization generator connect the GUI to the STM engine, which draws on CSO, SN classification codes (SNCs), historical data, and a word2vec model to perform (i) CSO classification, (ii) topic explanation, (iii) taxonomy generation, (iv) SN tags inference, and (v) comparison with previous classifications.]
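To make the data flow in the diagram easier to follow, here is a minimal, self-contained sketch of the five engine steps in Python. Every function name, dictionary layout, and the "SNC-dummy-code" tag are hypothetical stand-ins rather than the actual STM implementation, and the toy keyword matching merely stands in for the real CSO Classifier.

# Hypothetical, self-contained sketch of the five engine steps listed above.
# All function names and the toy logic are illustrative, not the actual STM code.

def classify_with_cso(chapter, cso):
    """Toy stand-in for the CSO Classifier: match CSO topic labels in the chapter text."""
    text = " ".join([chapter["title"], chapter["abstract"]] + chapter["keywords"]).lower()
    return {topic for topic in cso["topics"] if topic in text}

def annotate_volume(chapters, cso, snc_mapping, previous_edition):
    chapter_topics = [classify_with_cso(ch, cso) for ch in chapters]      # i) CSO Classifier
    explanations = {t: [ch["title"] for ch, ts in zip(chapters, chapter_topics) if t in ts]
                    for ts in chapter_topics for t in ts}                  # ii) topic explanation
    taxonomy = {t: cso["super_topics"].get(t, []) for t in explanations}   # iii) taxonomy generation
    sn_tags = {snc_mapping[t] for t in explanations if t in snc_mapping}   # iv) SN tags inference
    new_topics = set(explanations) - set(previous_edition)                 # v) diff vs previous edition
    return {"taxonomy": taxonomy, "sn_tags": sn_tags,
            "explanations": explanations, "new_topics": new_topics}

# Toy usage:
cso = {"topics": {"semantic web", "linked data"},
       "super_topics": {"linked data": ["semantic web"]}}
chapters = [{"title": "Publishing Linked Data on the Web", "abstract": "...",
             "keywords": ["linked data", "rdf"]}]
print(annotate_volume(chapters, cso, {"linked data": "SNC-dummy-code"},
                      previous_edition={"semantic web"}))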
16. A new knowledge base - The Computer Science Ontology
The Computer Science Ontology (CSO) is a large-scale, automatically generated ontology of research areas. It is the largest ontology in the field of Computer Science, including about 14K topics and 162K semantic relationships.
Salatino et al (2019) The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas. Data Intelligence.
http://cso.kmi.open.ac.uk/
17. A new topic detection engine - The CSO Classifier
The CSO Classifier is an unsupervised approach for automatically classifying documents according to CSO.
Salatino et al. (2019) The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles.
Demo: https://cso.kmi.open.ac.uk/classify/
Download: https://github.com/angelosalatino/cso-classifier
pip install cso-classifier
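For readers who want to try the classifier, the sketch below follows the usage documented in the package README for recent releases; the exact import path, constructor arguments, and result keys may differ across versions and should be checked against the repository above.

# Minimal usage sketch for the cso-classifier pip package (recent releases).
# The import path, arguments, and result keys follow the project README and may
# differ in older versions of the package.
from cso_classifier import CSOClassifier

paper = {
    "title": "De-anonymizing Social Networks",
    "abstract": "We present a framework for analyzing privacy and anonymity "
                "in online social networks...",
    "keywords": "data mining, social networks, privacy, anonymity",
}

classifier = CSOClassifier(modules="both", enhancement="first", explanation=True)
result = classifier.run(paper)

print(result["union"])     # topics returned by the syntactic and semantic modules
print(result["enhanced"])  # broader CSO topics inferred through the ontology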
19. Evaluation - Performance
Classifier | Description | Prec. | Rec. | F1
TF-IDF | TF-IDF | 16.7% | 24.0% | 19.7%
TF-IDF-M | TF-IDF mapped to CSO concepts | 40.4% | 24.1% | 30.1%
LDA100 | LDA with 100 topics | 5.9% | 11.9% | 7.9%
LDA500 | LDA with 500 topics | 4.2% | 12.5% | 6.3%
LDA1000 | LDA with 1000 topics | 3.8% | 5.0% | 4.3%
LDA100-M | LDA with 100 topics mapped to CSO | 9.4% | 19.3% | 12.6%
LDA500-M | LDA with 500 topics mapped to CSO | 9.6% | 21.2% | 13.2%
LDA1000-M | LDA with 1000 topics mapped to CSO | 12.0% | 11.5% | 11.7%
W2V-W | W2V on windows of words | 41.2% | 16.7% | 23.8%
STM 2016 | Classifier used by STM 1.0 | 80.8% | 58.2% | 67.6%
STM 2017 (CSO-SYN) | CSO Classifier, syntactic module | 78.3% | 63.8% | 70.3%
CSO-SEM | CSO Classifier, semantic module | 70.8% | 72.2% | 71.5%
STM 2019 (CSO-C) | The CSO Classifier | 73.0% | 75.3% | 74.1%
Computed on a gold standard of 70 publications, each annotated by 3 researchers.
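For reference, the F1 values in the table are the harmonic mean of precision and recall; for example, for the STM 2019 (CSO-C) row:

\[
F_1 = \frac{2PR}{P + R} = \frac{2 \cdot 0.730 \cdot 0.753}{0.730 + 0.753} \approx 0.741
\]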
20. Evaluation - Usability
System | SUS score | Grade | Percentile
STM 2016 | 76.6 | B | 80%
STM 2019 | 82.8 | A | 93%
[Bar charts: SUS scores (0–100) for each of the nine editors, and per-editor ratings (0–5) on the SUS categories: want to use frequently, easy to use, easy to learn, too complex.]
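For context, the scores above come from the standard 10-item System Usability Scale questionnaire; the sketch below implements the conventional SUS scoring rule (the generic formula, not code from this study).

# Standard SUS scoring: 10 items answered on a 1-5 Likert scale.
# Odd-numbered items are positively worded, even-numbered items negatively worded.
def sus_score(responses):
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,5,7,9 vs 2,4,6,8,10 (0-indexed i)
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5  # rescale the 0-40 raw sum to 0-100

# Example: a fairly positive questionnaire gives a score in the mid-80s.
print(sus_score([5, 2, 4, 1, 4, 2, 5, 1, 4, 2]))  # 85.0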
21. Conclusion and Future Work
• “A little semantic goes a long way”
• Semantic explainability is crucial in this domain
• We are working on an application that will support authors in annotating their own papers.
• Typing of scientific entities: approaches, tasks, domains, resources.
• Automatic extraction of scientific knowledge graphs.
Smart Topic Miner (STM) is the system that we created for assisting the Springer Nature editorial team in classifying scholarly publications in the field of Computer Science. It takes as input one or more books and returns a representation of their research topics, a description of each chapter, and an explanation for each inferred topic.
STM has been used by Springer Nature since January 2017 to annotate several book series in Computer Science (e.g., LNCS), for a total of about 800 volumes each year. During this period, the adoption of STM has halved the time needed for classifying proceedings and allowed a more robust and comprehensive representation of the research areas in the Springer Nature catalogue.
In the scholarly domain, ontologies are often used to facilitate the integration of large datasets of research data, the exploration of the academic landscape, information extraction from scientific articles, and so on.
In January 2019, KMi released, in conjunction with Springer Nature, the Computer Science Ontology (CSO), which is the largest taxonomy of research areas in the field. This resource was automatically generated by mining a dataset of 16M publications and using a combination of machine learning and semantic technologies to extract 14K research topics and 162K semantic relationships. CSO includes a much larger number of research topics than the alternatives (e.g., the ACM classification), enabling a very granular characterisation of the content of research papers, and it can easily be updated by running our ontology learning approach on recent corpora of publications. It has attracted the attention of several institutions and companies, such as Digital Science, Elsevier, and ACM, which are interested in adopting CSO for characterizing their datasets of research publications.
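To give a concrete sense of the relationships CSO encodes, the fragment below is an illustrative Python sketch, not the ontology's actual distribution format (which can be downloaded from the CSO portal); the relationship names mirror the CSO schema's superTopicOf and relatedEquivalent properties, while the specific topic lists are only examples.

# Illustrative fragment of the kind of structure CSO encodes; the topic labels and
# sub-topic lists are examples, not an extract of the released ontology.
cso_fragment = {
    "semantic web": {
        "superTopicOf": ["linked data", "ontology matching", "sparql"],
        "relatedEquivalent": ["semantic web technologies"],
    },
    "machine learning": {
        "superTopicOf": ["neural networks", "support vector machines"],
    },
}

def descendants(topic, ontology):
    """Collect all (transitive) sub-topics of a topic."""
    children = ontology.get(topic, {}).get("superTopicOf", [])
    return set(children) | {d for c in children for d in descendants(c, ontology)}

print(descendants("semantic web", cso_fragment))
# {'linked data', 'ontology matching', 'sparql'}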
We are currently developing a similar ontology in the field of Engineering, and we plan to apply our approach to several other fields (e.g., Biomedicine, Economics).
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment.
The CSO Classifier is an application for automatically classifying research papers according to CSO. We are currently using it to enrich the description of 150K publications in the Springer Nature online library. We have also started a collaboration with Digital Science, the creators of Dimensions, with the aim of automatically annotating their dataset of scholarly data.
The resulting characterization of research papers can be used for supporting tasks such as identifying research communities, forecasting research trends, detecting relevant reviewers, and so on.