This document summarises a doctoral consortium presentation on the early detection and forecasting of research trends. It outlines the problem, the stakeholders who care about detecting new trends, and the state of the art in trend detection and impact forecasting, both of which struggle to detect trends early in their lifecycle. The planned approach is to use a wider range of data sources, such as scholarly data and social media, to build a model that analyses patterns in the relationships between entities such as authors, topics and communities. An initial study analysed the evolution of sub-networks of debutant versus established topics and found a significantly higher pace of collaboration between co-occurring keywords for debutant topics. Next steps include analysing dynamics in other networks and integrating social media data.
Early Detection and Forecasting of Research Trends
1. Early Detection and Forecasting of Research Trends
Angelo Antonio Salatino
@angelosalatino
Advisors: Prof. Enrico Motta, Dr. Francesco Osborne
ISWC 2015 – Doctoral Consortium
3. Who cares?
• Researchers: following the evolution of the research environment
• Academic publishers: promoting up-to-date and interesting content
• Companies: early intelligence on potentially important research trends to remain at the forefront of innovation
• Funding bodies: improved understanding of the research landscape
4. State of the art: Trend detection
• Topic evolution using bibliometric analysis:
– Content analysis
• Topic extraction
• Main terms in documents
– Citation analysis
– Main limitation: cannot detect new trends early enough in the lifecycle
[Wu et al. 2011, Bolelli et al. 2009, He et al. 2009]
5. State of the art: Forecasting impact
• Impact based on the number of publications and authors associated with topics
• Approaches based on exponential smoothing, simple moving average and machine learning
• Limitations:
– These approaches don’t work at the embryonic and early stages
– They only use a limited set of data sources
[Budi et al. 2012, Jun et al. 2010, Tseng et al. 2009]
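As a minimal illustration of the statistical techniques listed above (a hedged sketch: the yearly counts and the smoothing factor are made up, and this is not the implementation used in the cited works), simple exponential smoothing can produce a one-step-ahead forecast of a topic's publication counts:

# Hypothetical example: forecast next year's publications for a topic
# with simple exponential smoothing. Counts and alpha are illustrative.
def exponential_smoothing_forecast(counts, alpha=0.5):
    """Return the smoothed series and the one-step-ahead forecast."""
    smoothed = [counts[0]]                    # initialise with the first observation
    for y in counts[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed, smoothed[-1]             # last smoothed value = forecast for the next period

yearly_publications = [12, 15, 21, 30, 44, 61]
_, forecast = exponential_smoothing_forecast(yearly_publications, alpha=0.6)
print(f"Forecast for next year: {forecast:.1f} publications")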
6. Planned approach
Wider range of data sources: a comprehensive knowledge base integrating both scholarly data and social media
7. Planned approach
Focus on discovering patterns emerging from the research dynamics:
– For example, before the Semantic Web emerged explicitly as a research area, we could identify new and interesting dynamics involving authors from different research areas such as knowledge representation, agent systems, hypertext and databases.
– Creation of a model that takes into account all the discovered patterns, which may involve different entities (e.g., authors, venues, topics, communities)
8. Initial study
• Goal: to identify the dynamics that may indicate the emergence of a new topic
• Approach:
– Integration of the keywords network and the semantic topics network (Klink-2, Osborne et al. @ ISWC 2015)
– Analysis of the evolution in time of sub-networks that will generate new topics vs. a control group of established topics:
• Debutant group (new topics)
• Non-debutant group (established topics)
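As a rough sketch of the kind of structure this analysis relies on (the paper format and keywords below are hypothetical, and this is not the actual Rexplore/Klink-2 pipeline), a keyword co-occurrence graph can be built per year with networkx:

from collections import defaultdict
from itertools import combinations
import networkx as nx

def yearly_cooccurrence_graphs(papers):
    """Build one keyword co-occurrence graph per year.
    `papers` is an iterable of dicts with 'year' and 'keywords' fields (hypothetical format)."""
    graphs = defaultdict(nx.Graph)
    for paper in papers:
        g = graphs[paper["year"]]
        for k1, k2 in combinations(sorted(set(paper["keywords"])), 2):
            weight = g.get_edge_data(k1, k2, {"weight": 0})["weight"]
            g.add_edge(k1, k2, weight=weight + 1)   # edge weight = number of co-occurring papers
    return graphs

papers = [
    {"year": 2000, "keywords": ["knowledge representation", "world wide web"]},
    {"year": 2000, "keywords": ["world wide web", "ontologies", "knowledge representation"]},
    {"year": 2001, "keywords": ["semantic web", "ontologies"]},
]
graphs = yearly_cooccurrence_graphs(papers)
print(graphs[2000].edges(data=True))

The per-year sub-graphs of topics that later give rise to a new topic (the debutant group) can then be compared against sub-graphs of established topics (the control group).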
9. Preliminary results
• My analysis indicates that, for debutant topics, there is intense activity between the most co-occurring keywords, which would normally be established topics
• My hypothesis is that I can use this understanding for the early detection of new topics on the basis of the activity of established topics
Student’s t-test on the two distributions:
• p-value = 2.81 × 10^-83
• the null hypothesis can be rejected
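As a minimal sketch of the significance test reported above (the pace-of-collaboration samples below are made up, not the values from the 3M-paper experiment), a two-sample Student's t-test can be run with SciPy:

from scipy import stats

# Hypothetical pace-of-collaboration samples for the two groups of sub-networks
debutant = [0.8, 1.1, 0.9, 1.4, 0.7, 1.2]
non_debutant = [0.0, 0.1, -0.2, 0.05, -0.1, 0.15]

t_stat, p_value = stats.ttest_ind(debutant, non_debutant)   # Student's t-test (equal variances assumed)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
if p_value < 0.01:
    print("Null hypothesis (no difference between the two groups) can be rejected")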
10. Evaluation plan
• Quantitative: retrospective analysis and detection of historical trends
• Qualitative: informal feedback from domain experts, including senior editors and publishers at Springer, on the system’s suggestions for future trends
11. Reflections
• So far, my initial experiments have provided promising results which confirm the initial hypotheses
• The adoption of semantic technologies has been beneficial in improving these results
12. Next steps
• Analyse dynamics in other networks (e.g., authors, communities and venues)
• Integration of social media data
Editor's Notes
Nowadays we are experiencing that the research environment evolves rapidly. New research areas emerge while others fade out, making it difficult to keep up with these dynamics.
At the moment, the task of understanding the main emergent areas is accomplished either automatically or semi-automatically using systems such as Rexplore, Saffron, ArnetMiner, MAS, Google Scholar, Faceted DBLP and CiteSeer.
Taking as an example the evolution in time of a topic based on the number of papers, such as the Semantic Web in the figure, we can recognise three main stages: embryonic, early stage and recognised.
In fact, it can be argued that a number of topics start to exist in an embryonic way, often as a combination of other topics, before being officially identified and then named by researchers. For example, the Semantic Web emerged as a common area for researchers working on Artificial Intelligence, WWW and Knowledge-Based Systems, before being acknowledged and labelled in the 2001 paper by Tim Berners-Lee.
The early stage phase starts when a group of scientists agree with some theories related to the topic, build their own conceptual framework, and potentially give birth to a new scientific community.
Finally, in the recognised phase, many authors are aware of the topic and start to work on it, producing results and publishing research papers.
The problem is that all the aforementioned systems can detect trends only when the research area is already recognised, not before; they need some years to make sense of these new trends. Moreover, there are no systems able to forecast their impact at the early stage.
I am interested in identifying, making sense of, and forecasting the impact of research trends.
Who is really interested?
Well, Researchers need to be updated regularly on the evolution of research environments because they are interested in new trends related to their topics.
For academic publishers and editors, knowing about emerging topics in advance is crucial for offering the most up-to-date and interesting content. For example, an editor can gain a competitive advantage by being the first to recognise the importance of a new trend and publish a special issue or a journal about it. My PhD project is in fact supported by Springer-Verlag.
Institutional funding bodies and companies also need to be aware of research developments and promising research trends. For example, being aware of future research trends allows them to move early and make important investments.
This problem can be analysed from two points of view: topic trend detection and forecasting the impact of topics.
As far as trend detection is concerned, all the current approaches use bibliometric analysis, aiming to extract either topics or main terms from the text; the evolution of these topics is then analysed by investigating the citation network.
The main limitation of these approaches is that the content for a specific topic first needs to be produced and then cited, so it takes years before they can detect the trend.
On the other hand, for forecasting the impact there are approaches that define impact as the number of publications and authors associated with a topic; they are mainly based on statistical techniques such as exponential smoothing, simple moving average and machine learning algorithms.
In this case, the main limitations of these approaches are that they do not work in the first phases of the evolution of topics and that they employ a limited set of features.
However, it can be argued that a different definition of impact, based also on social media data, can improve the forecasting phase and allow us to perform it on a shorter timescale.
Initially, I aim to integrate a variety of heterogeneous data sources, including scholarly data and social media data, in order to create a comprehensive knowledge base. This knowledge base will make use of an ontology to describe all the relationships between the research elements.
Afterwards I will focus on analysing patterns that can lead to the emergence of a new research topic. For example, before Tim Berners-Lee officially named the Semantic Web as a research area, we were already able to identify that AI, the WWW and KBS were sharing their knowledge in this new common area.
An interesting fact about scholarly data is that it stores information about papers, so many research elements, such as topics, authors, communities, venues and organisations, can be inferred. All these research elements are inherently interconnected: an author writes papers about certain topics, and an author belongs to a community that is connected to a topic. These relationships can be analysed diachronically to derive new dynamics that can lead to the emergence of new topics, and then I can design a comprehensive model that takes into account all the discovered patterns.
I conducted an initial study aiming to identify the dynamics that may lead to the emergence of a new topic using only scholarly data.
In order to do so, I first combined the keywords network and the semantic topic network available in the Rexplore database. The keywords network, as the name suggests, is a network in which nodes represent keywords tagged in papers and the link between two keywords represents the number of papers in which they co-occur in each year. The semantic topic network is also a network of keywords, but in this case they are connected by semantic relationships (subAreaOf, sameAs and so on) that create a hierarchy of research topics.
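One simple way to fold the semantic relationships into the co-occurrence graph (a hypothetical sketch under my own assumptions, not the actual Rexplore implementation) is to collapse keywords linked by sameAs into a single topic node while summing the co-occurrence weights:

import networkx as nx

def merge_same_as(cooccurrence, same_as):
    """Collapse keywords declared equivalent by sameAs into one node.
    `same_as` maps a keyword variant to its canonical topic label (hypothetical format)."""
    merged = nx.Graph()
    for u, v, data in cooccurrence.edges(data=True):
        cu, cv = same_as.get(u, u), same_as.get(v, v)
        if cu == cv:
            continue                             # co-occurrence between variants of the same topic
        weight = merged.get_edge_data(cu, cv, {"weight": 0})["weight"]
        merged.add_edge(cu, cv, weight=weight + data.get("weight", 1))
    return merged

g = nx.Graph()
g.add_edge("ontology", "semantic web", weight=3)
g.add_edge("ontologies", "semantic web", weight=2)
print(merge_same_as(g, {"ontologies": "ontology"}).get_edge_data("ontology", "semantic web"))   # {'weight': 5}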
As a next step, I conducted a diachronic analysis on some portions of this joint network that are related to two different kinds of topics: debutant and non-debutant.
As a result, I found that for the portion of the network related to the debutant group of topics the pace of collaboration between topics is higher than for the portion related to the non-debutant group.
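A minimal sketch of one way to quantify this pace of collaboration within a sub-network (my own hedged reading of the measure, not necessarily the exact formulation used in the study) is the year-over-year change in total co-occurrence weight among the topics of the sub-graph:

def pace_of_collaboration(yearly_graphs, topics):
    """Year-over-year change in total co-occurrence weight among `topics`.
    `yearly_graphs` maps year -> networkx.Graph with 'weight' edge attributes (hypothetical format)."""
    years = sorted(yearly_graphs)
    totals = []
    for year in years:
        sub = yearly_graphs[year].subgraph(topics)
        totals.append(sum(d.get("weight", 0) for _, _, d in sub.edges(data=True)))
    return [later - earlier for earlier, later in zip(totals, totals[1:])]   # first differences

The resulting values, computed over many debutant and non-debutant sub-graphs, give the two distributions discussed in the next note.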
In this picture we can see two distributions of the pace of collaboration of topics over time. The green line is for topics belonging to the non-debutant group, while the blue line is for topics belonging to the debutant group. The distribution for the non-debutant group is centred on zero, meaning that overall this group does not show any increase in collaboration, while for the debutant group the distribution is shifted towards positive values, showing that the pace of collaboration is increasing.
Moreover, applying Student's t-test to the two distributions allows us to reject the null hypothesis that there is no difference between the two measured phenomena, indicating that the difference between the two groups is significant.
For this reason I believe that the acquired know-how can be applied for understanding the emergence of new topics based on the established ones.
As preliminary results, I joined the Keywords Network, which is a co-occurrence graph with nodes representing topics and links representing the number of co-occurrences between them, and the Semantic Topic Network, which is a taxonomy of topics connected by semantic relationships extracted by Klink.
I conducted a diachronic analysis on some portions of this joined graph to confirm whether the creation of novel topics is actually correlated with an increase in the pace of collaboration of already existing ones. These portions of the graph were related to two different groups of topics: debutant and non-debutant.
I plan to evaluate my work from both a quantitative and a qualitative perspective.
From a quantitative point of view, I will use historical data to estimate statistical indices such as precision, recall, F-measure and so on.
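As a minimal sketch of this retrospective evaluation (the topic sets below are hypothetical, not actual results), set-based precision, recall and F-measure over detected versus historically confirmed emerging topics could be computed as:

def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for detected vs. historically confirmed topics."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical example: topics flagged by the system vs. topics that actually debuted
p, r, f = precision_recall_f1({"semantic web", "cloud computing", "grid computing"},
                              {"semantic web", "cloud computing", "social networks"})
print(f"precision={p:.2f}, recall={r:.2f}, f1={f:.2f}")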
From the qualitative perspective, I intend to collect informal feedback about future trends from domain experts, such as senior editors and publishers at Springer.
The initial experiments provided promising results, confirming the initial hypotheses about the emergence of new topics.
Also, the adoption of semantic technologies, such as the semantic topic network, has been beneficial in improving these results.
As a next step, I aim to analyse the dynamics of other research elements, such as authors, communities and venues, that can lead to the emergence of new research topics, and also to integrate entities from social media such as tweets and blog posts.