Being able to rapidly recognise new research trends is strategic for many stakeholders, including universities, institutional funding bodies, academic publishers and companies. The literature presents several approaches to identifying the emergence of new research topics, which rely on the assumption that the topic is already exhibiting a certain degree of popularity and is consistently referred to by a community of researchers. However, detecting the emergence of a new research area at an embryonic stage, i.e., before the topic has been consistently labelled by a community of researchers and associated with a number of publications, is still an open challenge. We address this issue by introducing Augur, a novel approach to the early detection of research topics. Augur analyses the diachronic relationships between research areas and is able to detect clusters of topics that exhibit dynamics correlated with the emergence of new research topics. Here we also present the Advanced Clique Percolation Method (ACPM), a new community detection algorithm developed specifically for supporting this task. Augur was evaluated on a gold standard of 1,408 debutant topics in the 2000-2011 interval and outperformed four alternative approaches in terms of both precision and recall.
The rise of the digital economy could open a range of new opportunities for firms to play a more active role in global value chains (GVCs).
New digital technologies are radically changing the outlook of the manufacturing and services industries by altering how companies organize their production processes and which business models they adopt.
How digitalization is affecting, or could affect in the future, enterprises' (actors') contributions to GVCs.
The paper examines the various opportunities that the digital economy opens for actors, especially in terms of cost reductions and the emergence of new business models, and discusses policy measures that could be taken to promote actors' participation in GVCs.
Significant challenges remain for SMEs to enter GVCs, some of which are exacerbated by the new digital economy.
Application of Bradford's Law of Scattering to the Literature of Stellar Physics Ghouse Modin Mamdapur
The present paper tests one of the important bibliometric laws, Bradford's Law of Scattering, on the literature related to ‘stellar physics’ for the period 1988–2013 as available in the Web of Science Core Collection database. A total of 2738 articles related to stellar physics, published in English-language journals during the study period, are retrieved. Data are analysed with respect to year-wise growth of articles, relative growth rate, and doubling time of the literature. The 2738 articles are scattered across 188 journals. A list of ranked journals was prepared, and it was found that the Astrophysical Journal, with 895 articles, is the most productive journal publishing stellar physics literature, followed by Monthly Notices of the Royal Astronomical Society with 507 articles and Astronomy and Astrophysics with 380 articles. The theoretical formulation of Bradford's Law of Scattering is tested and found not to fit the present sample. The Leimkuhler model is tested and found to fit the data for a Bradford multiplier (k) of 11.65. Bradford's law is also tested through graphical formulation by drawing the Bradford bibliograph, which is found to confirm all three characteristics.
Paper presented at the International Conference "What's next in libraries? Trends, Space, and Partnerships", held during January 21-23, 2015 at NIT Silchar, Assam. It was jointly organized by NIT Silchar in association with its USA partner, the Mortenson Center for International Library Programs, University of Illinois at Urbana-Champaign.
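The Bradford zone analysis described above can be sketched numerically. The following is a minimal illustration, assuming a hypothetical ranked list of per-journal article counts (not the paper's actual dataset): it splits the ranked journals into three zones of roughly equal article yield and estimates the Bradford multiplier k as the ratio of journal counts between successive zones; the Leimkuhler model R(r) = a·ln(1 + b·r) is included as a plain function.

```python
import math

# Hypothetical ranked article counts per journal (descending order).
# These numbers are invented for illustration, not the study's data.
counts = [895, 507, 380] + [40] * 20 + [5] * 165

total = sum(counts)
zone_target = total / 3  # three Bradford zones of equal article yield

# Count how many journals it takes to fill each zone.
zones, running, journals_in_zone = [], 0, 0
for c in counts:
    running += c
    journals_in_zone += 1
    if running >= zone_target and len(zones) < 2:
        zones.append(journals_in_zone)
        running, journals_in_zone = 0, 0
zones.append(journals_in_zone)  # remaining journals form the last zone

# Bradford multiplier k: ratio of journal counts in successive zones.
k1 = zones[1] / zones[0]
k2 = zones[2] / zones[1]
print("journals per zone:", zones, "multipliers:", round(k1, 2), round(k2, 2))

# Leimkuhler model: cumulative articles in the top-r journals.
def leimkuhler(r, a, b):
    return a * math.log(1 + b * r)
```

On this made-up distribution the successive multipliers come out roughly constant, which is the qualitative pattern Bradford's law predicts; fitting a and b of the Leimkuhler model to real data would be done by regression on the ranked cumulative counts.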
Invited Talk: Early Detection of Research Topics Angelo Salatino
Slides of my talk at Chan Zuckerberg Initiative (Meta)
Abstract:
The ability to promptly recognise new research trends is strategic for many stakeholders, including universities, institutional funding bodies, academic publishers and companies. While the literature describes several approaches which aim to identify the emergence of new research topics early in their lifecycle, these rely on the assumption that the topic in question is already associated with a number of publications and consistently referred to by a community of researchers. Hence, detecting the emergence of a new research area at an embryonic stage, i.e., before the topic has been consistently labelled by a community of researchers and associated with a number of publications, is still an open challenge. In this paper, we begin to address this challenge by performing a study of the dynamics preceding the creation of new topics. This study indicates that the emergence of a new topic is anticipated by a significant increase in the pace of collaboration between relevant research areas, which can be seen as the ‘parents’ of the new topic. These initial findings (i) confirm our hypothesis that it is possible in principle to detect the emergence of a new topic at the embryonic stage, (ii) provide new empirical evidence supporting relevant theories in Philosophy of Science, and also (iii) suggest that new topics tend to emerge in an environment in which weakly interconnected research areas begin to cross-fertilise.
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner Francesco Osborne
The process of classifying scholarly outputs is crucial to ensure timely access to knowledge. However, this process is typically carried out manually by expert editors, leading to high costs and slow throughput. In this paper we present Smart Topic Miner (STM), a novel solution which uses semantic web technologies to classify scholarly publications on the basis of a very large automatically generated ontology of research areas. STM was developed to support the Springer Nature Computer Science editorial team in classifying proceedings in the LNCS family. It analyses in real time a set of publications provided by an editor and produces a structured set of topics and a number of Springer Nature classification tags which best characterise the given input. We present the architecture of the system and report on an evaluation study conducted with a team of Springer Nature editors. The results of the evaluation showed that STM classifies publications with a high degree of accuracy; they are very encouraging, and as a result we are currently discussing the next steps required to ensure large-scale deployment within the company.
Information theoretic analysis of entity dynamics on the linked open data cloud MOVING Project
The Linked Open Data (LOD) cloud is expanding continuously. Entities appear, change, and disappear over time. However, relatively little is known about the dynamics of the entities, i.e., the characteristics of their temporal evolution. In this paper, we employ clustering techniques over the dynamics of entities to determine common temporal patterns. We define an entity as an RDF resource together with its attached RDF types and properties. The quality of the clusterings is evaluated using entity features such as the entities' properties, RDF types, and pay-level domain. In addition, we investigate to what extent entities that share a feature value change together over time. As dataset, we use weekly LOD snapshots over a period of more than three years provided by the Dynamic Linked Data Observatory. Insights into the dynamics of entities on the LOD cloud have strong practical implications for any application requiring fresh caches of LOD. The range of applications spans from determining crawling strategies for LOD and caching SPARQL queries to programming against LOD and recommending vocabularies for reuse.
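The clustering of entity dynamics can be illustrated with a toy example (a simplification, not the paper's actual method or data): represent each entity as a binary vector of weekly change indicators and group similar temporal patterns with a small k-means. The entities and vectors below are invented.

```python
# Each vector records, per weekly snapshot, whether the entity
# changed (1) or not (0). A tiny k-means groups entities with
# similar temporal change patterns. All data here is invented.
entities = {
    "e1": [1, 1, 1, 1, 0, 0],  # changes early in the observation window
    "e2": [1, 1, 1, 0, 0, 0],
    "e3": [0, 0, 0, 1, 1, 1],  # changes late
    "e4": [0, 0, 1, 1, 1, 1],
}

def dist(a, b):
    """Squared Euclidean distance between two change vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k=2, iters=10):
    # Seed centers with two distinct patterns to keep the run deterministic.
    centers = [vectors[0], vectors[2]]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[min(range(k), key=lambda i: dist(v, centers[i]))].append(v)
        # Recompute each center as the component-wise mean of its cluster.
        centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

clusters = kmeans(list(entities.values()))
print(clusters)  # two groups: early-changing vs late-changing entities
```

In the paper's setting the vectors would be derived from the weekly Dynamic Linked Data Observatory snapshots, and cluster quality would then be compared against features such as RDF types and pay-level domains.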
EMPIRICAL PROJECT Objective to help students put in practice w.docx SALU18
EMPIRICAL PROJECT
Objective: * to help students put in practice what they have learned in Econometrics I
* to teach students how to write an “economic paper”.
Steps
a) Selecting a topic
Topic areas:
- Macroeconomics: consumption function, investment function, demand function, the Phillips curve…
- Microeconomics: estimating production, cost, supply and demand. Data are hard to obtain here.
- Urban and Regional Economics: demand for housing, transportation…
- International Economics: estimating import and export functions, estimating purchasing power parity, estimating capital mobility…
- Development Economics: measuring the determinants of per-capita income, testing the per-capita output convergence among nations…
- Labor Economics: testing theories of unionization, estimating labor force participation, estimating wage differentials among women, minorities…
- Resource and Environmental Economics: estimating water pollution, estimating the determinants of toxic emissions…
The resource journal is the JEL (Journal of Economic Literature), plus the EconLit database on the Internet.
b) Statement of the Problem
State clearly the problem that you are interested in (what are you trying
to achieve)
c) Review of literature
Point out (critically) what others have done concerning the topic of interest.
d) Formulation of a general model
The final model can be derived in several ways: utility maximization, profit maximization, cost minimization, etc. The review of literature is generally helpful to accomplish this task. In the course of deriving the model, one must sort out clearly the dependent variable and the independent variables. After transforming the economic model into an econometric model, one writes up the hypotheses to be tested: the expected signs and magnitudes of the parameters. To elaborate a bit, let us use the following demand function for some good:
Q_be = α + β·P_be + γ·P_o + δ·Y + u

where Q_be, P_be, P_o, Y, and u represent the quantity of the good of interest (e.g., beef), the price of that good, the price of another good (pork, etc.), income, and the error term, respectively. Here β < 0, and γ <> 0 depending on the nature of the other good: γ > 0 if it is a substitute and γ < 0 if it is a complement. The size of β depends on the nature of the product. Thus, if the product is a necessity, price and income elasticities are expected to be small.
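As a sketch of the estimation step that follows, the demand equation above can be fitted by ordinary least squares. The data and "true" parameter values below are entirely synthetic, chosen only to illustrate the mechanics of building a design matrix and checking the hypothesised signs:

```python
import numpy as np

# Synthetic data for the demand equation Q = a + b*P_be + g*P_o + d*Y + u.
# All numbers are invented for illustration.
rng = np.random.default_rng(0)
n = 200
p_be = rng.uniform(2, 10, n)   # own price
p_o = rng.uniform(2, 10, n)    # price of the other good (e.g., pork)
y = rng.uniform(20, 80, n)     # income
u = rng.normal(0, 1, n)        # error term

alpha, beta, gamma, delta = 50.0, -3.0, 1.5, 0.4  # invented "true" values
q = alpha + beta * p_be + gamma * p_o + delta * y + u

# Design matrix with a constant column for the intercept, then OLS.
X = np.column_stack([np.ones(n), p_be, p_o, y])
coef, *_ = np.linalg.lstsq(X, q, rcond=None)
print("estimates (a, b, g, d):", coef.round(2))
```

With the hypothesised signs, one would then check that the estimated β is negative (downward-sloping demand) and interpret the sign of γ as evidence of substitutability or complementarity.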
e) Collecting Data
Sources: international, national, regional
primary or secondary.
Notes.
f) Empirical Analysis
Data analysis: outliers, level of variation…
Model estimation and hypothesis testing
g) Writing a Report
Statement of the problem: describe the problem you have studied,
the questi ...
Detection of Embryonic Research Topics by Analysing Semantic Topic Networks Angelo Salatino
Being aware of new research topics is an important asset for anybody involved in the research environment, including researchers, academic publishers and institutional funding bodies. In recent years, the amount of scholarly data available on the web has increased steadily, allowing the development of several approaches for detecting emerging research topics and assessing their trends. However, current methods focus on the detection of topics which are already associated with a label or a substantial number of documents. In this paper, we address instead the issue of detecting embryonic topics, which do not yet possess these characteristics. We suggest that it is possible to forecast the emergence of novel research topics even at such an early stage and demonstrate that the emergence of a new topic can be anticipated by analysing the dynamics of pre-existing topics. We present an approach to evaluate such dynamics and an experiment on a sample of 3 million research papers, which confirms our hypothesis. In particular, we found that the pace of collaboration in sub-graphs of topics that will give rise to novel topics is significantly higher than in the control group.
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly... Angelo Salatino
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this paper, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles, yielding a significant improvement over alternative methods.
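As a rough illustration of the kind of syntactic matching such a classifier can start from (this is not the actual CSO Classifier, which also exploits word embeddings and the ontology's hierarchical structure; the topic list below is a small invented sample, not CSO itself):

```python
import re

# Toy ontology-driven topic detection: match n-grams from a paper's
# metadata against a flat list of research-area labels. The labels
# below are an invented sample for illustration.
topics = {"machine learning", "semantic web", "neural networks", "ontology"}

def classify(title, abstract, max_n=3):
    """Return the topic labels whose n-grams appear in title + abstract."""
    words = re.findall(r"[a-z]+", (title + " " + abstract).lower())
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            if gram in topics:
                found.add(gram)
    return sorted(found)

print(classify("A Semantic Web approach",
               "We apply machine learning and an ontology of topics."))
# → ['machine learning', 'ontology', 'semantic web']
```

A real system would additionally normalise variants of the same label and use the ontology's relationships to infer broader areas from the matched concepts.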
Using Text Comprehension Model for Learning Concepts, Context, and Topic of... Kent State University
Concepts in web ontologies help machines to understand data through the meanings they hold. Furthermore, learning the contexts and topics of web documents has also helped in better semantic-oriented structuring and retrieval of data on the web. In this short paper we present a novel approach for domain-independent open learning of the domain concepts, context, and topic of any given web document. Our approach is based on a computational version of the Construction-Integration (CI) model of text comprehension. Our proposed system mimics the way humans learn the meanings of textual units and identifies domain concepts, contexts, and topics in the form of semantic networks. We apply our system to a number of web documents with a range of topics and domains. The resulting semantic networks provide quantitative and qualitative insights into the nature of the given web documents.
A Network-Aware Approach for Searching As-You-Type in Social Media INRIA-OAK
We present a novel approach for as-you-type top-k keyword search over social media. We adopt a natural "network-aware" interpretation of information relevance, by which information produced by users who are closer to the seeker is considered more relevant. In practice, this query model poses new challenges for effectiveness and efficiency in online search, even when a complete query is given as input in one keystroke. This is mainly because it requires a joint exploration of the social space and classic IR indexes such as inverted lists. We describe a memory-efficient and incremental prefix-based retrieval algorithm, which also exhibits anytime behavior, allowing it to output the most likely answer within any chosen running-time limit. We evaluate it through extensive experiments for several applications and search scenarios, including searching for posts in micro-blogging (Twitter and Tumblr), as well as searching for businesses based on reviews in Yelp. The results show that our solution is effective in answering real-time as-you-type searches over social media.
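The network-aware ranking idea can be shown with a deliberately simplified sketch (it omits the paper's incremental, memory-efficient index structures; the posts and proximity scores below are invented): posts containing a word that matches the typed prefix are ranked by the social proximity of their author to the seeker.

```python
import heapq

# Invented corpus: (author, post text) pairs and each author's social
# proximity to the seeker (higher = closer in the network).
posts = [
    ("alice", "great coffee in paris"),
    ("bob",   "coffee machine review"),
    ("carol", "concert tonight"),
]
proximity = {"alice": 0.9, "bob": 0.4, "carol": 0.7}

def as_you_type(prefix, k=2):
    """Top-k posts whose text matches the typed prefix, closest authors first."""
    matches = [
        (proximity[user], user, text)
        for user, text in posts
        if any(w.startswith(prefix) for w in text.split())
    ]
    return [(u, t) for _, u, t in heapq.nlargest(k, matches)]

print(as_you_type("cof"))
# → [('alice', 'great coffee in paris'), ('bob', 'coffee machine review')]
```

The paper's actual algorithm interleaves this social-proximity ordering with prefix traversal of inverted lists so that answers can be refined incrementally after every keystroke, rather than recomputed from scratch as this sketch does.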
In the last decade, several Scientific Knowledge Graphs (SKGs) have been released, representing scientific knowledge in a structured, interlinked, and semantically rich manner. But what kind of information do they describe? How have they been built? What can we do with them? In this lecture, I will first provide an overview of well-known SKGs, like Microsoft Academic Graph, Dimensions, and others. Then, I will present the Academia/Industry DynAmics (AIDA) Knowledge Graph, which describes 21M publications and 8M patents according to i) the research topics drawn from the Computer Science Ontology, ii) the type of the authors' affiliations (e.g., academia, industry), and iii) 66 industrial sectors (e.g., automotive, financial, energy, electronics) from the Industrial Sectors Ontology (INDUSO). Finally, I will showcase a number of tools and approaches using such SKGs, supporting researchers, companies, and policymakers in making sense of research dynamics.
Applying machine learning techniques to big data in the scholarly domain Angelo Salatino
Slides of the Lecture at the 5th International School on Applied Probability Theory, Communications Technologies & Data Science (APTCT-2020)
12 Nov 2020
Similar to AUGUR: Forecasting the Emergence of New Research Topics
ResearchFlow: Understanding the Knowledge Flow between Academia and Industry Angelo Salatino
Understanding, monitoring, and predicting the flow of knowledge between academia and industry is of critical importance for a variety of stakeholders, including governments, funding bodies, researchers, investors, and companies. To this purpose, we introduce ResearchFlow, an approach that integrates semantic technologies and machine learning to quantify the diachronic behaviour of research topics across academia and industry. ResearchFlow exploits the novel Academia/Industry DynAmics (AIDA) Knowledge Graph in order to characterize each topic according to the frequency in time of the related i) publications from academia, ii) publications from industry, iii) patents from academia, and iv) patents from industry. This representation is then used to produce several analytics regarding the academia/industry knowledge flow and to forecast the impact of research topics on industry. We applied ResearchFlow to a dataset of 3.5M papers and 2M patents in Computer Science and highlighted several interesting patterns. We found that 89.8% of the topics first emerge in academic publications, which typically precede industrial publications by about 5.6 years and industrial patents by about 6.6 years. However, this does not mean that academia always dictates the research agenda. In fact, our analysis also shows that industrial trends tend to influence academia more than academic trends affect industry. We evaluated ResearchFlow on the task of forecasting the impact of research topics on the industrial sector and found that its granular characterization of topics significantly improves performance with respect to alternative solutions.
Early Detection of Research Trends [thesis defence] - Angelo Salatino
Being able to rapidly recognise new research trends is strategic for many stakeholders, including universities, institutional funding bodies, academic publishers and companies. The literature presents several approaches to identifying the emergence of new research topics, which rely on the assumption that the topic is already exhibiting a certain degree of popularity and consistently referred to by a community of researchers. However, detecting the emergence of a new research area at an embryonic stage, i.e., before the topic has been consistently labelled by a community of researchers and associated with a number of publications, is still an open challenge. In this dissertation, we begin to address this challenge by performing a study of the dynamics preceding the creation of new topics. This study indicates that the emergence of a new topic is anticipated by a significant increase in the pace of collaboration between relevant research areas, which can be seen as the ‘ancestors’ of the new topic. Based on this understanding, we developed Augur, a novel approach to effectively detecting the emergence of new research topics. Augur analyses the diachronic relationships between research areas and is able to detect clusters of topics that exhibit dynamics correlated with the emergence of new research topics. Here we also present the Advanced Clique Percolation Method (ACPM), a new community detection algorithm developed specifically for supporting this task. Augur was evaluated on a gold standard of 1,408 debutant topics in the 2000-2011 timeframe and outperformed four alternative approaches in terms of both precision and recall.
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas - Angelo Salatino
Ontologies of research areas are important tools for characterising, exploring, and analysing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 15K topics and 70K semantic relationships. It was created by applying the Klink-2 algorithm on a very large dataset of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO we have developed the CSO Portal, a web application that enables users to download, explore, and provide granular feedback on CSO at different levels. Users can use the portal to rate topics and relationships, suggest missing relationships, and visualise sections of the ontology. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various communities engaged with scholarly data.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices, i.e. vertices with the same in-links, avoids duplicate computations and can likewise reduce iteration time. Road networks often contain chains which can be short-circuited before the PageRank computation, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
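The first of these optimisations, skipping computation on converged vertices, can be sketched as follows. This is a heuristic illustration, not the STICD implementation: frozen vertices still contribute rank to their neighbours, but are no longer re-ranked themselves, which trades a little accuracy for less work per iteration.

```python
def pagerank_skip_converged(adj, d=0.85, eps=1e-10, iters=100):
    """Power-iteration PageRank that stops updating vertices whose rank has
    stabilised. adj maps each vertex to its list of out-neighbours; the
    sketch assumes no dangling nodes (every vertex has out-links)."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    active = set(adj)
    for _ in range(iters):
        if not active:
            break
        # Contributions come from every vertex, frozen or not.
        contrib = {v: 0.0 for v in adj}
        for u, outs in adj.items():
            share = rank[u] / len(outs)
            for v in outs:
                contrib[v] += share
        # Only active vertices are re-ranked; stable ones are skipped.
        still_active = set()
        for v in active:
            new_rank = (1 - d) / n + d * contrib[v]
            if abs(new_rank - rank[v]) > eps:
                still_active.add(v)
            rank[v] = new_rank
        active = still_active
    return rank

g = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(pagerank_skip_converged(g))  # each rank is 1/3 on a symmetric 3-cycle
```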
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
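The automated validation described above can be as simple as a rule table applied to rows as they arrive, rejecting bad records at the source. A minimal sketch; the rules and column names here are illustrative, not any specific product's API:

```python
# Rule-based validation at the source: each rule is (column, predicate, message).
def validate(rows, rules):
    """Return (clean_rows, errors); errors carry the row index and messages."""
    clean, errors = [], []
    for i, row in enumerate(rows):
        failed = [msg for col, pred, msg in rules
                  if col in row and not pred(row[col])]
        if failed:
            errors.append((i, row, failed))
        else:
            clean.append(row)
    return clean, errors

rules = [
    ("age", lambda v: isinstance(v, int) and 0 <= v <= 120, "age out of range"),
    ("email", lambda v: isinstance(v, str) and "@" in v, "malformed email"),
]
rows = [{"age": 34, "email": "a@b.com"}, {"age": -1, "email": "oops"}]
clean, errors = validate(rows, rules)
print(len(clean), len(errors))  # 1 1
```

The captured error messages are exactly the raw material that lineage tracking and root-cause analysis need downstream.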
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
AUGUR: Forecasting the Emergence of New Research Topics
1. AUGUR: Forecasting the Emergence of New Research Topics
Angelo A. Salatino, Francesco Osborne, Enrico Motta
@angelosalatino
2. Background
Anatomy of a research topic
• Early stage: researchers build their conceptual framework and establish their community
• Recognised: many researchers work on the topic and start to produce and disseminate their results
3. Background
«[…] successive transition from one paradigm to
another via revolution is the usual developmental
pattern of mature science.»
Thomas Kuhn - The Structure of Scientific Revolutions
4. How are research topics born?
• The fundamental assumption of this research is that it
should be possible to detect the emergence of new
research topics even before they are consistently
labelled by the community
– The approach focuses on uncovering the relevant patterns in the
dynamics of existing topics
Salatino, Angelo A., Francesco Osborne, and Enrico Motta. "How are topics born?
Understanding the research dynamics preceding the emergence of new areas." PeerJ
Computer Science 3 (2017): e119. https://peerj.com/articles/cs-119/
5. How are research topics born?
The creation of novel topics is anticipated by a significant
increase in the pace of collaboration and density of the
portions of the network in which they will appear.
6. Forecasting the Emergence of New Research Topics
[Diagram: pipeline from the selection of related topics, through data analysis of their dynamics, to the emerging topics]
8. Background data
Scholarly Data
• Dump of Scopus until 2014
• Co-occurrence network
– Nodes: keywords in papers
– Links: number of times two keywords co-occur
Computer Science Ontology*
• Large-scale ontology of research areas automatically generated using the Klink-2 algorithm**
• Defines when a topic is broader than another topic
• Defines when two topics express the same subject of study
* Computer Science Ontology Portal: https://w3id.org/cso
** Osborne, F. and Motta, E.: Klink-2: integrating multiple web sources to generate semantic topic networks. In ISWC 2015
9. Evolutionary Network
Snapshot of the collaborations of topics in a period of five years

Edge weights:

\hat{w}^{year}_{u,v} = \mathrm{HarmonicMean}\left( \frac{w^{year}_{u,v}}{p^{year}_{u}}, \frac{w^{year}_{u,v}}{p^{year}_{v}} \right)

w^{evol}_{u,v} = \frac{\sum_{i=0}^{4} (year_{t-i} - \overline{year}) \, (\hat{w}^{year_{t-i}}_{u,v} - \overline{\hat{w}}_{u,v})}{\sum_{i=0}^{4} (year_{t-i} - \overline{year})^{2}}

Node weights:

p^{evol}_{k} = \frac{\sum_{i=0}^{4} (year_{t-i} - \overline{year}) \, (p^{year_{t-i}}_{k} - \overline{p}_{k})}{\sum_{i=0}^{4} (year_{t-i} - \overline{year})^{2}}

where w^{year}_{u,v} is the number of co-occurrences of topics u and v in a year, p^{year}_{u} is the frequency of topic u in that year, and the evolutionary weights are the least-squares slopes of these quantities over the five-year window.
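Under the slide's definitions, the yearly edge weight combines co-occurrence counts with topic frequencies via a harmonic mean, and the evolutionary weight is the slope of a least-squares fit over the five-year window. A minimal numeric sketch; the counts are invented and the normalisation details are assumed:

```python
import numpy as np

def harmonic_mean(a, b):
    return 2 * a * b / (a + b) if a + b else 0.0

def yearly_edge_weight(w_uv, p_u, p_v):
    """Co-occurrences w_uv normalised by each topic's frequency p_u, p_v."""
    return harmonic_mean(w_uv / p_u, w_uv / p_v)

def evolutionary_weight(years, values):
    """Least-squares slope of the series over the window (five years here)."""
    slope, _ = np.polyfit(years, values, 1)
    return slope

years = [2008, 2009, 2010, 2011, 2012]
w_hat = [yearly_edge_weight(w, pu, pv)
         for w, pu, pv in [(2, 50, 40), (5, 60, 45), (9, 70, 50),
                           (14, 80, 55), (22, 90, 60)]]
print(round(evolutionary_weight(years, w_hat), 4))
```

A positive slope indicates an increasing pace of collaboration between the two topics, which is the signal AUGUR looks for.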
10. Advanced Clique Percolation Method
Standard CPM (Palla et al., 2005): topic network → clique detection → clique-graph construction → communities
Advanced: topic network → measure & filtering → clique detection → find local maxima and select neighbours → clique-graph construction → communities
An example: 1 connected component with 4 local maxima
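The standard CPM that ACPM extends is available off the shelf; a minimal sketch using networkx (assumed installed) with k = 3, on an invented toy graph:

```python
# Standard Clique Percolation Method (Palla et al., 2005): communities are
# unions of k-cliques that share k-1 nodes.
import networkx as nx
from networkx.algorithms.community import k_clique_communities

G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"),   # triangle 1
                  ("b", "d"), ("c", "d"),               # triangle 2, shares edge b-c
                  ("e", "f")])                          # no triangle: excluded
communities = [set(c) for c in k_clique_communities(G, 3)]
print(sorted(communities[0]))  # ['a', 'b', 'c', 'd']
```

The two triangles share two nodes (k - 1 = 2), so they percolate into a single community; the e-f edge belongs to no 3-clique and is left out.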
12. Post-processing & Sense Making
• Some clusters are filtered and others merged
• Extraction of influential papers
• Extraction of influential authors
Example: clusters A and B are merged into A ∪ B
13. Final Outcome
Influential Authors
W. Bruce Croft,
Dieter Fensel,
Dan Suciu,
William W. Cohen,
Berthier Ribeiro-Neto,
Clement T. Yu,
James Allan,
Justin Zobel,
Dragomir R. Radev,
Victor Vianu
Influential Papers
- A Sheth et al. "Managing semantic content for the Web" (2002)
- RWP Luk et al. "A survey in indexing and searching XML documents" (2002)
- J Kahan et al. "Annotea: An open RDF infrastructure for shared Web annotations"
(2002)
- R Manmatha et al. "Modeling score distributions for combining the outputs of
search engines" (2001)
- S Dagtas et al. "Models for motion-based video indexing and retrieval" (2000)
Portion of the evolutionary network in 2002, reflecting the emergence
of Semantic Search in 2003
14. Evaluating
We evaluated AUGUR and ACPM against a gold standard of emerging topics and compared them with four state-of-the-art algorithms:
• Fast Greedy (FG)
• Leading Eigenvector (LE)
• Fuzzy C-Means (FCM)
• Clique Percolation Method (CPM)
15. Evaluating
• Gold standard of 1408 emerging topics in 2000-11
• For each emerging topic we extracted 25 ancestors
– Topics that mostly collaborated with the debutant topic during its
first five years of activity
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
#topics 149 194 221 216 137 241 134 60 27 12 12 5
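The ancestor extraction described above can be sketched as follows; the data structures and topic names are illustrative assumptions, with the window of five years and the cut-off of 25 taken from the slide.

```python
# For a debutant topic, rank the topics that co-occurred with it during its
# first five years of activity and keep the top n collaborators as ancestors.
from collections import Counter

def ancestors(cooccurrences_by_year, debut_year, n=25):
    """cooccurrences_by_year: year -> Counter mapping topic ->
    co-occurrence count with the debutant topic."""
    total = Counter()
    for year in range(debut_year, debut_year + 5):
        total.update(cooccurrences_by_year.get(year, Counter()))
    return [t for t, _ in total.most_common(n)]

cooc = {2003: Counter({"information retrieval": 4, "ontology": 2}),
        2004: Counter({"ontology": 7, "xml": 1})}
print(ancestors(cooc, 2003, n=2))  # ['ontology', 'information retrieval']
```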
17. Evaluating
Metrics:
• Precision, number of matching clusters divided by the total
number of clusters
• Recall, number of matching debutant topics divided by the
total number of debutant topics
18. Evaluating
Matching:
• Jaccard Similarity between Cluster (Ci) with ancestors
of debutant topics (Dk)
• We added the same-as (SAi) of the cluster (Ci)
J(C_i, D_k, SA_i) = |(C_i ∪ SA_i) ∩ D_k| / |(C_i ∪ SA_i) ∪ D_k|
0 ≤ J(C_i, D_k, SA_i) ≤ 1
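A minimal sketch of this matching score, assuming sets of topic labels for a cluster C_i, its same-as topics SA_i, and the ancestor set D_k of a debutant topic (the example topics are invented):

```python
# Jaccard similarity between a cluster, extended with its same-as topics,
# and the ancestors of a debutant topic.
def match_score(C_i, SA_i, D_k):
    extended = set(C_i) | set(SA_i)
    inter = extended & set(D_k)
    union = extended | set(D_k)
    return len(inter) / len(union) if union else 0.0

C = {"semantic web", "ontology"}
SA = {"ontologies"}
D = {"ontology", "ontologies", "information retrieval"}
print(match_score(C, SA, D))  # 0.5
```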
19. Evaluating
Four strategies:
Strategy (1): C_i vs. D_k (shown previously)
Strategy (2): (C_i ∪ EC_i) vs. D_k
Strategy (3): C_i vs. (D_k ∪ ED_k)
Strategy (4): (C_i ∪ EC_i) vs. (D_k ∪ ED_k)

J(C_i, D_k, SA_i, EC_i, ED_k) = |(C_i ∪ EC_i ∪ SA_i) ∩ (D_k ∪ ED_k)| / |(C_i ∪ EC_i ∪ SA_i) ∪ (D_k ∪ ED_k)|
0 ≤ J(C_i, D_k, SA_i, EC_i, ED_k) ≤ 1
24. Conclusion
• We evaluated Augur and ACPM versus four alternative
approaches on a gold standard of 1,408 debutant topics
in the 2000-2011 timeframe.
• The results show that our approach outperforms state-of-the-art solutions and successfully identifies clusters that will produce new topics in the following two years.
25. Future work
• Gold Standard
• Scope
• Further dynamics
• Analysis on more recent data
26. Thank you
SKM3 Scholarly Knowledge: Modelling, Mining and Sense Making
http://skm.kmi.open.ac.uk/
Angelo A. Salatino
email: angelo.salatino@open.ac.uk
twitter: @angelosalatino
Web: salatino.org
Francesco Osborne, Enrico Motta, Angelo Salatino