SlideShare a Scribd company logo
AUGUR: Forecasting the Emergence of
New Research Topics
Angelo A. Salatino, Francesco Osborne, Enrico Motta
@angelosalatino
Background
Anatomy of a research topic
• Early stage: researchers build their conceptual framework,
establish their community
• Recognised: many researchers work in this topic, start to
produce and disseminate their results
Background
«[…] successive transition from one paradigm to
another via revolution is the usual developmental
pattern of mature science.»
Thomas Kuhn - The Structure of Scientific Revolutions
How research topics are born?
• The fundamental assumption of this research is that it
should be possible to detect the emergence of new
research topics even before they are consistently
labelled by the community
– The approach focuses on uncovering the relevant patterns in the
dynamics of existing topics
Salatino, Angelo A., Francesco Osborne, and Enrico Motta. "How are topics born?
Understanding the research dynamics preceding the emergence of new areas." PeerJ
Computer Science 3 (2017): e119. https://peerj.com/articles/cs-119/
How research topics are born?
The creation of novel topics is anticipated by a significant
increase in the pace of collaboration and density of the
portions of the network in which they will appear.
Forecasting the Emergence of New
Research Topics?
?
Salatino, Angelo A., Francesco Osborne, and Enrico Motta. "How are topics born?
Understanding the research dynamics preceding the emergence of new areas." PeerJ
Computer Science 3 (2017): e119. https://peerj.com/articles/cs-119/
Emerging
Topics
Related Topics
Selection
Data
Analysis
Dynamics
AUGUR
Background data
Scholarly Data
• Dump of Scopus until 2014
• Co-occurrence network
– Nodes: keywords in papers
– Links: number of times two
keywords co-occur together
Computer Science Ontology*
• Large-scale ontology of
research areas automatically
generated using the Klink-2
algorithm**
• Defines when a topic is
broader than another topic
• Defines when two topics
express the same subject of
study
* Computer Science Ontology Portal: https://w3id.org/cso
** Osborne, F. and Motta, E.: Klink-2: integrating multiple web sources to generate
semantic topic networks. In ISWC 2015
Evolutionary Network
Snapshot of the collaborations of topics in a period of five years
, ,
,
ˆ ( , )year year
year
year year
topic topic
u v u vtopic
u v topic topic
v u
w w
w HarmonicMean
p p
=
4
, ,
0
, 4
2
0
ˆ ˆ( )( )
( )
year i
year
topic topic
t i u v u v
evol i
u v
t i
i
year year w w
w
year year
--
=
-
=
- -
=
-
å
å
4
0
4
2
0
( )( )
( )
year i
year
topic topic
t i k k
evol i
k
t i
i
year year p p
p
year year
--
=
-
=
- -
=
-
å
å
!"#$%
= (("#$%
, *"#$%
, +"#$%
, ,"#$%
)
Edge weights
Node weights
Advanced Clique Percolation Method
Clique
detection
Clique-graph
construction
Topic
network
Communities
Clique
detection
Clique-graph
construction
Topic
network
Communities
Measure &
Filtering
Find local
maxima and
select
neighbors
Standard (Palla et al., 2005)
Advanced
An example:
1 connected components
with 4 local maxima
Evolutionary Network of year 2000
CPM ACPM
Post-processing & Sense Making
• Some clusters are filtered and others merged
• Extraction of influential papers
• Extraction of influential authors
A
B
A ∪ B
Final Outcome
Influential Authors
W. Bruce Croft,
Dieter Fensel,
Dan Suciu,
William W. Cohen,
Berthier Ribeiro-Neto,
Clement T. Yu,
James Allan,
Justin Zobel,
Dragomir R. Radev,
Victor Vianu
Influential Papers
- A Sheth et al. "Managing semantic content for the Web" (2002)
- RWP Luk et al. "A survey in indexing and searching XML documents" (2002)
- J Kahan et al. "Annotea: An open RDF infrastructure for shared Web annotations"
(2002)
- R Manmatha et al. "Modeling score distributions for combining the outputs of
search engines" (2001)
- S Dagtas et al. "Models for motion-based video indexing and retrieval" (2000)
Portion of the evolutionary network in 2002, reflecting the emergence
of Semantic Search in 2003
Evaluating
We evaluated AUGUR and the ACPM against a gold
standard of emerging topics and we compared it against
four state-of-the-art algorithms:
• Fast Greedy (FG)
• Leading Eigenvector (LE)
• Fuzzy C-Means (FCM)
• Clique Percolation Method (CPM)
Evaluating
• Gold standard of 1408 emerging topics in 2000-11
• For each emerging topic we extracted 25 ancestors
– Topics that mostly collaborated with the debutant topic during its
first five years of activity
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
#topics 149 194 221 216 137 241 134 60 27 12 12 5
Evaluating
clusters(yt) vs. ancestorsOfDebutantTopics(yt+1, yt+2)
Evaluating
Metrics:
• Precision, number of matching clusters divided by the total
number of clusters
• Recall, number of matching debutant topics divided by the
total number of debutant topics
Evaluating
Matching:
• Jaccard Similarity between Cluster (Ci) with ancestors
of debutant topics (Dk)
• We added the same-as (SAi) of the cluster (Ci)
fuzzy c means ∩ fuzzy c-means (fcm)
(fuzzy c means ∪ fuzzy c-means (fcm) ∪ fuzzy c-means ) ∩ fuzzy c-means (fcm)
0 ≤ )#(%&, (), *+& ≤ 1
Evaluating
Four strategies:
Strategy	(1) ,- ./. 12 (shown	previously)
Strategy	(2) (,- ∪ ?,-) ./. 12
Strategy	(3) ,- ./. (12 ∪ ?12)
Strategy	(4) (,- ∪ ?,-) ./. (12 ∪ ?12)
( )
( ) (
, ,
)
, , i i i k k
i k i i k
i i k k
C EC SA D ED
J C D SA EC ED
C EC D ED
È È Ç È
=
È È È
0 ≤ )B(,C, 1E, FGC, ?,C, ?1E ≤ 1
Evaluating
Degrees of freedom
• Year, [1999-2010]
• Similarity threshold, [0-1]
• Strategy, {Strategy 1, 2, 3, 4}
• Algorithm, {FG, LE, FCM, CPM, ACPM}
precision = f(year, similarity, strategy, algorithm)
recall = f(year, similarity, strategy, algorithm)
Results
Which strategy?
Fixed variables: algorithm(ACPM), year(1999)
(1) $% &'. )*
(2) ($% ∪ -$%) &'. )*
(3) $% &'. ()* ∪ -)*)
(4) ($% ∪ -$%) &'. ()* ∪ -)*)
Results
Which algorithm?
FG LE FCM CPM ACPM
Years Pr Re Pr Re Pr Re Pr Re Pr Re
1999 .27 .11 .00 .00 .00 .00 .06 .01 .86 .76
2000 .21 .07 .14 .02 .96 .01 .05 .00 .78 .70
2001 .13 .04 .11 .01 .00 .00 .17 .00 .77 .72
2002 .14 .04 .11 .01 .00 .00 .29 .01 .82 .80
2003 .09 .02 .20 .02 .00 .00 .08 .02 .83 .79
2004 .11 .05 .06 .00 .00 .00 .00 .00 .84 .68
2005 .07 .11 .06 .01 .00 .00 .00 .00 .71 .66
2006 .01 .01 .07 .01 .00 .00 .00 .00 .43 .51
2007 .01 .08 .00 .00 .00 .00 .00 .00 .28 .44
2008 .01 .04 .00 .00 .00 .00 .00 .00 .15 .33
2009 .00 .00 .00 .00 .00 .00 .00 .00 .09 .76
Fixed variables: similarity(0.1), strategy(4)
Results
Fixed variables: algorithm(ACPM), strategy(4)
Final results
Conclusion
• We evaluated Augur and ACPM versus four alternative
approaches on a gold standard of 1,408 debutant topics
in the 2000-2011 timeframe.
• The results show that our approach outperforms state of
the art solutions and is able to successfully identify
clusters that will produce new topics in the two following
years.
Future work
• Gold Standard
• Scope
• Further dynamics
• Analysis on more recent data
Thank you
SKM3 Scholarly Knowledge: Modelling, Mining and Sense Making
http://skm.kmi.open.ac.uk/
Angelo A. Salatino
email: angelo.salatino@open.ac.uk
twitter: @angelosalatino
Web: salatino.org
Francesco
Osborne
Enrico
Motta
Angelo
Salatino

More Related Content

Similar to AUGUR: Forecasting the Emergence of New Research Topics

Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics
Angelo Salatino
 
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic MinerAutomatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
Francesco Osborne
 
Information theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloudInformation theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloud
MOVING Project
 
EMPIRICAL PROJECTObjective to help students put in practice w.docx
EMPIRICAL PROJECTObjective  to help students put in practice w.docxEMPIRICAL PROJECTObjective  to help students put in practice w.docx
EMPIRICAL PROJECTObjective to help students put in practice w.docx
SALU18
 
Asymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain AdaptationAsymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain Adaptation
Yoshitaka Ushiku
 
Visualizing UML’s Sequence and Class Diagrams Using Graph-Based Clusters
Visualizing UML’s Sequence and   Class Diagrams Using Graph-Based Clusters  Visualizing UML’s Sequence and   Class Diagrams Using Graph-Based Clusters
Visualizing UML’s Sequence and Class Diagrams Using Graph-Based Clusters
Nakul Sharma
 
Detection of Embryonic Research Topics by Analysing Semantic Topic Networks
Detection of Embryonic Research Topics by Analysing Semantic Topic NetworksDetection of Embryonic Research Topics by Analysing Semantic Topic Networks
Detection of Embryonic Research Topics by Analysing Semantic Topic Networks
Angelo Salatino
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
Angelo Salatino
 
Using Text Comprehension Model for Learning Concepts, Context, and Topic of...
Using Text Comprehension Model for  Learning Concepts, Context, and Topic  of...Using Text Comprehension Model for  Learning Concepts, Context, and Topic  of...
Using Text Comprehension Model for Learning Concepts, Context, and Topic of...
Kent State University
 
Graph-based multimodal clustering for social event detection in large collect...
Graph-based multimodal clustering for social event detection in large collect...Graph-based multimodal clustering for social event detection in large collect...
Graph-based multimodal clustering for social event detection in large collect...
Symeon Papadopoulos
 
rsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningJeff Heaton
 
Lecture_2_Stats.pdf
Lecture_2_Stats.pdfLecture_2_Stats.pdf
Lecture_2_Stats.pdf
paijitk
 
Assessing the Use of Eclipse MDE Technologies in Open-Source Software Projects
Assessing the Use of Eclipse MDE Technologies in Open-Source Software ProjectsAssessing the Use of Eclipse MDE Technologies in Open-Source Software Projects
Assessing the Use of Eclipse MDE Technologies in Open-Source Software Projects
Dimitris Kolovos
 
Formal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updatesFormal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updates
openCypher
 
Predicting Stock Market Price Using Support Vector Regression
Predicting Stock Market Price Using Support Vector RegressionPredicting Stock Market Price Using Support Vector Regression
Predicting Stock Market Price Using Support Vector Regression
Chittagong Independent University
 
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
multimediaeval
 
14/01/20 "Engineering Optimization in Aircraft Design" Aerodynamic Design at ...
14/01/20 "Engineering Optimization in Aircraft Design" Aerodynamic Design at ...14/01/20 "Engineering Optimization in Aircraft Design" Aerodynamic Design at ...
14/01/20 "Engineering Optimization in Aircraft Design" Aerodynamic Design at ...
Masahiro Kanazaki
 
Neural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionNeural Architectures for Named Entity Recognition
Neural Architectures for Named Entity Recognition
Rrubaa Panchendrarajan
 
Crops In Silico Workshop, Oxford June 2017
Crops In Silico Workshop, Oxford June 2017Crops In Silico Workshop, Oxford June 2017
Crops In Silico Workshop, Oxford June 2017
University of Illinois at Urbana-Champaign
 
A Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social MediaA Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social Media
INRIA-OAK
 

Similar to AUGUR: Forecasting the Emergence of New Research Topics (20)

Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics
 
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic MinerAutomatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
 
Information theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloudInformation theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloud
 
EMPIRICAL PROJECTObjective to help students put in practice w.docx
EMPIRICAL PROJECTObjective  to help students put in practice w.docxEMPIRICAL PROJECTObjective  to help students put in practice w.docx
EMPIRICAL PROJECTObjective to help students put in practice w.docx
 
Asymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain AdaptationAsymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain Adaptation
 
Visualizing UML’s Sequence and Class Diagrams Using Graph-Based Clusters
Visualizing UML’s Sequence and   Class Diagrams Using Graph-Based Clusters  Visualizing UML’s Sequence and   Class Diagrams Using Graph-Based Clusters
Visualizing UML’s Sequence and Class Diagrams Using Graph-Based Clusters
 
Detection of Embryonic Research Topics by Analysing Semantic Topic Networks
Detection of Embryonic Research Topics by Analysing Semantic Topic NetworksDetection of Embryonic Research Topics by Analysing Semantic Topic Networks
Detection of Embryonic Research Topics by Analysing Semantic Topic Networks
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
 
Using Text Comprehension Model for Learning Concepts, Context, and Topic of...
Using Text Comprehension Model for  Learning Concepts, Context, and Topic  of...Using Text Comprehension Model for  Learning Concepts, Context, and Topic  of...
Using Text Comprehension Model for Learning Concepts, Context, and Topic of...
 
Graph-based multimodal clustering for social event detection in large collect...
Graph-based multimodal clustering for social event detection in large collect...Graph-based multimodal clustering for social event detection in large collect...
Graph-based multimodal clustering for social event detection in large collect...
 
rsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morning
 
Lecture_2_Stats.pdf
Lecture_2_Stats.pdfLecture_2_Stats.pdf
Lecture_2_Stats.pdf
 
Assessing the Use of Eclipse MDE Technologies in Open-Source Software Projects
Assessing the Use of Eclipse MDE Technologies in Open-Source Software ProjectsAssessing the Use of Eclipse MDE Technologies in Open-Source Software Projects
Assessing the Use of Eclipse MDE Technologies in Open-Source Software Projects
 
Formal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updatesFormal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updates
 
Predicting Stock Market Price Using Support Vector Regression
Predicting Stock Market Price Using Support Vector RegressionPredicting Stock Market Price Using Support Vector Regression
Predicting Stock Market Price Using Support Vector Regression
 
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Mus...
 
14/01/20 "Engineering Optimization in Aircraft Design" Aerodynamic Design at ...
14/01/20 "Engineering Optimization in Aircraft Design" Aerodynamic Design at ...14/01/20 "Engineering Optimization in Aircraft Design" Aerodynamic Design at ...
14/01/20 "Engineering Optimization in Aircraft Design" Aerodynamic Design at ...
 
Neural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionNeural Architectures for Named Entity Recognition
Neural Architectures for Named Entity Recognition
 
Crops In Silico Workshop, Oxford June 2017
Crops In Silico Workshop, Oxford June 2017Crops In Silico Workshop, Oxford June 2017
Crops In Silico Workshop, Oxford June 2017
 
A Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social MediaA Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social Media
 

More from Angelo Salatino

Scientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewScientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an Overview
Angelo Salatino
 
Applying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainApplying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domain
Angelo Salatino
 
ResearchFlow: Understanding the Knowledge Flow between Academia and Industry
ResearchFlow: Understanding the Knowledge Flow between Academia and IndustryResearchFlow: Understanding the Knowledge Flow between Academia and Industry
ResearchFlow: Understanding the Knowledge Flow between Academia and Industry
Angelo Salatino
 
Early Detection of Research Trends [thesis defence]
Early Detection of Research Trends [thesis defence]Early Detection of Research Trends [thesis defence]
Early Detection of Research Trends [thesis defence]
Angelo Salatino
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
Angelo Salatino
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
Angelo Salatino
 
Early Detection and Forecasting of Research Trends
Early Detection and Forecasting of Research TrendsEarly Detection and Forecasting of Research Trends
Early Detection and Forecasting of Research Trends
Angelo Salatino
 
Tesi Triennale Slide
Tesi Triennale SlideTesi Triennale Slide
Tesi Triennale Slide
Angelo Salatino
 
Introductory Lecture to Audio Signal Processing
Introductory Lecture to Audio Signal ProcessingIntroductory Lecture to Audio Signal Processing
Introductory Lecture to Audio Signal Processing
Angelo Salatino
 

More from Angelo Salatino (9)

Scientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewScientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an Overview
 
Applying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainApplying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domain
 
ResearchFlow: Understanding the Knowledge Flow between Academia and Industry
ResearchFlow: Understanding the Knowledge Flow between Academia and IndustryResearchFlow: Understanding the Knowledge Flow between Academia and Industry
ResearchFlow: Understanding the Knowledge Flow between Academia and Industry
 
Early Detection of Research Trends [thesis defence]
Early Detection of Research Trends [thesis defence]Early Detection of Research Trends [thesis defence]
Early Detection of Research Trends [thesis defence]
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Early Detection and Forecasting of Research Trends
Early Detection and Forecasting of Research TrendsEarly Detection and Forecasting of Research Trends
Early Detection and Forecasting of Research Trends
 
Tesi Triennale Slide
Tesi Triennale SlideTesi Triennale Slide
Tesi Triennale Slide
 
Introductory Lecture to Audio Signal Processing
Introductory Lecture to Audio Signal ProcessingIntroductory Lecture to Audio Signal Processing
Introductory Lecture to Audio Signal Processing
 

Recently uploaded

一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

AUGUR: Forecasting the Emergence of New Research Topics

  • 1. AUGUR: Forecasting the Emergence of New Research Topics Angelo A. Salatino, Francesco Osborne, Enrico Motta @angelosalatino
  • 2. Background Anatomy of a research topic • Early stage: researchers build their conceptual framework, establish their community • Recognised: many researchers work in this topic, start to produce and disseminate their results
  • 3. Background «[…] successive transition from one paradigm to another via revolution is the usual developmental pattern of mature science.» Thomas Kuhn - The Structure of Scientific Revolutions
  • 4. How research topics are born? • The fundamental assumption of this research is that it should be possible to detect the emergence of new research topics even before they are consistently labelled by the community – The approach focuses on uncovering the relevant patterns in the dynamics of existing topics Salatino, Angelo A., Francesco Osborne, and Enrico Motta. "How are topics born? Understanding the research dynamics preceding the emergence of new areas." PeerJ Computer Science 3 (2017): e119. https://peerj.com/articles/cs-119/
  • 5. How research topics are born? The creation of novel topics is anticipated by a significant increase in the pace of collaboration and density of the portions of the network in which they will appear.
  • 6. Forecasting the Emergence of New Research Topics? ? Salatino, Angelo A., Francesco Osborne, and Enrico Motta. "How are topics born? Understanding the research dynamics preceding the emergence of new areas." PeerJ Computer Science 3 (2017): e119. https://peerj.com/articles/cs-119/ Emerging Topics Related Topics Selection Data Analysis Dynamics
  • 8. Background data Scholarly Data • Dump of Scopus until 2014 • Co-occurrence network – Nodes: keywords in papers – Links: number of times two keywords co-occur together Computer Science Ontology* • Large-scale ontology of research areas automatically generated using the Klink-2 algorithm** • Defines when a topic is broader than another topic • Defines when two topics express the same subject of study * Computer Science Ontology Portal: https://w3id.org/cso ** Osborne, F. and Motta, E.: Klink-2: integrating multiple web sources to generate semantic topic networks. In ISWC 2015
  • 9. Evolutionary Network Snapshot of the collaborations of topics in a period of five years , , , ˆ ( , )year year year year year topic topic u v u vtopic u v topic topic v u w w w HarmonicMean p p = 4 , , 0 , 4 2 0 ˆ ˆ( )( ) ( ) year i year topic topic t i u v u v evol i u v t i i year year w w w year year -- = - = - - = - å å 4 0 4 2 0 ( )( ) ( ) year i year topic topic t i k k evol i k t i i year year p p p year year -- = - = - - = - å å !"#$% = (("#$% , *"#$% , +"#$% , ,"#$% ) Edge weights Node weights
  • 10. Advanced Clique Percolation Method Clique detection Clique-graph construction Topic network Communities Clique detection Clique-graph construction Topic network Communities Measure & Filtering Find local maxima and select neighbors Standard (Palla et al., 2005) Advanced An example: 1 connected components with 4 local maxima
  • 11. Evolutionary Network of year 2000 CPM ACPM
  • 12. Post-processing & Sense Making • Some clusters are filtered and others merged • Extraction of influential papers • Extraction of influential authors A B A ∪ B
  • 13. Final Outcome Influential Authors W. Bruce Croft, Dieter Fensel, Dan Suciu, William W. Cohen, Berthier Ribeiro-Neto, Clement T. Yu, James Allan, Justin Zobel, Dragomir R. Radev, Victor Vianu Influential Papers - A Sheth et al. "Managing semantic content for the Web" (2002) - RWP Luk et al. "A survey in indexing and searching XML documents" (2002) - J Kahan et al. "Annotea: An open RDF infrastructure for shared Web annotations" (2002) - R Manmatha et al. "Modeling score distributions for combining the outputs of search engines" (2001) - S Dagtas et al. "Models for motion-based video indexing and retrieval" (2000) Portion of the evolutionary network in 2002, reflecting the emergence of Semantic Search in 2003
  • 14. Evaluating We evaluated AUGUR and the ACPM against a gold standard of emerging topics and we compared it against four state-of-the-art algorithms: • Fast Greedy (FG) • Leading Eigenvector (LE) • Fuzzy C-Means (FCM) • Clique Percolation Method (CPM)
  • 15. Evaluating • Gold standard of 1408 emerging topics in 2000-11 • For each emerging topic we extracted 25 ancestors – Topics that mostly collaborated with the debutant topic during its first five years of activity Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 #topics 149 194 221 216 137 241 134 60 27 12 12 5
  • 17. Evaluating Metrics: • Precision, number of matching clusters divided by the total number of clusters • Recall, number of matching debutant topics divided by the total number of debutant topics
  • 18. Evaluating Matching: • Jaccard Similarity between Cluster (Ci) with ancestors of debutant topics (Dk) • We added the same-as (SAi) of the cluster (Ci) fuzzy c means ∩ fuzzy c-means (fcm) (fuzzy c means ∪ fuzzy c-means (fcm) ∪ fuzzy c-means ) ∩ fuzzy c-means (fcm) 0 ≤ )#(%&, (), *+& ≤ 1
  • 19. Evaluating Four strategies: Strategy (1) ,- ./. 12 (shown previously) Strategy (2) (,- ∪ ?,-) ./. 12 Strategy (3) ,- ./. (12 ∪ ?12) Strategy (4) (,- ∪ ?,-) ./. (12 ∪ ?12) ( ) ( ) ( , , ) , , i i i k k i k i i k i i k k C EC SA D ED J C D SA EC ED C EC D ED È È Ç È = È È È 0 ≤ )B(,C, 1E, FGC, ?,C, ?1E ≤ 1
  • 20. Evaluating Degrees of freedom • Year, [1999-2010] • Similarity threshold, [0-1] • Strategy, {Strategy 1, 2, 3, 4} • Algorithm, {FG, LE, FCM, CPM, ACPM} precision = f(year, similarity, strategy, algorithm) recall = f(year, similarity, strategy, algorithm)
  • 21. Results Which strategy? Fixed variables: algorithm(ACPM), year(1999) (1) $% &'. )* (2) ($% ∪ -$%) &'. )* (3) $% &'. ()* ∪ -)*) (4) ($% ∪ -$%) &'. ()* ∪ -)*)
  • 22. Results Which algorithm? FG LE FCM CPM ACPM Years Pr Re Pr Re Pr Re Pr Re Pr Re 1999 .27 .11 .00 .00 .00 .00 .06 .01 .86 .76 2000 .21 .07 .14 .02 .96 .01 .05 .00 .78 .70 2001 .13 .04 .11 .01 .00 .00 .17 .00 .77 .72 2002 .14 .04 .11 .01 .00 .00 .29 .01 .82 .80 2003 .09 .02 .20 .02 .00 .00 .08 .02 .83 .79 2004 .11 .05 .06 .00 .00 .00 .00 .00 .84 .68 2005 .07 .11 .06 .01 .00 .00 .00 .00 .71 .66 2006 .01 .01 .07 .01 .00 .00 .00 .00 .43 .51 2007 .01 .08 .00 .00 .00 .00 .00 .00 .28 .44 2008 .01 .04 .00 .00 .00 .00 .00 .00 .15 .33 2009 .00 .00 .00 .00 .00 .00 .00 .00 .09 .76 Fixed variables: similarity(0.1), strategy(4)
  • 23. Results Fixed variables: algorithm(ACPM), strategy(4) Final results
  • 24. Conclusion • We evaluated Augur and ACPM versus four alternative approaches on a gold standard of 1,408 debutant topics in the 2000-2011 timeframe. • The results show that our approach outperforms state of the art solutions and is able to successfully identify clusters that will produce new topics in the two following years.
  • 25. Future work • Gold Standard • Scope • Further dynamics • Analysis on more recent data
  • 26. Thank you SKM3 Scholarly Knowledge: Modelling, Mining and Sense Making http://skm.kmi.open.ac.uk/ Angelo A. Salatino email: angelo.salatino@open.ac.uk twitter: @angelosalatino Web: salatino.org Francesco Osborne Enrico Motta Angelo Salatino