Meme Extraction from Corpora of Scientific Literature using Citation Networks
Tobias Kuhn
http://www.tkuhn.ch
@txkuhn
ETH Zurich
Colloquium
Institute of Computational Linguistics
University of Zurich
25 November 2014
3. Reference
Journal article on the content of this talk:
Tobias Kuhn, Matjaz Perc, and Dirk Helbing. Inheritance patterns in
citation networks reveal scientific memes. Physical Review X, 4,
041036, 21 November 2014.
https://journals.aps.org/prx/abstract/10.1103/PhysRevX.4.041036
Tobias Kuhn, ETH Zurich. Meme Extraction from Corpora of Scientific Literature
6. Meme Detection
I am presenting an approach to "meme detection", which is related
to a number of existing problems and approaches:
Named-entity extraction
Keyphrase extraction
Topic modeling
Terminology extraction
8. Context for NLP
Most NLP approaches focus on the analysis of the texts themselves:
Grammar
Morphology
Text Structure
Statistical Patterns
Some also take the contexts of the texts into account:
Comparison to properties of the entire corpus (e.g. tf-idf)
Training on particular corpus/domain/speaker
Citation graph of scientific publications
15. Scientific Publications
Nodes: publications
Edges: citations (in gray)
Legend:
Natural/Agricultural Sciences
(except Physical Sciences)
Physical Sciences
Engineering and Technology
Medical and Health Sciences
Social Sciences / Humanities
21. Scientific Publications
Entire giant component (33 million nodes) of the citation
graph of Thomson Reuters' Web of Science dataset.
32. Scientific Memes
The term "meme" was coined by Richard Dawkins:
Just as genes propagate themselves in the gene pool by leaping from body
to body via sperm or eggs, so memes propagate themselves in the meme pool
by leaping from brain to brain via a process which, in the broad sense, can
be called imitation. [Dawkins, The Selfish Gene]
Examples of memes:
Melodies
Recipes
Cultural habits
Words, grammar rules, text style
Scientific concepts
37. Definition of Gene:
I am using the word gene to mean a genetic unit that is small enough to last
for a number of generations and to be distributed around in many copies.
[Dawkins, The Selfish Gene]
A scientific meme is a short unit of text in a publication that is replicated in
citing publications and thereby distributed around in many copies.
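44. Propagation Score
The propagation score P_m quantifies the degree to which a meme's
occurrence aligns with the citation graph:
$$P_m = \frac{\text{sticking factor}}{\text{sparking factor}} = \frac{d_{m \to m}}{d_{\to m}} \bigg/ \frac{d_{m \to \not{m}}}{d_{\to \not{m}}}$$
To prevent infrequent phrases from getting a high propagation score
by chance, we can add a small amount of controlled noise $\delta$ (we use
$\delta = 3$):
$$P_m = \frac{d_{m \to m}}{d_{\to m} + \delta} \bigg/ \frac{d_{m \to \not{m}} + \delta}{d_{\to \not{m}} + \delta}$$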
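The propagation score can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the talk: the subscripts are read as "publications that cite at least one meme-carrying publication" (→m) versus "publications that cite none" (→not m), publications without any references are counted in the second group, and the names `propagation_score`, `cites`, and `carriers` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set


def propagation_score(
    cites: Dict[Hashable, Iterable[Hashable]],
    carriers: Set[Hashable],
    delta: float = 3.0,
) -> float:
    """Sticking factor divided by sparking factor for one meme.

    cites maps each publication to the publications it cites;
    carriers is the set of publications whose text contains the meme.
    """
    d_to_m = d_m_to_m = 0          # cite at least one carrier / ...and carry the meme
    d_to_not_m = d_m_to_not_m = 0  # cite no carrier / ...but carry the meme anyway
    for paper, refs in cites.items():
        has_meme = paper in carriers
        if any(ref in carriers for ref in refs):
            d_to_m += 1
            d_m_to_m += has_meme
        else:
            # Assumption: papers with an empty reference list land here too.
            d_to_not_m += 1
            d_m_to_not_m += has_meme
    # With delta = 0 these divisions can fail for rare phrases;
    # the controlled noise also avoids that.
    sticking = d_m_to_m / (d_to_m + delta)
    sparking = (d_m_to_not_m + delta) / (d_to_not_m + delta)
    return sticking / sparking


# Toy citation graph (made up for illustration): B and D cite A, C cites B.
cites = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"], "E": []}
carriers = {"A", "B", "C"}
print(propagation_score(cites, carriers))
```

On such a tiny graph the noise term dominates, which is the intended effect: a handful of co-occurrences is not enough evidence for a high score.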
48. Meme Score
The meme score M is the product of the relative frequency f and the
propagation score P:
$$M_m = f_m P_m$$
Top 20 Memes for APS (Physics):
 1. loop quantum cosmology +*     11. dark energy +*
 2. unparticle +*                 12. Rashba
 3. sonoluminescence +*           13. CuGeO3 +
 4. MgB2 +                        14. strange nonchaotic
 5. stochastic resonance +*       15. in NbSe3
 6. carbon nanotubes +*           16. spin Hall +
 7. NbSe3 +                       17. elliptic flow +*
 8. black hole +*                 18. quantum Hall +*
 9. nanotubes +                   19. CeCoIn5 +
10. lattice Boltzmann +*          20. inflation +
+ annotators agreed that this is an interesting and important physics concept
* also found on the list of terms extracted from Wikipedia
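To see why multiplying frequency by propagation favours concept-like phrases over common words, the following sketch computes f, P, and M from four citation counts. All counts are hypothetical and chosen only for illustration, the function name `meme_score` is an assumption, and the sketch assumes that every publication either cites at least one meme-carrying publication or cites none, so the four counts determine the relative frequency.

```python
def meme_score(d_m_to_m, d_to_m, d_m_to_not_m, d_to_not_m, delta=3.0):
    """Return (f, P, M) for one phrase from its four citation counts."""
    carriers = d_m_to_m + d_m_to_not_m  # publications containing the phrase
    total = d_to_m + d_to_not_m         # all publications in the corpus
    f = carriers / total                # relative frequency
    sticking = d_m_to_m / (d_to_m + delta)
    sparking = (d_m_to_not_m + delta) / (d_to_not_m + delta)
    p = sticking / sparking             # propagation score with noise delta
    return f, p, f * p


# Hypothetical counts for a corpus of 100,000 publications (not real data).
common_word = meme_score(d_m_to_m=85_500, d_to_m=95_000,
                         d_m_to_not_m=4_500, d_to_not_m=5_000)
rare_concept = meme_score(d_m_to_m=150, d_to_m=300,
                          d_m_to_not_m=50, d_to_not_m=99_700)

for name, (f, p, m) in [("common word", common_word),
                        ("rare concept", rare_concept)]:
    print(f"{name}: f={f:.4f}  P={p:.2f}  M={m:.3f}")
```

In this made-up setting the common word appears almost everywhere regardless of what is cited, so its propagation score stays close to 1, while the rare phrase appears mostly in publications that cite other carriers and ends up with the higher meme score despite its low frequency.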
50. Properties of the Meme Score
The meme score has several useful properties:
Can be calculated efficiently and exhaustively even on very large
datasets
No upper limit on the length of n-grams
No dependence on external linguistic or ontological knowledge
No stop-word lists or other kinds of arbitrary filters or thresholds
53. Manual Annotation
Two annotators (A1, A2): PhD students with physics degrees
Annotation with respect to (1) physics concept or not and (2)
linguistic category
Randomly extracted phrases for comparison
[Figure: annotation results of A1 and A2 for terms selected by meme score, at random, and by weighted random sampling. Categories: physics concept vs. not a physics concept; noun phrase, verb, adjective or adverb, other. Horizontal axis: number of terms (30 to 150).]
55. Comparison to Alternative Metrics
[Figure: percentage of Wikipedia terms among the top x terms by meme score, and area under curve A for the meme score compared with frequency, max. relative difference across journals, max. absolute difference across journals, max. relative change over time, and max. absolute change over time.]
40% of the top 50 terms are found on the Wikipedia list.
61. Conclusions
The citation graph is a very powerful resource for detecting memes.
Combined with other existing approaches, this seems to be a
promising tool for NLP on scientific publications.
Could be applied to other types of texts that have a certain kind of
citation structure (legal texts?).
Allows for studying memes in an exhaustive manner.
68. Meme Score Calculation
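1. Collect all phrases that stick at least once (not counting
free-riding on larger memes).
2. Calculate sticking and sparking factors for all collected phrases:
$$M_m = f_m P_m \quad\text{with}\quad P_m = \frac{\text{sticking factor}}{\text{sparking factor}} = \frac{d_{m \to m}}{d_{\to m} + \delta} \bigg/ \frac{d_{m \to \not{m}} + \delta}{d_{\to \not{m}} + \delta}$$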
Example
Citing title:
covariant effective action for loop quantum cosmology from order reduction
Cited titles:
- quantum nature of the big bang
- absence of a singularity in loop quantum cosmology
- large scale effective theory for cosmological bounces
Sticking phrases: loop quantum cosmology, quantum, effective, for
Sparking phrases: covariant, covariant effective action, order reduction, ...
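The example above can be reproduced with a small sketch. It treats every contiguous word sequence of the citing title as a candidate phrase, calls a phrase sticking if it occurs in at least one cited title, and sparking otherwise. The free-riding rule used here is an assumption made for illustration: a sticking phrase is dropped when each of its occurrences in the cited titles lies inside an occurrence of a longer sticking phrase. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Phrase = Tuple[str, ...]


def all_ngrams(tokens: List[str]) -> Set[Phrase]:
    """All contiguous word sequences, with no upper limit on length."""
    n = len(tokens)
    return {tuple(tokens[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


def occurrences(phrase: Phrase, tokens: List[str]) -> List[Tuple[int, int]]:
    """(start, end) token spans where phrase occurs in tokens."""
    k = len(phrase)
    return [(i, i + k) for i in range(len(tokens) - k + 1)
            if tuple(tokens[i:i + k]) == phrase]


def classify(citing: str, cited: List[str]) -> Tuple[Set[str], Set[str]]:
    """Split the phrases of a citing title into sticking and sparking ones."""
    cited_tokens = [title.split() for title in cited]
    phrases = all_ngrams(citing.split())
    sticking = {p for p in phrases
                if any(occurrences(p, t) for t in cited_tokens)}
    sparking = phrases - sticking

    def free_rides(p: Phrase) -> bool:
        # Assumed rule: p free-rides if every occurrence in the cited
        # titles is covered by an occurrence of a longer sticking phrase.
        for tokens in cited_tokens:
            longer_spans = [span for q in sticking if len(q) > len(p)
                            for span in occurrences(q, tokens)]
            for start, end in occurrences(p, tokens):
                if not any(s <= start and end <= e for s, e in longer_spans):
                    return False
        return True

    independent = {p for p in sticking if not free_rides(p)}
    return ({" ".join(p) for p in independent},
            {" ".join(p) for p in sparking})


citing = "covariant effective action for loop quantum cosmology from order reduction"
cited = [
    "quantum nature of the big bang",
    "absence of a singularity in loop quantum cosmology",
    "large scale effective theory for cosmological bounces",
]
sticking, sparking = classify(citing, cited)
print(sorted(sticking))
```

Under this rule "quantum" is kept because it also occurs on its own in the first cited title, whereas "loop" and "quantum cosmology" only occur inside "loop quantum cosmology" and are dropped, which matches the four sticking phrases listed in the example.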