Citation Graph Analysis to Identify Memes in Scientific Literature
Tobias Kuhn and Matjaz Perc and Dirk Helbing
http://www.tkuhn.ch
@txkuhn
ETH Zurich
Quid Inc.
11 June 2014
Citation Graph of Scientific Publications
Nodes: publications
Edges: citations (in gray)
Tobias Kuhn, ETH Zurich Citation Graph Analysis to Identify Memes in Scientific Literature 2 / 21
Legend:
Natural/Agricultural Sciences
(except Physical Sciences)
Physical Sciences
Engineering and Technology
Medical and Health Sciences
Social Sciences / Humanities
Citation Graph of Scientific Publications
Entire giant component (33 million nodes) of the citation graph of Thomson Reuters’ Web of Science dataset.
Citation Graph: American Physical Society
Citation graph of the Physical Review journals (463k nodes).
Legend:
A: Atomic, molecular, optical physics
B: Condensed matter, materials physics
C: Nuclear physics
D: Particles, fields, gravitation, cosmology
E: Statistical, nonlinear, soft matter physics
other journals
Citation Graph: Memes
Specific phrases, or “memes”, localize to specific regions of the citation graph.
Legend:
quantum
fission
graphene
self-organized criticality
traffic flow
Scientific Memes
“Meme” was coined by Richard Dawkins:
“Just as genes propagate themselves in the gene pool by leaping from body
to body via sperm or eggs, so memes propagate themselves in the meme pool
by leaping from brain to brain via a process which, in the broad sense, can
be called imitation.” [Dawkins, The Selfish Gene]
Examples of memes:
• Melodies
• Recipes
• Cultural habits
• Scientific concepts
Genes/Memes as Network Patterns!
Dawkins’ Definition of “Gene”:
“I am using the word gene to mean a genetic unit that is small enough to last
for a number of generations and to be distributed around in many copies.”
[Dawkins, The Selfish Gene]
Our Working Definition of “Scientific Meme”:
A scientific meme is a short unit of text in a publication that is replicated in
citing publications and thereby distributed around in many copies.
Propagation Score
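The propagation score P quantifies the degree to which a meme’s occurrence aligns with the citation graph:
\[
P_m = \frac{\text{sticking factor}}{\text{sparking factor}}
    = \frac{d_{m \to m} \,/\, d_{\to m}}{d_{m \to \not m} \,/\, d_{\to \not m}}
\]
Here $d_{\to m}$ counts the publications that cite at least one publication carrying meme m, and $d_{m \to m}$ counts those among them that carry m themselves. Likewise, $d_{\to \not m}$ counts the publications that cite no publication carrying m, and $d_{m \to \not m}$ counts those among them that nevertheless carry m.
To prevent infrequent phrases from getting a high propagation score by chance, we can add a small amount of controlled noise δ (we use δ = 3):
\[
P_m = \frac{d_{m \to m} \,/\, (d_{\to m} + \delta)}{(d_{m \to \not m} + \delta) \,/\, (d_{\to \not m} + \delta)}
\]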
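The propagation score can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a toy citation graph given as two dictionaries (`carries` and `cites`, both hypothetical names) and reads the four counts as follows: publications that cite at least one meme-carrying publication, with or without carrying the meme themselves, and publications that cite none, again with or without the meme. Publications with no references are counted in the second group, which is an assumption of this sketch.

```python
from typing import Dict, Set, Tuple


def propagation_counts(
    carries: Dict[str, bool], cites: Dict[str, Set[str]]
) -> Tuple[int, int, int, int]:
    """Return (d_m_to_m, d_to_m, d_m_to_notm, d_to_notm) for one meme.

    carries[p] says whether publication p contains the meme;
    cites[p] is the set of publications that p cites.
    """
    d_m_to_m = d_to_m = d_m_to_notm = d_to_notm = 0
    for pub, has_meme in carries.items():
        cites_meme = any(carries.get(ref, False) for ref in cites.get(pub, set()))
        if cites_meme:
            d_to_m += 1
            d_m_to_m += int(has_meme)
        else:
            d_to_notm += 1
            d_m_to_notm += int(has_meme)
    return d_m_to_m, d_to_m, d_m_to_notm, d_to_notm


def propagation_score(
    d_m_to_m: int, d_to_m: int, d_m_to_notm: int, d_to_notm: int, delta: float = 3.0
) -> float:
    """Sticking factor divided by sparking factor, with controlled noise delta."""
    sticking = d_m_to_m / (d_to_m + delta)
    sparking = (d_m_to_notm + delta) / (d_to_notm + delta)
    return sticking / sparking


# Toy graph (invented for illustration): b and c cite a, d cites e.
carries = {"a": True, "b": True, "c": False, "d": True, "e": False}
cites = {"b": {"a"}, "c": {"a"}, "d": {"e"}, "e": set()}
counts = propagation_counts(carries, cites)
score = propagation_score(*counts, delta=3.0)
```

With δ = 0 the function reduces to the first formula, but then it raises `ZeroDivisionError` whenever a denominator count is zero. The noise term also avoids that case.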
Frequency/Propagation Score for APS Data
[Figure: density of n-grams (color scale, 10^0 to 10^5) by propagation score (x-axis, 10^-2 to 10^6) and relative frequency (y-axis, 10^-6 to 10^0) for the APS dataset, n = 1,372,365. Highlighted memes: quantum, fission, graphene, self-organized criticality, traffic flow.]
Meme Score
The meme score M is the product of relative frequency f and propagation score P:
\[
M_m = f_m P_m
\]
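As a rough illustration of why the product matters, the sketch below ranks three phrases by meme score. The function names and all numbers are invented for this example and are not taken from the APS data. A very frequent phrase with no propagation advantage and a rare phrase with a high propagation score both end up below a phrase that is moderately frequent and propagates well.

```python
from typing import Dict, List, Tuple


def meme_score(occurrences: int, total_publications: int, propagation: float) -> float:
    """Meme score M = f * P, with f the relative frequency of the phrase."""
    frequency = occurrences / total_publications
    return frequency * propagation


def rank_by_meme_score(
    stats: Dict[str, Tuple[int, float]], total_publications: int
) -> List[str]:
    """Sort phrases by descending meme score.

    stats maps each phrase to (number of publications containing it, P).
    """
    scored = {
        phrase: meme_score(n, total_publications, p)
        for phrase, (n, p) in stats.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical toy values: (occurrences, propagation score) out of 1000 publications.
toy_stats = {
    "the": (900, 1.0),        # frequent, but no propagation advantage
    "rare typo": (1, 50.0),   # propagates by chance, but very infrequent
    "graphene": (40, 30.0),   # moderately frequent and strongly propagating
}
ranking = rank_by_meme_score(toy_stats, total_publications=1000)
```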
Top 20 Memes:
1. loop quantum cosmology + *
2. unparticle + *
3. sonoluminescence + *
4. MgB2 +
5. stochastic resonance + *
6. carbon nanotubes + *
7. NbSe3 +
8. black hole + *
9. nanotubes +
10. lattice Boltzmann + *
11. dark energy + *
12. Rashba
13. CuGeO3 +
14. strange nonchaotic
15. in NbSe3
16. spin Hall +
17. elliptic flow + *
18. quantum Hall + *
19. CeCoIn5 +
20. inflation +
+ annotators agreed that this is an interesting and important physics concept
* also found on the list of terms extracted from Wikipedia
Properties of the Meme Score
The meme score has a number of nice properties:
• Can be calculated efficiently and exhaustively even on very large datasets
• No upper limit on the length of n-grams
• No dependence on external linguistic or ontological knowledge
• No stop-word lists or other kinds of arbitrary filters or thresholds
Manual Annotation
• Two annotators (A1, A2): PhD students with a physics degree
• Annotation with respect to (1) whether the phrase is a physics concept and (2) its linguistic category
• Randomly extracted phrases for comparison
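[Figure: annotation results per annotator (A1, A2) for three term lists: top terms by meme score, random terms, and weighted random terms. Terms are classified as physics concept or not a physics concept, and as noun phrase, verb, adjective or adverb, or other. Axis: number of terms (30 to 150).]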
Comparison to Alternative Metrics
[Figure: area under curve A (axis from 0 to 0.5) for each metric: meme score, frequency, max. absolute change over time, max. relative change over time, max. absolute difference across journals, max. relative difference across journals.]
[Figure: percentage of Wikipedia terms (0 to 100) among the top x terms by meme score (x from 10^1 to 10^3). 40% of the top 50 terms are found on the Wikipedia list.]
Evolution over Time: Exemplary Memes
[Figure: meme score (δ = 1, axis from 0 to 14) versus publication count (0.5 to 4.5 × 10^5), with year markers from 1940 to 2008, for the memes quantum, fission, graphene, self-organized criticality, and traffic flow.]
Evolution over Time
[Figure: meme score (axis from 0 to 12) versus publication count (0.5 to 4.5 × 10^5), with year markers from 1940 to 2008. Labeled memes: graphene, entanglement, MgB2, nanotubes, carbon nanotubes, quark, neutrino, Bose-Einstein, quantum Hall, black, C60, Hubbard model, quantum wells, graphite, reactions, photoemission, black hole, tricritical, Kondo, superconducting, fission, MeV, diffuse scattering.]
Meme Score Calculation
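1. Collect all phrases that stick at least once (not counting “free-riding” on larger memes)
2. Calculate sticking and sparking factors for all collected phrases
\[
M_m = f_m P_m
\quad\text{with}\quad
P_m = \frac{\text{sticking factor}}{\text{sparking factor}}
    = \frac{d_{m \to m} \,/\, (d_{\to m} + \delta)}{(d_{m \to \not m} + \delta) \,/\, (d_{\to \not m} + \delta)}
\]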
Example
Citing title:
covariant effective action for loop quantum cosmology from order reduction
Cited titles:
– quantum nature of the big bang
– absence of a singularity in loop quantum cosmology
– large scale effective theory for cosmological bounces
Sticking phrases: loop quantum cosmology, quantum, effective, for
Sparking phrases: covariant, covariant effective action, order reduction, ...
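The example above can be reproduced with a short Python sketch. It is a simplified reading of the collection step, not the authors' code. Titles are split on whitespace, and a shared phrase counts as sticking only if no longer shared phrase with the same cited title contains it; this is one possible interpretation of excluding “free-riding”. A phrase counts as sparking if it appears in the citing title but in none of the cited titles. All function names are hypothetical.

```python
from typing import Iterable, Set


def ngrams(title: str) -> Set[str]:
    """All contiguous word sequences (n-grams of every length) in a title."""
    tokens = title.split()
    return {
        " ".join(tokens[i:j])
        for i in range(len(tokens))
        for j in range(i + 1, len(tokens) + 1)
    }


def contains(longer: str, shorter: str) -> bool:
    """True if `shorter` is a proper contiguous word sub-sequence of `longer`."""
    return longer != shorter and f" {shorter} " in f" {longer} "


def sticking_phrases(citing: str, cited_titles: Iterable[str]) -> Set[str]:
    """Phrases shared with a cited title, keeping only maximal matches per title."""
    citing_grams = ngrams(citing)
    sticking: Set[str] = set()
    for cited in cited_titles:
        common = citing_grams & ngrams(cited)
        # Drop phrases that only free-ride on a longer shared phrase.
        sticking |= {g for g in common if not any(contains(h, g) for h in common)}
    return sticking


def sparking_phrases(citing: str, cited_titles: Iterable[str]) -> Set[str]:
    """Phrases of the citing title that occur in none of the cited titles."""
    cited_grams: Set[str] = set().union(*(ngrams(t) for t in cited_titles))
    return ngrams(citing) - cited_grams


citing_title = "covariant effective action for loop quantum cosmology from order reduction"
cited = [
    "quantum nature of the big bang",
    "absence of a singularity in loop quantum cosmology",
    "large scale effective theory for cosmological bounces",
]
sticking = sticking_phrases(citing_title, cited)
sparking = sparking_phrases(citing_title, cited)
```

Here “quantum” sticks on its own because it also appears in the first cited title, whereas “loop” and “cosmology” are shared only as parts of “loop quantum cosmology” and are therefore dropped.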
Conclusions
Inheritance patterns of memes in the scientific citation graph reveal a
simple mathematical regularity.
This regularity can be formalized by the meme score.
The meme score allows memes to be studied in an exhaustive manner.
Thank you for your Attention!
Twitter: @txkuhn
Pre-print article:
http://arxiv.org/abs/1404.3757