Citation Graph Analysis to Identify Memes in Scientific Literature

  1. Citation Graph Analysis to Identify Memes in Scientific Literature. Tobias Kuhn, Matjaz Perc, and Dirk Helbing. http://www.tkuhn.ch, @txkuhn. ETH Zurich; Quid Inc. 11 June 2014
  2. Citation Graph of Scientific Publications. Nodes: publications; edges: citations (in gray). Tobias Kuhn, ETH Zurich Citation Graph Analysis to Identify Memes in Scientific Literature 2 / 21
  3. Citation Graph of Scientific Publications. Nodes: publications; edges: citations (in gray). Legend: Natural/Agricultural Sciences (except Physical Sciences); Physical Sciences; Engineering and Technology; Medical and Health Sciences; Social Sciences / Humanities.
  4. Citation Graph of Scientific Publications. Nodes: publications; edges: citations (in gray); colors as in the legend of slide 3.
  5. Citation Graph of Scientific Publications. The entire giant component (33 million nodes) of the citation graph of Thomson Reuters’ Web of Science dataset; colors as in the legend of slide 3.
  6. Citation Graph: American Physical Society. Citation graph of the Physical Review journals (463k nodes). Legend: A: atomic, molecular, optical physics; B: condensed matter, materials physics; C: nuclear physics; D: particles, fields, gravitation, cosmology; E: statistical, nonlinear, soft matter physics; other journals.
  7. Citation Graph: Memes. Specific phrases (“memes”) localize to specific regions of the citation graph. Legend: quantum; fission; graphene; self-organized criticality; traffic flow.
  8. Scientific Memes. The term “meme” was coined by Richard Dawkins: “Just as genes propagate themselves in the gene pool by leaping from body to body via sperm or eggs, so memes propagate themselves in the meme pool by leaping from brain to brain via a process which, in the broad sense, can be called imitation.” [Dawkins, The Selfish Gene] Examples of memes: melodies, recipes, cultural habits, scientific concepts.
  9. Genes/Memes as Network Patterns! Dawkins’ definition of “gene”: “I am using the word gene to mean a genetic unit that is small enough to last for a number of generations and to be distributed around in many copies.” [Dawkins, The Selfish Gene] Our working definition of “scientific meme”: a scientific meme is a short unit of text in a publication that is replicated in citing publications and thereby distributed around in many copies.
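  10. Propagation Score. The propagation score P quantifies the degree to which a meme’s occurrence aligns with the citation graph:
      $$P_m = \frac{\text{sticking factor}}{\text{sparking factor}} = \frac{d_{m\to m} / d_{\to m}}{d_{m\to\bar{m}} / d_{\to\bar{m}}}$$
      Here $d_{\to m}$ is the number of publications that cite at least one publication carrying meme $m$, and $d_{m\to m}$ is the number of those that carry $m$ themselves; $d_{\to\bar{m}}$ is the number of publications that cite no publication carrying $m$, and $d_{m\to\bar{m}}$ is the number of those that nevertheless carry $m$. The sticking factor is thus the fraction of publications citing $m$-carriers that inherit $m$, and the sparking factor is the fraction of the remaining publications in which $m$ appears without such a citation.
      To prevent infrequent phrases from getting a high propagation score by chance, we add a small amount of controlled noise δ (we use δ = 3):
      $$P_m = \frac{d_{m\to m} / (d_{\to m} + \delta)}{(d_{m\to\bar{m}} + \delta) / (d_{\to\bar{m}} + \delta)}$$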
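The four counts behind the propagation score can be illustrated with a small Python sketch. This is not the authors' implementation; it assumes each publication has already been reduced to a set of phrases, represents the citation graph as a hypothetical `cites` dictionary from publication IDs to cited IDs, and, as a further assumption, skips publications that cite nothing in the dataset.

```python
from typing import Dict, List, Set


def propagation_score(
    m: str,
    phrases: Dict[str, Set[str]],
    cites: Dict[str, List[str]],
    delta: float = 3.0,
) -> float:
    """Propagation score P_m with controlled noise delta (illustrative sketch)."""
    d_m_to_m = d_to_m = 0          # publications citing at least one m-carrier
    d_m_to_not_m = d_to_not_m = 0  # publications citing no m-carrier
    for pub, refs in cites.items():
        if not refs:
            continue  # assumption: publications citing nothing are ignored
        carries_m = m in phrases[pub]
        if any(m in phrases[ref] for ref in refs):
            d_to_m += 1
            d_m_to_m += carries_m
        else:
            d_to_not_m += 1
            d_m_to_not_m += carries_m
    if d_to_m + delta == 0 or d_to_not_m + delta == 0:
        return float("nan")  # a factor is undefined without noise
    sticking = d_m_to_m / (d_to_m + delta)
    sparking = (d_m_to_not_m + delta) / (d_to_not_m + delta)
    return sticking / sparking if sparking > 0 else float("inf")


# Hypothetical toy corpus: the phrases each publication carries ...
phrases = {
    "p1": {"graphene"}, "p2": {"graphene"}, "p3": {"graphene"},
    "p4": set(), "p5": set(), "p6": set(),
    "p7": {"graphene"}, "p8": set(), "p9": set(),
}
# ... and the publications each one cites.
cites = {
    "p1": [], "p2": ["p1"], "p3": ["p1"], "p4": ["p1"],
    "p5": [], "p6": ["p5"], "p7": ["p5"], "p8": ["p5"], "p9": ["p5"],
}

print(propagation_score("graphene", phrases, cites, delta=0))
print(propagation_score("graphene", phrases, cites, delta=3))
```

In this made-up graph, two of the three publications citing a "graphene" carrier inherit the phrase (sticking factor 2/3), while one of the four others carries it anyway (sparking factor 1/4), so P = 8/3 without noise; with δ = 3 the score drops to (2/6)/(4/7) = 7/12, showing how δ damps scores built on very few counts.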
  11. Frequency/Propagation Score for APS Data. Figure: relative frequency versus propagation score of n-grams in the APS data (n = 1,372,365), with the density of n-grams indicated (10^0 to 10^5); the memes quantum, fission, graphene, self-organized criticality, and traffic flow are highlighted.
  12. Randomized Network. Figure: relative frequency versus propagation score of n-grams for the APS citation graph randomized in a time-preserving manner (n = 89,356), with the density of n-grams indicated (10^0 to 10^5).
  13. Meme Score. The meme score M is the product of the relative frequency f and the propagation score P: $M_m = f_m P_m$.
      Top 20 memes (+: annotators agreed that this is an interesting and important physics concept; *: also found on the list of terms extracted from Wikipedia):
         1. loop quantum cosmology +*    11. dark energy +*
         2. unparticle +*                12. Rashba
         3. sonoluminescence +*          13. CuGeO3 +
         4. MgB2 +                       14. strange nonchaotic
         5. stochastic resonance +*      15. in NbSe3
         6. carbon nanotubes +*          16. spin Hall +
         7. NbSe3 +                      17. elliptic flow +*
         8. black hole +*                18. quantum Hall +*
         9. nanotubes +                  19. CeCoIn5 +
        10. lattice Boltzmann +*         20. inflation +
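How multiplying frequency by propagation score ranks phrases can be sketched on a hypothetical toy corpus. The Python below is an illustration, not the authors' code: phrase sets, the `cites` dictionary, and the choice δ = 1 (the value used on slide 17, since δ = 3 would swamp counts this small) are all assumptions.

```python
from typing import Dict, List, Set


def meme_scores(
    phrases: Dict[str, Set[str]],
    cites: Dict[str, List[str]],
    delta: float = 3.0,
) -> Dict[str, float]:
    """Meme score M_m = f_m * P_m for every phrase in a toy corpus (sketch)."""
    if delta <= 0:
        raise ValueError("this sketch requires delta > 0")
    n_pubs = len(phrases)
    vocabulary = set().union(*phrases.values())
    citing = [pub for pub, refs in cites.items() if refs]  # assumption
    scores = {}
    for m in vocabulary:
        # Relative frequency: share of all publications carrying m.
        f_m = sum(m in carried for carried in phrases.values()) / n_pubs
        d_m_to_m = d_to_m = d_m_to_not_m = d_to_not_m = 0
        for pub in citing:
            carries_m = m in phrases[pub]
            if any(m in phrases[ref] for ref in cites[pub]):
                d_to_m += 1
                d_m_to_m += carries_m
            else:
                d_to_not_m += 1
                d_m_to_not_m += carries_m
        # Propagation score with controlled noise delta.
        p_m = (d_m_to_m / (d_to_m + delta)) / (
            (d_m_to_not_m + delta) / (d_to_not_m + delta)
        )
        scores[m] = f_m * p_m
    return scores


# Hypothetical corpus: "graphene" is inherited along citations,
# while the generic word "we" appears almost everywhere.
phrases = {
    "p1": {"graphene", "we"}, "p2": {"graphene", "we"}, "p3": {"graphene"},
    "p4": {"we"}, "p5": {"we"}, "p6": {"we"}, "p7": set(), "p8": {"we"},
}
cites = {
    "p1": [], "p2": ["p1"], "p3": ["p1"], "p4": ["p1"],
    "p5": [], "p6": ["p5"], "p7": ["p5"], "p8": ["p5"],
}

scores = meme_scores(phrases, cites, delta=1.0)
for phrase, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{phrase}: {score:.3f}")
```

Here "graphene" (f = 3/8, P = 2) gets M = 0.75, ahead of the more frequent "we" (f = 3/4, P = 4/7, M = 3/7), because "we" appears in citing publications about as often whether or not the cited work contains it.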
  14. Properties of the Meme Score. The meme score has a number of useful properties:
      • It can be calculated efficiently and exhaustively, even on very large datasets.
      • There is no upper limit on the length of n-grams.
      • It does not depend on external linguistic or ontological knowledge.
      • It needs no stop-word lists or other arbitrary filters or thresholds.
  15. Manual Annotation
      • Two annotators (A1, A2): PhD students with a physics degree.
      • Annotation with respect to (1) whether a phrase is a physics concept and (2) its linguistic category.
      • Randomly extracted phrases for comparison.
      Figure: number of terms annotated by A1 and A2 as physics concept or not a physics concept, broken down by linguistic category (noun phrase; verb; adjective or adverb; other), for terms selected by meme score, random terms, and weighted random terms.
  16. Comparison to Alternative Metrics. Figure: area under curve (A) for the meme score compared with frequency, maximum absolute change over time, maximum relative change over time, maximum absolute difference across journals, and maximum relative difference across journals. Figure: percentage of Wikipedia terms among the top x terms by meme score (x from 10 to 1,000); 40% of the top 50 terms are found on the Wikipedia list.
  17. Evolution over Time: Exemplary Memes. Figure: meme score (δ = 1) of quantum, fission, graphene, self-organized criticality, and traffic flow as a function of publication count (up to about 4.5 × 10^5), with the corresponding publication years (1940 to 2008) marked along the axis.
  18. Evolution over Time. Figure: meme score as a function of publication count (publication years 1940 to 2008) for graphene, entanglement, MgB2, nanotubes, carbon nanotubes, quark, neutrino, Bose-Einstein, quantum Hall, black, C60, Hubbard model, quantum wells, graphite, reactions, photoemission, black hole, tricritical, Kondo, superconducting, fission, MeV, and diffuse scattering.
  19. Meme Score Calculation
      (1) Collect all phrases that stick at least once (not counting phrases that are only “free-riding” on larger memes).
      (2) Calculate the sticking and sparking factors for all collected phrases, and from them the meme score:
      $$M_m = f_m P_m, \qquad P_m = \frac{\text{sticking factor}}{\text{sparking factor}} = \frac{d_{m\to m} / (d_{\to m} + \delta)}{(d_{m\to\bar{m}} + \delta) / (d_{\to\bar{m}} + \delta)}$$
      Example. Citing title: “covariant effective action for loop quantum cosmology from order reduction”. Cited titles: “quantum nature of the big bang”; “absence of a singularity in loop quantum cosmology”; “large scale effective theory for cosmological bounces”. Sticking phrases: loop quantum cosmology, quantum, effective, for. Sparking phrases: covariant, covariant effective action, order reduction, ...
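The sticking/sparking split in the example can be reproduced with a short Python sketch: an n-gram of the citing title sticks if it also occurs in a cited title, and sparks if it occurs in none. Whitespace tokenization and the precise free-riding rule are assumptions, not taken from the slides: here an n-gram counts as free-riding in a cited title when a longer shared n-gram containing it occurs in that same title. This reproduces the example's sticking phrases but may differ from the authors' exact definition.

```python
from typing import Dict, List, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str]) -> Set[Ngram]:
    """All contiguous n-grams of a token list (no upper limit on n)."""
    return {
        tuple(tokens[i:j])
        for i in range(len(tokens))
        for j in range(i + 1, len(tokens) + 1)
    }


def occurs_in(gram: Ngram, tokens: List[str]) -> bool:
    """True if gram occurs as a contiguous token sequence in tokens."""
    n = len(gram)
    return any(tuple(tokens[i:i + n]) == gram for i in range(len(tokens) - n + 1))


def sticking_and_sparking(citing: str, cited: List[str]) -> Tuple[Set[Ngram], Set[Ngram]]:
    citing_tokens = citing.split()
    cited_tokens = [title.split() for title in cited]
    # For each n-gram of the citing title: indices of cited titles containing it.
    found_in: Dict[Ngram, Set[int]] = {
        g: {k for k, toks in enumerate(cited_tokens) if occurs_in(g, toks)}
        for g in ngrams(citing_tokens)
    }
    shared = {g for g, ks in found_in.items() if ks}

    def free_riding_in(g: Ngram, k: int) -> bool:
        # Hypothetical rule: a longer shared n-gram containing g
        # also occurs in cited title k.
        return any(
            len(h) > len(g) and occurs_in(g, list(h)) and k in found_in[h]
            for h in shared
        )

    sticking = {g for g in shared if any(not free_riding_in(g, k) for k in found_in[g])}
    sparking = {g for g, ks in found_in.items() if not ks}
    return sticking, sparking


citing = "covariant effective action for loop quantum cosmology from order reduction"
cited = [
    "quantum nature of the big bang",
    "absence of a singularity in loop quantum cosmology",
    "large scale effective theory for cosmological bounces",
]
sticking, sparking = sticking_and_sparking(citing, cited)
print(sorted(" ".join(g) for g in sticking))
# ['effective', 'for', 'loop quantum cosmology', 'quantum']
```

"quantum" survives because it also occurs on its own in the first cited title, whereas "loop", "cosmology", "loop quantum", and "quantum cosmology" occur only inside "loop quantum cosmology" and are dropped as free-riders. Token-level matching also keeps "cosmology" from matching "cosmological".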
  20. Conclusions. Inheritance patterns of memes in the scientific citation graph reveal a simple mathematical regularity. This regularity can be formalized by the meme score, which allows memes to be studied exhaustively.
  21. Thank you for your attention! Twitter: @txkuhn. Preprint: http://arxiv.org/abs/1404.3757