Meme Extraction from Corpora of Scientific Literature using Citation Networks
Tobias Kuhn 
http://www.tkuhn.ch 
@txkuhn 
ETH Zurich 
Colloquium 
Institute of Computational Linguistics 
University of Zurich 
25 November 2014
Reference 
Journal article on the content of this talk: 
Tobias Kuhn, Matjaz Perc, and Dirk Helbing. Inheritance patterns in
citation networks reveal scientific memes. Physical Review X, 4,
041036, 21 November 2014.
https://journals.aps.org/prx/abstract/10.1103/PhysRevX.4.041036
Tobias Kuhn, ETH Zurich Meme Extraction from Corpora of Scientific Literature using Citation Networks 2 / 22
Meme Detection 
I am presenting an approach to "meme detection", which is related
to a number of existing problems and approaches:
 Named-entity extraction 
 Keyphrase extraction 
 Topic modeling 
 Terminology extraction 
Context for NLP 
Most NLP approaches focus on the analysis of the texts themselves: 
 Grammar 
 Morphology 
 Text Structure 
 Statistical Patterns 
Some also take the contexts of the texts into account: 
 Comparison to properties of entire corpus (e.g. tf-idf)
 Training on particular corpus/domain/speaker
 Citation graph of scientific publications
Citation Graph of Scientific Publications
Nodes: publications 
Edges: citations (in gray) 
Legend: 
Natural/Agricultural Sciences 
(except Physical Sciences) 
Physical Sciences 
Engineering and Technology 
Medical and Health Sciences 
Social Sciences / Humanities 
Entire giant component (33 million nodes) of the citation graph of
Thomson Reuters' Web of Science dataset.
Citation Graph: American Physical Society 
Citation graph of the Physical Review journals (463k nodes).
Legend:
A: Atomic, molecular, optical phys.
B: Condensed matter, materials phys.
C: Nuclear phys.
D: Particles, fields, gravitation, cosmology
E: Statistical, nonlinear, soft matter phys.
other journals
Citation Graph: Memes 
Specific phrases or memes localize to specific regions in the citation graph.
Legend:
quantum
fission
graphene
self-organized criticality
traffic flow
Scientific Memes
Meme was coined by Richard Dawkins:
Just as genes propagate themselves in the gene pool by leaping from body to body via sperm or eggs, so memes propagate themselves in the meme pool by leaping from brain to brain via a process which, in the broad sense, can be called imitation. [Dawkins, The Selfish Gene]
Examples of memes:
 Melodies
 Recipes
 Cultural habits
 Words, grammar rules, text style
 Scientific concepts
Genes/Memes as Network Patterns!
Dawkins' Definition of Gene:
I am using the word gene to mean a genetic unit that is small enough to last for a number of generations and to be distributed around in many copies. [Dawkins, The Selfish Gene]
Our Working Definition of Scientific Meme:
A scientific meme is a short unit of text in a publication that is replicated in citing publications and thereby distributed around in many copies.
Propagation Score
The propagation score P_m quantifies the degree to which a meme's occurrence aligns with the citation graph:
$$P_m = \frac{\text{sticking factor}}{\text{sparking factor}} = \frac{d_{m \to m}}{d_{\to m}} \bigg/ \frac{d_{m \to \not{m}}}{d_{\to \not{m}}}$$
To prevent infrequent phrases from getting a high propagation score by chance, we can add a small amount of controlled noise $\delta$ (we use $\delta = 3$):
$$P_m = \frac{d_{m \to m}}{d_{\to m} + \delta} \bigg/ \frac{d_{m \to \not{m}} + \delta}{d_{\to \not{m}} + \delta}$$
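As a rough illustration of how the propagation score could be computed, the following Python sketch counts the four quantities directly from a citation graph. It assumes the reading used in the referenced journal article: d_{m→m} is the number of meme-carrying publications that cite at least one meme-carrying publication, d_{→m} is the number of all publications that cite at least one meme-carrying publication, and the two crossed-out variants count the publications that cite no meme-carrying publication. The names `propagation_score`, `cites`, and `carriers`, as well as the toy graph, are hypothetical and not taken from the talk.

```python
from typing import Dict, Set


def propagation_score(cites: Dict[str, Set[str]],
                      carriers: Set[str],
                      delta: float = 3.0) -> float:
    """Sticking factor divided by sparking factor for one meme.

    cites    maps each publication to the set of publications it cites
    carriers is the set of publications that contain the meme
    delta    is the controlled noise term (assumed default: 3)
    """
    d_m_to_m = d_to_m = d_m_to_notm = d_to_notm = 0
    for pub, refs in cites.items():
        has_meme = pub in carriers
        if refs & carriers:
            # pub cites at least one meme-carrying publication
            d_to_m += 1
            d_m_to_m += int(has_meme)
        else:
            # pub cites no meme-carrying publication
            d_to_notm += 1
            d_m_to_notm += int(has_meme)
    # With delta = 0 these divisions can fail for rare phrases.
    sticking = d_m_to_m / (d_to_m + delta)
    sparking = (d_m_to_notm + delta) / (d_to_notm + delta)
    return sticking / sparking


# Toy graph (made up): B and C cite A, D cites B; A, B and D carry the meme.
toy_cites = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B"}, "E": set()}
toy_carriers = {"A", "B", "D"}
toy_score = propagation_score(toy_cites, toy_carriers)  # (2/6) / (4/5) = 5/12
```

In the toy graph the noise term pulls the score well below the value obtained without it (4/3), which is the intended damping effect for phrases that occur only a handful of times.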
Frequency/Propagation Score for APS Data
[Figure: density of n-grams over relative frequency and propagation score for the APS data (N = 1,372,365); highlighted memes: quantum, fission, graphene, self-organized criticality, traffic flow]
Meme Score
Meme score M as the product of relative frequency f and propagation score P:
$$M_m = f_m P_m$$
Top 20 Memes for APS (Physics):
1. loop quantum cosmology +*
2. unparticle +*
3. sonoluminescence +*
4. MgB2 +
5. stochastic resonance +*
6. carbon nanotubes +*
7. NbSe3 +
8. black hole +*
9. nanotubes +
10. lattice Boltzmann +*
11. dark energy +*
12. Rashba
13. CuGeO3 +
14. strange nonchaotic
15. in NbSe3
16. spin Hall +
17. elliptic flow +*
18. quantum Hall +*
19. CeCoIn5 +
20. inflation +
+ annotators agreed that this is an interesting and important physics concept
* also found on the list of terms extracted from Wikipedia
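A minimal sketch of the ranking step, assuming that relative frequencies and propagation scores have already been computed per phrase. The function name `rank_by_meme_score` and all numbers are made up for illustration; they are not results from the talk.

```python
from typing import Dict, List, Tuple


def rank_by_meme_score(rel_freq: Dict[str, float],
                       prop_score: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (phrase, meme score) pairs sorted by descending meme score."""
    # Meme score = relative frequency times propagation score.
    scores = {m: rel_freq[m] * prop_score[m]
              for m in rel_freq if m in prop_score}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs: a rare but strongly propagating phrase, a common
# technical word, and a very frequent phrase that barely propagates.
rel_freq = {"graphene": 0.002, "quantum": 0.05, "of the": 0.4}
prop_score = {"graphene": 50.0, "quantum": 1.5, "of the": 0.05}
ranking = rank_by_meme_score(rel_freq, prop_score)
```

With these invented numbers, the frequent phrase "of the" ends up below the rarer phrase that propagates strongly along citations, which is the behavior the product is meant to produce.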
Properties of the Meme Score
The meme score has a number of nice properties:
 Can be calculated efficiently and exhaustively even on very large datasets
 No upper limit on the length of n-grams
 No dependence on external linguistic or ontological knowledge
 No stop-word lists or other kinds of arbitrary filters or thresholds
Manual Annotation
 Two annotators (A1, A2): PhD students with physics degree
 Annotation with respect to (1) physics concept or not and (2) linguistic category
 Randomly extracted phrases for comparison
[Figure: annotations by A1 and A2 for terms selected by meme score, random, and weighted random; categories: physics concept, not a physics concept; noun phrase, verb, adjective or adverb, other]
Comparison to Alternative Metrics
[Figure: frequency and meme score compared by area under curve (A); labels: max. relative difference across journals, max. absolute difference across journals, max. relative change over time, max. absolute change over time. Further plot: percentage of Wikipedia terms among the top x terms by meme score]
40% of top 50 terms are found on Wikipedia list
Evolution over Time: Exemplary Memes
[Figure: meme score (d = 1) against publication count, with years from 1940 to 2008 marked, for quantum, fission, graphene, self-organized criticality, traffic flow]
Evolution over Time
[Figure: meme score against publication count, with years from 1940 to 2008 marked; labeled memes: graphene, entanglement, MgB2, nanotubes, carbon nanotubes, quark, neutrino, Bose-Einstein, quantum Hall, black, C60, Hubbard model, quantum wells, graphite, reactions, photoemission, black hole, tricritical, Kondo, superconducting, fission, MeV, diffuse scattering]
Conclusions
 The citation graph is a very powerful resource to detect memes.
 Combined with other existing approaches, this seems to be a promising tool for NLP on scientific publications.
 Could be applied to other types of texts that have a certain kind of citation structure (legal texts?).
 Allows for studying memes in an exhaustive manner.
Thank you for your Attention!
Questions?
Randomized Network
[Figure: density of n-grams over relative frequency and propagation score for the randomized (time preserving) APS citation network; N = 89,356]
Meme Score Calculation
1. Collect all phrases that stick at least once (not counting free-riding on larger memes)
2. Calculate sticking and sparking factors for all collected phrases
$$M_m = f_m P_m \quad \text{with} \quad P_m = \frac{\text{sticking factor}}{\text{sparking factor}} = \frac{d_{m \to m}}{d_{\to m} + \delta} \bigg/ \frac{d_{m \to \not{m}} + \delta}{d_{\to \not{m}} + \delta}$$
Example
Citing title: covariant effective action for loop quantum cosmology from order reduction
Cited titles:
 quantum nature of the big bang
 absence of a singularity in loop quantum cosmology
 large scale effective theory for cosmological bounces
Sticking phrases: loop quantum cosmology, quantum, effective, for
Sparking phrases: covariant, covariant effective action, order reduction, ...
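One possible reading of step 1, sketched in Python: a phrase sticks without free-riding if it is a maximal word sequence shared by the citing title and one cited title, meaning the shared sequence cannot be extended to the left or to the right. This interpretation and the helper names `maximal_shared_ngrams` and `sticking_phrases` are assumptions, not the authors' implementation. On the example titles it yields the four sticking phrases listed on the slide.

```python
from typing import Iterable, Set


def maximal_shared_ngrams(citing: str, cited: str) -> Set[str]:
    """Word sequences shared by both titles that cannot be extended."""
    a, b = citing.split(), cited.split()
    shared = set()
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip positions where the match could be extended to the left;
            # such a match is part of a longer shared phrase (free-riding).
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the match to the right as far as possible.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            shared.add(" ".join(a[i:i + k]))
    return shared


def sticking_phrases(citing: str, cited_titles: Iterable[str]) -> Set[str]:
    """Union of maximal shared phrases over all cited titles."""
    phrases: Set[str] = set()
    for cited in cited_titles:
        phrases |= maximal_shared_ngrams(citing, cited)
    return phrases


citing_title = ("covariant effective action for loop quantum cosmology "
                "from order reduction")
cited_titles = [
    "quantum nature of the big bang",
    "absence of a singularity in loop quantum cosmology",
    "large scale effective theory for cosmological bounces",
]
sticking = sticking_phrases(citing_title, cited_titles)
```

Under this reading, "loop quantum" and "cosmology" are not collected, because they only occur inside the longer shared phrase, while "quantum" is collected because it also matches the first cited title on its own.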
Top Meme Scores for Web of Science Data
1. MgB2
2. lattice Boltzmann
3. graphene
4. on chalcogenolates
5. Ti3SiC2
6. harmony search
7. seasonal climate summary southern hemisphere
8. empirical likelihood
9. proxy re-encryption
10. spiking neural P systems
11. loop quantum cosmology
12. zero-divisor
13. BiFeO3
14. Neospora
15. Papuloerythroderma
16. Neospora caninum
17. metal dusting
18. porcine circovirus
19. cone metric
20. ranked set
Top Meme Scores for PubMed Central Data
1. Buruli ulcer
2. G-quadruplex
3. miRNAs
4. chronic cerebrospinal venous insufficiency
5. cerebrospinal venous
6. Mycobacterium ulcerans
7. enterovirus 71
8. G-quadruplexes
9. CCSVI
10. malaria
11. Nipah virus
12. miRNA
13. microRNAs
14. hepatitis E virus
15. the 45 and Up Study
16. chronic cerebrospinal venous insufficiency (CCSVI)
17. EV71
18. bluetongue
19. Schmallenberg virus
20. Nipah