Meme Extraction from Corpora of Scientific Literature using Citation Networks
Tobias Kuhn 
http://www.tkuhn.ch 
@txkuhn 
ETH Zurich 
Colloquium 
Institute of Computational Linguistics 
University of Zurich 
25 November 2014
Reference 
Journal article on the content of this talk: 
Tobias Kuhn, Matjaz Perc, and Dirk Helbing. Inheritance patterns in
citation networks reveal scientific memes. Physical Review X, 4,
041036, 21 November 2014.
https://journals.aps.org/prx/abstract/10.1103/PhysRevX.4.041036
Tobias Kuhn, ETH Zurich Meme Extraction from Corpora of Scientific Literature using Citation Networks 2 / 22
Meme Detection 
I am presenting an approach to "meme detection", which is related
to a number of existing problems and approaches:
• Named-entity extraction
• Keyphrase extraction
• Topic modeling
• Terminology extraction
Context for NLP 
Most NLP approaches focus on the analysis of the texts themselves:
• Grammar
• Morphology
• Text structure
• Statistical patterns
Some also take the contexts of the texts into account:
• Comparison to properties of the entire corpus (e.g. tf-idf)
• Training on a particular corpus/domain/speaker
• Citation graph of scientific publications
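The corpus-level comparison mentioned above (tf-idf) can be sketched as follows. This is only a minimal illustration of one common variant (raw term count times log(N / df)); the toy documents and the function name `tf_idf` are assumptions made for the example and do not come from the talk.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    Uses raw term counts for tf and log(N / df) for idf, which is one
    common variant among several.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

# Hypothetical toy corpus (not from the talk).
docs = [
    "quantum hall effect in graphene",
    "graphene growth on copper",
    "quantum entanglement of photons",
]
scores = tf_idf(docs)
```

In this toy corpus a term that occurs in only one document, such as "hall", scores higher than "graphene", which occurs in two. That is the sense in which the score compares a text to the whole corpus.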
Citation Graph of Scientific Publications
Nodes: publications 
Edges: citations (in gray) 
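A citation graph of this kind (publications as nodes, citations as directed edges) could be represented as in the sketch below. The publication identifiers `P1` to `P5` and the variable names are hypothetical; the talk does not say how its graph was stored.

```python
from collections import defaultdict

# Hypothetical citation pairs: (citing publication, cited publication).
citations = [("P3", "P1"), ("P3", "P2"), ("P4", "P3"), ("P5", "P3")]

cites = defaultdict(set)     # publication -> publications it cites
cited_by = defaultdict(set)  # publication -> publications citing it
for citing, cited in citations:
    cites[citing].add(cited)
    cited_by[cited].add(citing)

# Every publication that appears on either end of a citation is a node.
nodes = set(cites) | set(cited_by)

# In-degree of a node = number of citations the publication received.
in_degree = {p: len(cited_by[p]) for p in nodes}
```

Keeping both directions (`cites` and `cited_by`) makes it cheap to look up either the references of a publication or the publications that cite it.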
Legend: 
Natural/Agricultural Sciences (except Physical Sciences)
Physical Sciences 
Engineering and Technology 
Medical and Health Sciences 
Social Sciences / Humanities 
Citation Graph of Scientific Publications
Entire giant component (33 million nodes) of the citation graph of
Thomson Reuters' Web of Science dataset.
Legend: 
Natural/Agricultural Sciences (except Physical Sciences)
Physical Sciences 
Engineering and Technology 
Medical and Health Sciences 
Social Sciences / Humanities 
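Extracting a giant component from a citation graph can be sketched as below. The sketch assumes that "giant component" means the largest weakly connected component (edge direction ignored), which the slide does not state explicitly. The function name `giant_component` and the toy edges are hypothetical.

```python
from collections import defaultdict, deque

def giant_component(edges):
    """Return the largest weakly connected component as a set of nodes."""
    neighbours = defaultdict(set)
    for a, b in edges:
        # Ignore edge direction: weak connectivity.
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, best = set(), set()
    for start in list(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical toy citation pairs: (citing, cited).
edges = [("P3", "P1"), ("P3", "P2"), ("P4", "P3"), ("P7", "P6")]
largest = giant_component(edges)
```

Here the pair P7/P6 forms a separate small component and is left out of `largest`. For a graph with tens of millions of nodes a dedicated graph library would likely be a better fit than this plain-Python version.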
Citation Graph: American Physical Society 
Citation graph of the Physical Review journals (463k nodes).
Legend:
A: Atomic, molecular, optical phys.
B: Condensed matter, materials phys.
C: Nuclear phys.
D: Particles, fields, gravitation, cosmology
E: Statistical, nonlinear, soft matter phys.
other journals
Citation Graph: Memes 
Speci

Meme Extraction from Corpora of Scientific Literature using Citation Networks

  • 1. Meme Extraction from Corpora of Scientific Literature using Citation Networks. Tobias Kuhn, http://www.tkuhn.ch, @txkuhn, ETH Zurich. Colloquium, Institute of Computational Linguistics, University of Zurich, 25 November 2014.
  • 3. Reference. Journal article on the content of this talk: Tobias Kuhn, Matjaz Perc, and Dirk Helbing. Inheritance patterns in citation networks reveal scientific memes. Physical Review X, 4, 041036, 21 November 2014. https://journals.aps.org/prx/abstract/10.1103/PhysRevX.4.041036 Tobias Kuhn, ETH Zurich. Meme Extraction from Corpora of Scientific Literature using Citation Networks. 2 / 22
  • 6. Meme Detection. I am presenting an approach to "meme detection", which is related to a number of existing problems and approaches: named-entity extraction, keyphrase extraction, topic modeling, terminology extraction.
  • 8. Context for NLP. Most NLP approaches focus on the analysis of the texts themselves: grammar, morphology, text structure, statistical patterns. Some also take the contexts of the texts into account: comparison to properties of the entire corpus (e.g. tf-idf), training on a particular corpus/domain/speaker, citation graph of scientific publications.
  • 11. Citation Graph of Scientific Publications. Nodes: publications. Edges: citations (in gray).
  • 14. Citation Graph of Scientific Publications. Nodes: publications. Edges: citations (in gray). Legend: Natural/Agricultural Sciences (except Physical Sciences); Physical Sciences; Engineering and Technology; Medical and Health Sciences; Social Sciences / Humanities.
  • 20. Citation Graph of Scientific Publications. Entire giant component (33 million nodes) of the citation graph of Thomson Reuters' Web of Science dataset. Legend: Natural/Agricultural Sciences (except Physical Sciences); Physical Sciences; Engineering and Technology; Medical and Health Sciences; Social Sciences / Humanities.
  • 23. Citation Graph: American Physical Society. Citation graph of the Physical Review journals (463k nodes). Legend: A: Atomic, molecular, optical phys.; B: Condensed matter, materials phys.; C: Nuclear phys.; D: Particles, fields, gravitation, cosmology; E: Statistical, nonlinear, soft matter phys.; other journals.
  • 27. ... scientific phrases or memes localize to specific regions in the citation graph. Legend: quantum, fission, graphene, self-organized criticality, traffic flow.
  • 32. Scientific Memes. "Meme" was coined by Richard Dawkins: "Just as genes propagate themselves in the gene pool by leaping from body to body via sperm or eggs, so memes propagate themselves in the meme pool by leaping from brain to brain via a process which, in the broad sense, can be called imitation." [Dawkins, The Selfish Gene] Examples of memes: melodies; recipes; cultural habits; words, grammar rules, text style; scientific concepts.
  • 36. Genes/Memes as Network Patterns! Dawkins' Definition of Gene: "I am using the word gene to mean a genetic unit that is small enough to last for a number of generations and to be distributed around in many copies." [Dawkins, The Selfish Gene] Our Working Definition of Scientific Meme: A scientific meme is a short unit of text in a publication that is replicated in citing publications and thereby distributed around in many copies.
  • 44. The propagation score quantifies the degree to which a meme's occurrence aligns with the citation graph: $P_m = \frac{\text{sticking factor}}{\text{sparking factor}} = \frac{d_{m \to m}}{d_{\to m}} \Big/ \frac{d_{m \to \not m}}{d_{\to \not m}}$. To prevent infrequent phrases from getting a high propagation score by chance, we can add a small amount of controlled noise $\delta$ (we use $\delta = 3$): $P_m = \frac{d_{m \to m}}{d_{\to m} + \delta} \Big/ \frac{d_{m \to \not m} + \delta}{d_{\to \not m} + \delta}$.
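As a rough illustration of the propagation score, the following Python sketch computes the four counts from a toy citation graph. The reading of the counts is an assumption, not stated on the slide: `d_m_to_m` counts meme-carrying publications that cite at least one meme-carrying publication, `d_to_m` counts all publications that cite at least one meme-carrying publication, and the two "not m" counts cover publications that cite no meme-carrying publication. The function name, the data layout, and the toy graph are hypothetical.

```python
from typing import Dict, Set


def propagation_score(carriers: Set[str],
                      cites: Dict[str, Set[str]],
                      delta: float = 3.0) -> float:
    """Propagation score of one phrase, with controlled noise delta.

    carriers: ids of publications that contain the phrase (assumed input)
    cites:    maps every publication id to the set of ids it cites
    """
    d_m_to_m = d_to_m = d_m_to_not_m = d_to_not_m = 0
    for pub, refs in cites.items():
        has_meme = pub in carriers
        cites_carrier = any(ref in carriers for ref in refs)
        if cites_carrier:
            d_to_m += 1
            if has_meme:
                d_m_to_m += 1        # the meme "stuck"
        else:
            d_to_not_m += 1
            if has_meme:
                d_m_to_not_m += 1    # the meme "sparked" without a carrying source
    sticking = d_m_to_m / (d_to_m + delta)
    sparking = (d_m_to_not_m + delta) / (d_to_not_m + delta)
    return sticking / sparking


# Hypothetical toy graph: b and c cite a, d cites b; a, b and d carry the phrase.
toy_cites = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b"}, "e": set()}
toy_carriers = {"a", "b", "d"}
toy_score = propagation_score(toy_carriers, toy_cites)
```

With `delta = 3`, the toy graph gives a sticking factor of 2/6 and a sparking factor of 4/5. Note that `delta` must be positive here, otherwise a phrase with no citing publications would cause a division by zero.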
  • 46. Frequency/Propagation Score for APS Data. [Figure: density of n-grams (N = 1,372,365) in the APS data by relative frequency and propagation score, on logarithmic axes; highlighted: quantum, fission, graphene, self-organized criticality, traffic flow.]
  • 48. Meme Score. Meme score M as the product of relative frequency f and propagation score P: $M_m = f_m P_m$. Top 20 Memes for APS (Physics):
      1. loop quantum cosmology +*
      2. unparticle +*
      3. sonoluminescence +*
      4. MgB2 +
      5. stochastic resonance +*
      6. carbon nanotubes +*
      7. NbSe3 +
      8. black hole +*
      9. nanotubes +
      10. lattice Boltzmann +*
      11. dark energy +*
      12. Rashba
      13. CuGeO3 +
      14. strange nonchaotic
      15. in NbSe3
      16. spin Hall +
      17. elliptic flow +*
      18. quantum Hall +*
      19. CeCoIn5 +
      20. inflation +
      + annotators agreed that this is an interesting and important physics concept
      * also found on the list of terms extracted from Wikipedia
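The meme score itself is a simple product, so ranking phrases is straightforward once frequencies and propagation scores are available. The sketch below assumes that the relative frequency is the share of publications that contain the phrase; the function names and all numbers are made up for illustration and are not taken from the APS data.

```python
from typing import Dict, List, Tuple


def meme_score(n_carriers: int, n_publications: int, propagation: float) -> float:
    """M_m = f_m * P_m, with f_m taken as the share of publications carrying the phrase."""
    return (n_carriers / n_publications) * propagation


def top_memes(stats: Dict[str, Tuple[int, float]],
              n_publications: int,
              k: int = 20) -> List[str]:
    """Return the k phrases with the highest meme score.

    stats maps phrase -> (number of carrying publications, propagation score).
    """
    scores = {phrase: meme_score(count, n_publications, prop)
              for phrase, (count, prop) in stats.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical values: a rare but well-propagating phrase can outrank a frequent word.
toy_stats = {
    "graphene": (10, 500.0),
    "of": (600, 1.1),
    "rare phrase": (2, 0.5),
}
toy_ranking = top_memes(toy_stats, n_publications=1000, k=2)
```

In this toy data the frequent word "of" has a propagation score close to 1, so its meme score stays below that of the rarer phrase with a high propagation score.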
  • 50. Properties of the Meme Score. The meme score has a number of nice properties: it can be calculated efficiently and exhaustively even on very large datasets; no upper limit on the length of n-grams; no dependence on external linguistic or ontological knowledge; no stop-word lists or other kinds of arbitrary filters or thresholds.
  • 53. Manual Annotation. Two annotators (A1, A2): PhD students with physics degrees. Annotation with respect to (1) physics concept or not and (2) linguistic category. Randomly extracted phrases for comparison. [Figure: number of terms (0 to 150) annotated by A1 and A2 as physics concept / not a physics concept and as noun phrase / verb / adjective or adverb / other, for terms selected by meme score, at random, and by weighted random.]
  • 55. Comparison to Alternative Metrics. [Figure: percentage of Wikipedia terms among the top x terms, with area under curve A, for the metrics max. relative difference across journals, max. absolute difference across journals, max. relative change over time, max. absolute change over time, frequency, and meme score.] 40% of the top 50 terms by meme score are found on the Wikipedia list.
  • 57. Evolution over Time: Exemplary Memes. [Figure: meme score (d = 1) against publication count, with years from 1940 to 2008 marked, for quantum, fission, graphene, self-organized criticality, and traffic flow.]
  • 59. Evolution over Time. [Figure: meme score against publication count, with years from 1940 to 2008 marked; labeled memes: graphene, entanglement, MgB2, nanotubes, carbon nanotubes, quark, neutrino, Bose-Einstein, quantum Hall, black, C60, Hubbard model, quantum wells, graphite, reactions, photoemission, black hole, tricritical, Kondo, superconducting, fission, MeV, diffuse scattering.]
  • 61. Conclusions. The citation graph is a very powerful resource to detect memes. Combined with other existing approaches, this seems to be a promising tool for NLP on scientific publications. Could be applied to other types of texts that have a certain kind of citation structure (legal texts?). Allows for studying memes in an exhaustive manner.
  • 64. Thank you for your Attention! Questions?
  • 66. Randomized Network. [Figure: density of n-grams (N = 89,356) by relative frequency and propagation score, on logarithmic axes, for the randomized (time-preserving) APS citation network.]
  • 68. Meme Score Calculation.
      1. Collect all phrases that stick at least once (not counting free-riding on larger memes).
      2. Calculate sticking and sparking factors for all collected phrases.
      $M_m = f_m P_m$ with $P_m = \frac{\text{sticking factor}}{\text{sparking factor}} = \frac{d_{m \to m}}{d_{\to m} + \delta} \Big/ \frac{d_{m \to \not m} + \delta}{d_{\to \not m} + \delta}$
      Example.
      Citing title: covariant effective action for loop quantum cosmology from order reduction
      Cited titles: quantum nature of the big bang; absence of a singularity in loop quantum cosmology; large scale effective theory for cosmological bounces
      Sticking phrases: loop quantum cosmology, quantum, effective, for
      Sparking phrases: covariant, covariant effective action, order reduction, ...
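The title example above can be reproduced with a short Python sketch. It rests on two assumptions that the slide does not spell out: a phrase "sticks" if it occurs in the citing title and in at least one cited title, and "free-riding" is handled by keeping, per cited title, only the shared n-grams that are not contained in a longer shared n-gram. Sparking phrases are taken to be all n-grams of the citing title that occur in none of the cited titles. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(title: str) -> Set[Ngram]:
    """All contiguous word n-grams of a title (no upper limit on n)."""
    tokens = title.lower().split()
    return {tuple(tokens[i:j])
            for i in range(len(tokens))
            for j in range(i + 1, len(tokens) + 1)}


def contains(longer: Ngram, shorter: Ngram) -> bool:
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def sticking_and_sparking(citing: str, cited: List[str]) -> Tuple[Set[Ngram], Set[Ngram]]:
    citing_grams = ngrams(citing)
    sticking: Set[Ngram] = set()
    shared_anywhere: Set[Ngram] = set()
    for title in cited:
        shared = citing_grams & ngrams(title)
        shared_anywhere |= shared
        # Assumed free-riding rule: drop n-grams nested in a longer shared n-gram.
        sticking |= {g for g in shared
                     if not any(g != h and contains(h, g) for h in shared)}
    sparking = citing_grams - shared_anywhere
    return sticking, sparking


citing_title = "covariant effective action for loop quantum cosmology from order reduction"
cited_titles = [
    "quantum nature of the big bang",
    "absence of a singularity in loop quantum cosmology",
    "large scale effective theory for cosmological bounces",
]
sticking, sparking = sticking_and_sparking(citing_title, cited_titles)
sticking_phrases = {" ".join(g) for g in sticking}
```

Under these assumptions, "quantum" is kept because it also sticks through the first cited title on its own, while "loop" and "cosmology" are dropped because they only occur inside the longer shared phrase.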
  • 70. Top Meme Scores for Web of Science Data:
      1. MgB2
      2. lattice Boltzmann
      3. graphene
      4. on chalcogenolates
      5. Ti3SiC2
      6. harmony search
      7. seasonal climate summary southern hemisphere
      8. empirical likelihood
      9. proxy re-encryption
      10. spiking neural P systems
      11. loop quantum cosmology
      12. zero-divisor
      13. BiFeO3
      14. Neospora
      15. Papuloerythroderma
      16. Neospora caninum
      17. metal dusting
      18. porcine circovirus
      19. cone metric
      20. ranked set
  • 72. Top Meme Scores for PubMed Central Data:
      1. Buruli ulcer
      2. G-quadruplex
      3. miRNAs
      4. chronic cerebrospinal venous insufficiency
      5. cerebrospinal venous
      6. Mycobacterium ulcerans
      7. enterovirus 71
      8. G-quadruplexes
      9. CCSVI
      10. malaria
      11. Nipah virus
      12. miRNA
      13. microRNAs
      14. hepatitis E virus
      15. the 45 and Up Study
      16. chronic cerebrospinal venous insufficiency (CCSVI)
      17. EV71
      18. bluetongue
      19. Schmallenberg virus
      20. Nipah