Xiaodan Zhu and Peter Turney
National Research Council Canada
Daniel Lemire
TELUQ, Université du Québec Montréal
Andre Vellino
School of Information Studies,
University of Ottawa, Ottawa
Measuring Academic Influence:
Not All Citations Are Equal
Overview
—  Some background in Citation Analysis
—  What we tried to do and why
—  How we did it
—  What the results were
—  What the implications are
What is Citation Analysis
Citation analysis refers to the collection of methods for measuring
the importance of scholars, journals and institutions by counting
citations in a graph of references in the published literature.
Why Do Citation Analysis?
—  Reason # 1: Because it generates measurable quantities!
“Since we can’t really measure what
interests us, we begin to be interested
in what we can measure”
Joel Westheimer
Professor of Education
University of Ottawa
Uses for Citation Measures
—  For Readers
—  To evaluate the quality of articles / journals
—  For Universities
—  To evaluate the productivity of academics
—  To help in tenure and promotion decisions
—  For Journals
—  To attract authors to publish
—  For Libraries
—  To make collections / acquisition decisions
—  To make automated recommendations to users
How Are Citations Counted?
—  Add 1 for every new occurrence of a cited article
—  Sum the results
—  Average per article and/or count the total number of citations
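The counting steps above can be sketched in a few lines. This is only an illustration: the citation records, the paper labels and the variable names are made up, and each cited article is counted once per citing paper.

```python
from collections import Counter

# Hypothetical records: (citing_paper, cited_article). Not real data.
citations = [
    ("P1", "A"), ("P2", "A"), ("P3", "B"),
    ("P1", "B"), ("P4", "A"),
]

# Add 1 for every occurrence of a cited article.
counts = Counter(cited for _, cited in citations)

# Sum the results, then average per cited article.
total = sum(counts.values())
average = total / len(counts)

print(counts["A"], counts["B"], total, average)
```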
Problems
—  Self citations!
—  No measure of quality of citing source
—  May be skewed by a small number of highly cited items
—  Easy to “game” by tricking Google Scholar
—  viz. Ike Antkare h-index = 94 – Einstein h-index = 84
h-index
—  Jorge Hirsch (PNAS, 2005) defined the h-index:
—  Attempts to measure both the productivity and impact of the
author’s published work
—  An author has index h if h of their N papers have at least h citations
each, and the other (N − h) papers have at most h citations each.
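A minimal sketch of this definition, assuming the only input is a list of per-paper citation counts (the function name is illustrative):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))   # 4
print(h_index([100] * 10))         # 10
print(h_index([10] * 10))          # 10
```

The last two calls show that the index is capped by the number of papers: ten papers with 100 citations each score the same as ten papers with 10 citations each.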
Some Criticisms of the h-index
—  The h-index does not account for the number of authors or the order of
the authors of a paper.
—  Cannot use the h-index to compare authors in different fields
—  Young researchers with as yet short careers are at a built-in disadvantage
over older researchers
—  Constrained by the total number of publications
—  10 papers with 100 citations each get the same h-index (10) as 10 papers with 10 citations each
“[h-index] captures a small amount of information
about the distribution of a scientist's citations [and] loses crucial
information that is essential for the assessment of research.” 
Adler, R., Ewing, J., Taylor, P. Citation statistics.
A report from the International Mathematical Union.
http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf
Journal Impact factor (IF)
—  Invented by Eugene Garfield in 1955 to identify journals for
Science Citation Index
—  Definition:
$$\mathrm{JIF} = \frac{\text{Total Citations (2 preceding years)}}{\text{Total Articles (2 preceding years)}}$$
i.e. the impact factor of a journal is the average number
of citations to those papers that were published during
the two preceding years
—  e.g. the number of times articles published in 2001 and 2002
were cited by indexed journals during 2003 / the total number
of items published in 2001 and 2002
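The 2003 example can be worked through as follows. The numbers are invented for illustration and the function name is an assumption, not part of any citation-index API.

```python
def impact_factor(citations_to_prev_two_years, items_in_prev_two_years):
    """Citations received this year by items from the two preceding years,
    divided by the number of items published in those two years."""
    if items_in_prev_two_years <= 0:
        raise ValueError("need at least one published item")
    return citations_to_prev_two_years / items_in_prev_two_years

# Hypothetical journal: 140 items in 2001, 160 items in 2002,
# cited 450 times by indexed journals during 2003.
jif_2003 = impact_factor(450, 140 + 160)
print(jif_2003)  # 1.5
```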
Some Criticisms of Impact Factor
—  Letters or editorials in some journals (e.g. Nature) are often cited
(and counted) in “Total Citations” (numerator) but not counted in “Total
Articles” (denominator)
—  2-year window not applicable in many fields (e.g. in Math 90% of
citations fall outside the 2-year window)
—  IF varies considerably across disciplines (Math has an average of
0.9 citation per article, Life Sciences have an average of 6.2)
“Using the impact factor alone to judge a journal is
like using weight alone to judge a person's health.” 
Adler, R., Ewing, J., Taylor, P. Citation statistics.
A report from the International Mathematical Union.
http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf
What We Did and Why
—  As early as 1965 Garfield identified 15 different reasons for
citing
—  giving credit for related work
—  correcting a work
—  criticizing previous work
—  Many attempts since to categorize citations
One Big Assumption
All citations should count equally!
Citation Typing Ontology (CiTO)
Here are the first 21 of the 91 citation types in CiTO
http://imageweb.zoo.ox.ac.uk/pub/2008/plospaper/latest/#refs
Example of semantically annotated article using CiTO:
Our Objective
—  Solve a binary classification problem:
Given a Paper-Reference (P-R) pair, does
P-R belong to the class “R is highly
influential for P” or not?
Our Method
—  Apply Machine Learning methods to train a computer to
recognize “Highly Influential Reference” from examples
Step 1 – Data Collection
We believe that most papers are based on 1, 2, 3 or 4
essential references. By an essential reference, we mean a
reference that was highly influential or inspirational for the
core ideas in your paper; that is, a reference that inspired or
strongly influenced your new algorithm, your experimental
design, or your choice of a research problem. Other
references merely support the work.
We asked for
—  Title of your paper (research papers only; no surveys)
—  The essential references on which your paper builds
We got
—  100 papers
—  322 “influential” references
—  i.e. 3.2 “influential references” per article
—  Each paper
—  Contained ~ 31 references in the References section
—  Cited ~ 54 references in the body of the paper
—  i.e. each reference was cited an average of 1.7 times per paper
The Problem
—  The 100 papers yield 3143 paper-reference pairs
—  The authors have selected ~320 paper-reference pairs
—  The task: algorithmically and accurately select those ~320 from the 3143
Paper – Reference Analysis
—  OpenNLP used to detect sentence boundaries and tokenize.
—  ParsCit to parse the papers.
—  ParsCit is an open-source package for parsing references and
document structure in scientific papers.
—  Regular expressions to capture citation occurrences in paper
bodies that were not detected by ParsCit.
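The slides do not show the regular expressions that were used. As a rough sketch of the idea, the hypothetical pattern below picks up numeric bracket citations such as [4] or [3, 4, 5]; author-year styles would need additional patterns.

```python
import re

# Assumed pattern: one or more reference numbers inside square brackets,
# separated by commas or semicolons.
BRACKET_CITATION = re.compile(r"\[(\d+(?:\s*[,;]\s*\d+)*)\]")

def find_citations(text):
    """Return one list of reference numbers per bracketed citation group."""
    groups = []
    for match in BRACKET_CITATION.finditer(text):
        numbers = [int(n) for n in re.split(r"\s*[,;]\s*", match.group(1))]
        groups.append(numbers)
    return groups

sample = "Prior work [4] extends earlier methods [3, 4, 5]."
print(find_citations(sample))  # [[4], [3, 4, 5]]
```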
Characteristics of Corpus
We Looked at 5 Classes of Features
1.  Count-based features
2.  Similarity-based features
3.  Context-based features
4.  Position-based features
5.  Miscellaneous features
Count Based Features
—  Total number of times a paper is referenced in the citing paper
—  The number of different sections in which a given reference appears
—  Number of times a paper is referenced in the
—  “Related” section
—  “Introduction” section
—  “Core” sections (all sections excluding “Related”, “Introduction”,
“Acknowledgements”, “Conclusion” and “Future Work”)
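A sketch of how these counts might be derived, assuming each in-text citation has already been extracted as a (section name, reference id) pair. The data layout and the feature names are illustrative, not the authors' implementation.

```python
# Sections that are not treated as "Core", following the list above.
NON_CORE = {"related", "introduction", "acknowledgements",
            "conclusion", "future work"}

def count_features(occurrences, ref_id):
    """occurrences: (section_name, reference_id) pairs, one per in-text citation."""
    sections = [sec.lower() for sec, ref in occurrences if ref == ref_id]
    return {
        "total": len(sections),
        "distinct_sections": len(set(sections)),
        "in_related": sections.count("related"),
        "in_introduction": sections.count("introduction"),
        "in_core": sum(1 for sec in sections if sec not in NON_CORE),
    }

# Hypothetical citing paper with two references.
occurrences = [
    ("Introduction", "R1"), ("Related", "R1"),
    ("Method", "R1"), ("Method", "R1"),
    ("Experiments", "R2"), ("Related", "R2"),
]
print(count_features(occurrences, "R1"))
```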
Content-Similarity Based Features
Citing article Referenced articles
Title-Title
Title-Abstract
Title-Conclusion
Title-Introduction
Title-Core
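The slides do not state which similarity measure was used. One common choice, shown here purely as an assumption, is cosine similarity between bag-of-words vectors of the two text fields being compared (for example a title and an abstract).

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    a, b = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical title and abstract fragment.
title = "Measuring academic influence"
abstract = "We propose a new way of measuring the influence of a reference."
print(round(cosine_similarity(title, abstract), 3))
```

A fuller version would typically remove stop words or apply TF-IDF weighting; both are left out to keep the sketch short.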
Citing Context
—  When an article is cited, the linguistic context in which the
article is cited is considered as saying something about the
cited article.
e.g.
“Like Moravcsik and Murugesan (1975), we are concerned
about the side effects of counting insignificant references”
Context-Similarity Based Features
Citing Article
Title Abstract Introduction Conclusion
Other Context Based Features
—  Authors explicitly mentioned in citation context?
—  Citation alone [4] or with others [3,4,5]
—  If “with others”, is it first? (e.g. “[3]” is first in “[3,4,5]”)
Using pre-defined word lists, is the lexical content of a citation
—  “relevant” [likewise, influential, inspiring, useful, …]
—  “new” [recently, latest, current, improved, …]
—  “extreme” [greatly, intensely, acutely, almighty, awfully]
—  “comparative” [easy, easier, easiest, strong, stronger, …]
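The features above could be computed roughly as follows. The word lists contain only the sample words shown on the slide (the full lists are not given), and the function names are illustrative.

```python
import re

# Only the example words from the slide; the real lists are longer.
WORD_LISTS = {
    "relevant": {"likewise", "influential", "inspiring", "useful"},
    "new": {"recently", "latest", "current", "improved"},
    "extreme": {"greatly", "intensely", "acutely", "almighty", "awfully"},
    "comparative": {"easy", "easier", "easiest", "strong", "stronger"},
}

def lexical_flags(context):
    """For each word list, does any of its words occur in the citing context?"""
    words = set(re.findall(r"[a-z]+", context.lower()))
    return {name: bool(words & vocab) for name, vocab in WORD_LISTS.items()}

def grouping_features(group, ref_number):
    """group: reference numbers cited together, e.g. [3, 4, 5]."""
    return {
        "cited_alone": len(group) == 1,
        "first_in_group": len(group) > 1 and group[0] == ref_number,
    }

context = "We build on the recently improved and influential parser of [3,4,5]."
print(lexical_flags(context))
print(grouping_features([3, 4, 5], 3))
```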
Lexical Context Features
Using a lexicon of 114,271 words, obtained by extending the General
Inquirer Lexicon (11,788 words) with WordNet and the
Turney and Littman algorithm,
—  Count the number of words labeled
—  “Strong”
—  “Positive”
—  “Evaluative”
Also, sentiment analysis with a different lexicon gave us
—  Presence / absence of “Emotion” (Joy, Sadness, Anger, Fear, etc.)
—  “Positive” / “Negative”
Position Based Features
Where does the citation occur?
—  Citation appears at the beginning of a sentence? (Y/N)
—  Citation appears at the end of a sentence? (Y/N)
—  Where are the sentence(s) in which the citation(s) occur(s)?
e.g.
—  0 (First sentence) to 1 (Last sentence)
—  distance from the mean of occurrences of all citations
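A small sketch of the first three position features, assuming sentences are already split and indexed from 0. The helper names are illustrative, and the distance-from-mean feature is not covered here.

```python
def relative_position(sentence_index, total_sentences):
    """Scale a 0-based sentence index to [0, 1]: 0 = first, 1 = last sentence."""
    if total_sentences < 2:
        return 0.0
    return sentence_index / (total_sentences - 1)

def sentence_edge_flags(sentence, marker):
    """Does the citation marker open or close the sentence?
    Final punctuation is ignored when checking the end."""
    stripped = sentence.strip().rstrip(".!?").strip()
    return {
        "at_start": stripped.startswith(marker),
        "at_end": stripped.endswith(marker),
    }

print(relative_position(2, 5))  # 0.5
print(sentence_edge_flags("This was shown earlier [4].", "[4]"))
```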
[Figure: features grouped by class: Count Based, Similarity Based, Context Based, Position Based, Misc.]
Top 7 Features: 4 “counts”, 3 “similarity”
Counts in Paper
Counts in Sections
Counts in Core Section
Title-Abstract Similarity
Counts in Intro Section
Title-Core Similarity
Title-Intro Similarity
Conventional Measures on Citation Graph
[Figure: citation graph in which the edge from citing paper C to reference R has weight 1]
Influence Primed Measures
[Figure: citation graph in which the edge from citing paper C to reference R has weight X]
where $X = (\text{number of times } C \text{ cites } R)^2$
hip-index
—  Each occurrence of a citation of paper R by paper C = 1
—  The hip-index (influence-primed h-index) of an author is the
largest number h such that at least h of the author's papers
have an influence-primed citation count of at least h.
Examples
Example 1: hip-index = 5, h-index = 2
R1 – cited 3 times by C1 = 9, cited 2 times by C2 = 4; total 13
R2 – cited 2 times by C3 = 4, cited 2 times by C4 = 4; total 8
R3 – cited 3 times by C5 = 9; total 9
R4 – cited 3 times by C6 = 9; total 9
R5 – cited 3 times by C7 = 9; total 9
R6 – cited 2 times by C8 = 4; total 4
R7 – cited 1 time by C9 = 1; total 1
Example 2: hip-index = 3, h-index = 2
R1 – cited 2 times by C1 = 4, cited 1 time by C2 = 1; total 5
R2 – cited 2 times by C3 = 4, cited 1 time by C4 = 1; total 5
R3 – cited 2 times by C5 = 4; total 4
R4 – cited 1 time by C6 = 1; total 1
R5 – cited 1 time by C7 = 1; total 1
R6 – cited 1 time by C8 = 1; total 1
R7 – cited 1 time by C9 = 1; total 1
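Both examples can be checked with a short sketch. It assumes, as in the slides, that each citing paper contributes the square of its in-text citation count to the influence-primed score, while the plain h-index counts each citing paper once. The data layout and the function names are illustrative.

```python
def h_like_index(scores):
    """Largest h such that at least h scores are >= h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h

def h_and_hip(author_papers):
    """author_papers maps each paper to the list of in-text citation
    counts it receives, one entry per citing paper."""
    plain = [len(counts) for counts in author_papers.values()]
    primed = [sum(n * n for n in counts) for counts in author_papers.values()]
    return h_like_index(plain), h_like_index(primed)

example_1 = {"R1": [3, 2], "R2": [2, 2], "R3": [3], "R4": [3],
             "R5": [3], "R6": [2], "R7": [1]}
example_2 = {"R1": [2, 1], "R2": [2, 1], "R3": [2], "R4": [1],
             "R5": [1], "R6": [1], "R7": [1]}

print(h_and_hip(example_1))  # (2, 5)
print(h_and_hip(example_2))  # (2, 3)
```

The two authors have the same h-index, but the first author's papers are cited more often within each citing paper, which the hip-index picks up.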
Using hip-index to Predict ACL Fellows
—  Used the citation network constructed from
—  ~ 20,000 papers in the Association for Computational Linguistics
Anthology
—  Calculated the h-index of ACL Fellows
—  Calculated the hip-index of ACL Fellows
—  Compared the precision of h-index and hip-index
—  the number of ACL Fellows in the top N divided by N
[Figure: precision of h-index vs. hip-index in the top N; surviving data labels: 1/2, 2/3, 1/4, 2/6, 3/10, 3/9, 4/11, 4/10, 5/11, 5/12]
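The precision measure used in this comparison can be sketched as below. The author names, scores and set of fellows are invented, and ties in the ranking are broken arbitrarily.

```python
def precision_at_n(scores, fellows, n):
    """Rank authors by score (highest first) and return the fraction
    of the top n who are fellows."""
    if n <= 0:
        raise ValueError("n must be positive")
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:n]
    return sum(1 for author in top if author in fellows) / n

# Hypothetical index values per author and a hypothetical set of fellows.
hip_scores = {"author_a": 12, "author_b": 9, "author_c": 7, "author_d": 3}
fellows = {"author_a", "author_c"}

print(precision_at_n(hip_scores, fellows, 2))  # 0.5
print(precision_at_n(hip_scores, fellows, 3))
```

Running the same function on h-index scores and on hip-index scores, for the same N, gives the two precision values being compared.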
Conclusions
—  We can throw away the h-index, Impact Factor, etc. completely,
OR we can try to improve them by counting citations in a way that
reflects their influence
—  A measure of the academic influence of a citation is possible
—  It is easy to compute to a first approximation: merely count
how often the citing paper cites the reference
—  Apply the influence-primed weights on citation graphs to
compute
—  Influence-primed Impact Factor, g-index etc.
Thanks!

 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 

Measuring academic influence: Not all citations are equal

  • 1. Xiaodan Zhu and Peter Turney, National Research Council Canada; Daniel Lemire, TELUQ, Université du Québec, Montréal; Andre Vellino, School of Information Studies, University of Ottawa, Ottawa. Measuring Academic Influence: Not All Citations Are Equal
  • 2. Overview —  Some background in Citation Analysis —  What we tried to do and why —  How we did it —  What the results were —  What the implications are
  • 3. What is Citation Analysis? Citation analysis refers to the collection of methods for measuring the importance of scholars, journals and institutions by counting citations in a graph of references in the published literature.
  • 4. Why Do Citation Analysis? —  Reason # 1: Because it generates measurable quantities! “Since we can’t really measure what interests us, we begin to be interested in what we can measure” Joel Westheimer, Professor of Education, University of Ottawa
  • 5. Uses for Citation Measures —  For Readers —  To evaluate the quality of articles / journals —  For Universities —  To evaluate the productivity of academics —  To help in tenure and promotion decisions —  For Journals —  To attract authors to publish —  For Libraries —  To make collections / acquisition decisions —  To make automated recommendations to users
  • 6. How Are Citations Counted? —  Add 1 for every new occurrence of a cited article —  Sum the results —  Average per article and/or count the total number of citations Problems —  Self citations! —  No measure of quality of citing source —  May be skewed by a small number of highly cited items —  Easy to “game” by tricking Google Scholar —  viz. Ike Antkare h-index = 94 vs. Einstein h-index = 84
  • 7. h-index —  Jorge Hirsch (PNAS, 2005) defined the h-index: —  Attempts to measure both the productivity and impact of the author’s published work —  An author has index h if h of their N papers have at least h citations each, and the other (N − h) papers have at most h citations each.
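The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the authors; the function name `h_index` and the sample citation counts are assumptions made for the example.

```python
def h_index(citation_counts):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Because the index can never exceed the number of papers, `h_index([100] * 10)` and `h_index([10] * 10)` both return 10.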
  • 8. Some Criticisms of the h-index —  The h-index does not account for the number of authors or the order of the authors of a paper. —  Cannot use the h-index to compare authors in different fields —  Young researchers with as yet short careers are at a built-in disadvantage compared with older researchers —  Constrained by the total number of publications —  10 papers with 100 citations each yield the same h-index as 10 papers with 10 citations each “[h-index] captures a small amount of information about the distribution of a scientist's citations [and] loses crucial information that is essential for the assessment of research.”  Adler, R., Ewing, J., Taylor, P. Citation statistics. A report from the International Mathematical Union. http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf
  • 9. Journal Impact Factor (IF) —  Invented by Eugene Garfield in 1955 to identify journals for Science Citation Index —  Definition: $\mathrm{JIF} = \dfrac{\text{Total Citations (2 preceding years)}}{\text{Total Articles (2 preceding years)}}$, i.e. the impact factor of a journal is the average number of citations to those papers that were published during the two preceding years —  e.g. the number of times articles published in 2001 and 2002 were cited by indexed journals during 2003 / the total number of items published in 2001 and 2002
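As a worked illustration of the definition above, the following sketch computes an impact factor for 2003. The function name and all of the numbers are hypothetical and are not taken from any real journal.

```python
def impact_factor(citations_to_prev_two_years, items_in_prev_two_years):
    """Average number of citations received this year by items from the two preceding years."""
    if items_in_prev_two_years <= 0:
        raise ValueError("the journal must have published at least one item")
    return citations_to_prev_two_years / items_in_prev_two_years


# Hypothetical journal: 300 items published in 2001 and 2002,
# cited 450 times by indexed journals during 2003.
print(impact_factor(450, 300))  # 1.5
```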
  • 10. Some Criticisms of Impact Factor —  Letters or editorials in some journals (e.g. Nature) are often cited (and counted) in “Total Citations” (numerator) but not in “Total Articles” —  2-year window not applicable in many fields (e.g. in Math 90% of citations fall outside the 2-year window) —  IF varies considerably across disciplines (Math has an average of 0.9 citations per article, Life Sciences have an average of 6.2) “Using the impact factor alone to judge a journal is like using weight alone to judge a person's health.”  Adler, R., Ewing, J., Taylor, P. Citation statistics. A report from the International Mathematical Union. http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf
  • 11. What We Did and Why
  • 12. —  As early as 1965 Garfield identified 15 different reasons for citing —  giving credit for related work —  correcting a work —  criticizing previous work —  Many attempts since to categorize citations One Big Assumption: All citations should count equally!
  • 13. Citation Typing Ontology (CiTO) Here are the first 21 of the 91 citation types in CiTO http://imageweb.zoo.ox.ac.uk/pub/2008/plospaper/latest/#refs Example of semantically annotated article using CiTO:
  • 14. Our Objective —  Solve a binary classification problem: Given a Paper-Reference (P-R) pair, does P-R belong to the class “R is highly influential for P” or not? Our Method —  Apply Machine Learning methods to train a computer to recognize “Highly Influential Reference” from examples
  • 15. Step 1 – Data Collection We believe that most papers are based on 1, 2, 3 or 4 essential references. By an essential reference, we mean a reference that was highly influential or inspirational for the core ideas in your paper; that is, a reference that inspired or strongly influenced your new algorithm, your experimental design, or your choice of a research problem. Other references merely support the work.
  • 16. We asked for —  Title of your paper (research papers only; no surveys) —  The essential references on which your paper builds We got —  100 papers —  322 “influential” references —  i.e. 3.2 “influential references” per article —  Each paper —  Contained ~ 31 references in the References section —  Cited ~ 54 references in the body of the paper —  i.e. each reference was cited an average of 1.7 times per paper
  • 17. The Problem —  The 100 papers yield 3143 paper-reference pairs —  The authors have selected ~320 paper-reference pairs —  Algorithmically: to accurately select those ~320 from the 3143
  • 18. Paper – Reference Analysis —  OpenNLP used to detect sentence boundaries and tokenize. —  ParsCit to parse the papers. —  ParsCit is an open-source package for parsing references and document structure in scientific papers. —  Regular expressions to capture citation occurrences in paper bodies that were not detected by ParsCit.
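The regular-expression step above might look roughly like the sketch below. The slides do not give the actual patterns, so the pattern here is an assumption that only handles numeric bracketed citations such as [4] or [3,4,5]; ranges like [3-5] and author-year citations are not covered.

```python
import re
from collections import Counter

# Assumed pattern: one or more comma-separated reference numbers in square brackets.
BRACKETED_CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def count_citation_occurrences(body_text):
    """Count how many times each numbered reference is cited in the paper body."""
    counts = Counter()
    for match in BRACKETED_CITATION.finditer(body_text):
        for ref_number in match.group(1).split(","):
            counts[int(ref_number)] += 1
    return counts


# Hypothetical paper body.
text = "Prior work [3,4,5] studied this. As in [4], we use counts. See also [4, 7]."
print(count_citation_occurrences(text))
```

For the sample text, reference 4 is counted three times and references 3, 5 and 7 once each.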
  • 20. We Looked at 5 Classes of Features 1.  Count-based features 2.  Similarity-based features 3.  Context-based features 4.  Position-based features 5.  Miscellaneous features
  • 21. Count Based Features —  Total number of times a paper is referenced in the citing paper —  The number of different sections in which a given reference appears —  Number of times a paper is referenced in the —  “Related” section —  “Introduction” section —  “Core” sections (all sections excluding “Related”, “Introduction”, “Acknowledgements”, “Conclusion” and “Future Work”)
  • 22. Content-Similarity Based Features —  Citing article vs. referenced articles: Title-Title, Title-Abstract, Title-Conclusion, Title-Introduction, Title-Core
  • 23. Citing Context —  When an article is cited, the linguistic context in which the article is cited is considered as saying something about the cited article. e.g. “Like Moravcsik and Murugesan (1975), we are concerned about the side effects of counting insignificant references”
  • 24. Context-Similarity Based Features —  Citing Article: Title, Abstract, Introduction, Conclusion
  • 25. Other Context Based Features —  Authors explicitly mentioned in citation context? —  Citation alone [4] or with others [3,4,5]? —  If “with others”, is it first? (e.g. “[3]” is first in “[3,4,5]”) Using pre-defined word-lists, is the lexical content of a citation —  “relevant” [likewise, influential, inspiring, useful, …] —  “new” [recently, latest, current, improved, …] —  “extreme” [greatly, intensely, acutely, almighty, awfully] —  “comparative” [easy, easier, easiest, strong, stronger, …]
  • 26. Lexical Context Features Using a lexicon of 114,271 words obtained from the General Inquirer Lexicon (11,788 words) extended with WordNet + the Turney and Littman algorithm, —  Count the number of words labeled —  “Strong” —  “Positive” —  “Evaluative” Also, sentiment analysis with a different lexicon gave us —  Presence / absence of “Emotion” (Joy, Sadness, Anger, Fear, etc.) —  “Positive” / “Negative”
  • 27. Position Based Features Where does the citation occur? —  Citation appears at the beginning of a sentence? (Y/N) —  Citation appears at the end of a sentence? (Y/N) —  Where are the sentence(s) in which the citation(s) occur(s)? e.g. —  0 (first sentence) to 1 (last sentence) —  distance from the mean of occurrences of all citations
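The 0-to-1 sentence position above can be sketched as follows. The slides do not specify the exact normalization, so dividing the sentence index by the index of the last sentence is an assumption, as are the function name and the sample indices.

```python
def relative_positions(citing_sentence_indices, total_sentences):
    """Map 0-based sentence indices to [0, 1]: 0 is the first sentence, 1 the last."""
    if total_sentences < 2:
        # A one-sentence document has no meaningful spread; treat every position as 0.
        return [0.0 for _ in citing_sentence_indices]
    last_index = total_sentences - 1
    return [index / last_index for index in citing_sentence_indices]


# Hypothetical paper with 5 sentences; one reference is cited in sentences 0, 2 and 4.
positions = relative_positions([0, 2, 4], total_sentences=5)
print(positions)                        # [0.0, 0.5, 1.0]
print(sum(positions) / len(positions))  # 0.5
```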
  • 28. Count Based Features; Similarity Based Features; Context Based Features; Position Based Features; Misc. Features
  • 29. Top 7 Features: 4 “counts”, 3 “similarity” —  Counts in Paper —  Counts in Sections —  Counts in Core Section —  Title-Abstract Similarity —  Counts in Intro Section —  Title-Core Similarity —  Title-Intro Similarity
  • 30. Conventional Measures on Citation Graph —  Each edge from a citing paper C to a referenced paper R has weight 1
  • 31. Influence Primed Measures —  Each edge from a citing paper C to a referenced paper R has weight X, where $X = (\text{number of times } C \text{ cites } R)^2$
  • 32. hip-index —  Conventionally, each occurrence of a citation of paper R by paper C = 1 —  The hip-index (h-influence-primed index) for an author is the largest number h such that at least h of the author's papers have an influence-primed citation count of at least h.
  • 33. Examples
      Example 1: hip-index = 5, h-index = 2
      —  R1 – cited 3 times by C1 = 9, cited 2 times by C2 = 4; total 13
      —  R2 – cited 2 times by C3 = 4, cited 2 times by C4 = 4; total 8
      —  R3 – cited 3 times by C5 = 9
      —  R4 – cited 3 times by C6 = 9
      —  R5 – cited 3 times by C7 = 9
      —  R6 – cited 2 times by C8 = 4
      —  R7 – cited 1 time by C9 = 1
      Example 2: hip-index = 3, h-index = 2
      —  R1 – cited 2 times by C1 = 4, cited 1 time by C2 = 1; total 5
      —  R2 – cited 2 times by C3 = 4, cited 1 time by C4 = 1; total 5
      —  R3 – cited 2 times by C5 = 4
      —  R4 – cited 1 time by C6 = 1
      —  R5 – cited 1 time by C7 = 1
      —  R6 – cited 1 time by C8 = 1
      —  R7 – cited 1 time by C9 = 1
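The two examples on this slide can be checked with a short sketch. It assumes the influence-primed weight is the squared number of times a citing paper mentions a reference, as on the preceding slides; the function names and the tuple layout are illustrative choices, not the authors' code.

```python
from collections import defaultdict


def influence_primed_counts(mentions):
    """Sum (times cited)**2 over citing papers, per referenced paper.

    `mentions` is an iterable of (citing_paper, referenced_paper, times_cited).
    """
    totals = defaultdict(int)
    for _citing, referenced, times in mentions:
        totals[referenced] += times ** 2
    return dict(totals)


def largest_h(counts):
    """Largest h such that at least h values in `counts` are >= h."""
    ranked = sorted(counts, reverse=True)
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


def hip_index(mentions):
    return largest_h(influence_primed_counts(mentions).values())


def h_index(mentions):
    # Conventional count: each citing paper contributes 1, however often it mentions R.
    citing_papers = defaultdict(set)
    for citing, referenced, _times in mentions:
        citing_papers[referenced].add(citing)
    return largest_h(len(papers) for papers in citing_papers.values())


example_1 = [("C1", "R1", 3), ("C2", "R1", 2), ("C3", "R2", 2), ("C4", "R2", 2),
             ("C5", "R3", 3), ("C6", "R4", 3), ("C7", "R5", 3), ("C8", "R6", 2),
             ("C9", "R7", 1)]
example_2 = [("C1", "R1", 2), ("C2", "R1", 1), ("C3", "R2", 2), ("C4", "R2", 1),
             ("C5", "R3", 2), ("C6", "R4", 1), ("C7", "R5", 1), ("C8", "R6", 1),
             ("C9", "R7", 1)]

print(hip_index(example_1), h_index(example_1))  # 5 2
print(hip_index(example_2), h_index(example_2))  # 3 2
```

Both authors have the same conventional h-index of 2, but the first author's papers are mentioned more often inside the citing papers, which raises the hip-index from 3 to 5.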
  • 34. Using hip-index to Predict ACL Fellows —  Used the citation network constructed from —  ~ 20,000 papers in the Association for Computational Linguistics Anthology —  Calculated the h-index of ACL Fellows —  Calculated the hip-index of ACL Fellows —  Compared the precision of h-index and hip-index —  the number of ACL Fellows in the top N divided by N
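The precision measure described above (Fellows in the top N divided by N) can be sketched as follows. The author names, scores and Fellow set are invented for illustration and are not data from the study.

```python
def precision_at_n(ranked_authors, fellows, n):
    """Fraction of the top-n ranked authors who are Fellows."""
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = ranked_authors[:n]
    return sum(1 for author in top_n if author in fellows) / n


# Hypothetical index values (h-index or hip-index) per author.
scores = {"author_a": 12, "author_b": 9, "author_c": 7, "author_d": 4}
ranked = sorted(scores, key=scores.get, reverse=True)
fellows = {"author_a", "author_c"}

print(precision_at_n(ranked, fellows, 2))  # 0.5
print(precision_at_n(ranked, fellows, 4))  # 0.5
```

Running the same function on rankings produced by the h-index and by the hip-index gives the two precision values being compared.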
  • 36. Conclusions —  We can throw away h-index and Impact Factor etc. completely OR we can try to improve them by counting citations more relevantly —  A measure of academic influence for a citation is possible and —  It is easy to compute to a first approximation: merely count how often each reference is cited within the citing paper —  Apply the influence-primed weights on citation graphs to compute —  Influence-primed Impact Factor, g-index etc.