SCITECH STRATEGIES 
Better Maps ● Better Solutions 
Physics Chemistry Engineering Biology Disease Medicine Computer Earth Brain Health Social Humanities 
Atypical combinations are confounded by 
disciplinary effects 
STI 2014 
Leiden, The Netherlands 
Sept. 3-5, 2014 
Kevin W. Boyack & Richard Klavans 
SciTech Strategies, Inc. 
www.mapofscience.com
BACKGROUND 
 We have long been interested 
in indicators of innovative 
research 
 Uzzi et al. (UMSJ) recently 
published an article 
correlating high impact papers 
(innovation) with “atypical 
combinations” (novelty) of 
reference journals 
 Intriguing results; we decided 
to investigate further – to 
replicate the study and then 
further explore this idea of 
novelty 
UZZI STUDY 
 Hypothesis: “The highest-impact science is primarily grounded in 
exceptionally conventional combinations of prior work yet 
simultaneously features an intrusion of unusual combinations” 
 Data: Used 17.9M articles (1950-2000) from WOS, containing 302M 
references to 15,613 cited journals 
 Method: 
» Journals are used as proxy for “areas of knowledge” 
» Determine which co-cited journal combinations are “conventional” and which are 
“unusual” or “novel” 
» Develop indicators of “convention” and “novelty” from co-citation statistics 
» Calculate “convention” and “novelty” for each paper using indicators 
» Test indicators to see how they correlate with highly cited papers 
 Finding: Papers with high convention AND high novelty are twice as 
likely to be highly cited as the average paper 
UMSJ METHOD (1) 
 To determine which co-cited journal combinations are “conventional” 
and which are “novel”, UMSJ calculated Z-scores for each co-cited 
journal pair, where Z is defined: 
Z = (Nact – Nexp) / Nvar 
 Nact is the actual number of journal co-citation counts 
 Nexp is an expected number of journal co-citation counts 
 Nvar is the variance of Nexp 
 Nexp and Nvar were estimated from 10 randomized citation networks in 
which all citation links were switched using a Monte Carlo technique, 
keeping citing/cited distributions constant at the paper level 
 A negative Z-score indicates that a journal pair is co-cited less often 
than expected; such a pair is an “atypical combination” of journals 
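The Z-score definition above can be sketched in a few lines. This is a minimal illustration, not the UMSJ code: the function name `pair_z_score` and the counts are hypothetical, and the randomized networks are assumed to have been generated already. The sketch follows the slide's wording and divides by the variance; a conventional z-score would divide by the standard deviation instead.

```python
from statistics import mean, pvariance

def pair_z_score(n_act, randomized_counts):
    """Z for one journal pair, following the slide: (Nact - Nexp) / Nvar."""
    n_exp = mean(randomized_counts)       # expected co-citation count
    n_var = pvariance(randomized_counts)  # spread term; the slide calls it the variance
    if n_var == 0:
        raise ValueError("randomized counts have zero spread; Z is undefined")
    return (n_act - n_exp) / n_var

# Hypothetical counts for one journal pair across 10 randomized networks.
randomized = [12, 15, 11, 14, 13, 12, 16, 13, 14, 10]

z_rare = pair_z_score(5, randomized)     # co-cited less often than expected
z_common = pair_z_score(40, randomized)  # co-cited more often than expected
print(z_rare, z_common)
```

A pair with a negative result, such as `z_rare` here, would count as an atypical combination under the definition above.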
UMSJ METHOD (2) 
 Using the computed Z-scores 
for each co-cited journal pair, 
the set of Z-scores for each 
paper can then be looked up 
 Two summary statistics were 
calculated for each paper 
from its Z-score distribution: 
» Median Z-score – to characterize 
central tendency or “convention” 
» 10th percentile (left tail) Z-score – 
to characterize “novelty” 
 Distributions of these 
summary statistics were 
analyzed 
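As a rough sketch of the two paper-level summary statistics, assuming a lookup table of pair-level Z-scores already exists. The names `paper_summary`, `percentile`, and `pair_z` are hypothetical, the Z values are made up, and repeated citations to the same journal are ignored for simplicity.

```python
from itertools import combinations
from statistics import median

def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def paper_summary(ref_journals, pair_z):
    """Return (median Z, 10th-percentile Z) over a paper's co-cited journal pairs."""
    journals = sorted(set(ref_journals))  # simplification: unique journals only
    zs = [pair_z[pair] for pair in combinations(journals, 2) if pair in pair_z]
    return median(zs), percentile(zs, 10)

# Made-up Z-scores keyed by alphabetically sorted journal pairs.
pair_z = {("A", "B"): 50.0, ("A", "C"): 20.0, ("A", "D"): -5.0,
          ("B", "C"): 30.0, ("B", "D"): -2.0, ("C", "D"): 10.0}

med, tail = paper_summary(["A", "B", "C", "D"], pair_z)
print(med, tail)  # median characterizes convention, left tail characterizes novelty
```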
UMSJ METHOD (3) 
 Distributions of these paper-level 
summary statistics were 
analyzed 
 Indicators based on these 
summary statistics were 
created 
» Novelty 
 HIGH – 10th Pctl Z-score < 0 
 LOW – 10th Pctl Z-score > 0 
» Conventionality 
 HIGH – median Z-score > Avg 
 LOW – median Z-score < Avg 
 Each paper classified in terms 
of convention and novelty 
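The classification rule above could be written as follows. This is a sketch only: the function name `classify` is hypothetical, `avg_median_z` stands for the "Avg" threshold on the slide, and ties (a value exactly at a threshold) are assigned to LOW here as an assumption, since the slide does not say how they are handled.

```python
def classify(median_z, tail_z, avg_median_z):
    """Place a paper in one of the four novelty/convention bins."""
    novelty = "N+" if tail_z < 0 else "N-"                    # HIGH novelty: left-tail Z below zero
    convention = "C+" if median_z > avg_median_z else "C-"    # HIGH convention: median Z above Avg
    return novelty + convention

# Made-up values for two papers and a made-up threshold.
print(classify(median_z=15.0, tail_z=-3.5, avg_median_z=10.0))  # high novelty, high convention
print(classify(median_z=5.0, tail_z=2.0, avg_median_z=10.0))    # low novelty, low convention
```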
[Figure: 2x2 grid, Low/High Convention by High/Low Novelty]
UMSJ RESULTS 
 “Hit” papers defined as the 
top-5% highly cited papers 
 Using indicators: 
» Probability of a HIGH NOVELTY, 
HIGH CONVENTION (N+C+) 
paper being a hit paper is 0.0911 
» Probability of a LOW NOVELTY, 
LOW CONVENTION (N-C-) 
paper being a hit paper is 0.0205 
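Per-bin hit probabilities of this kind are simply the fraction of papers in each bin that are hits. A minimal sketch, with a hypothetical function name and a made-up toy sample that is not data from either study:

```python
from collections import defaultdict

def hit_probability_by_bin(papers):
    """papers: iterable of (bin_label, is_hit) -> {bin_label: fraction of hits}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for label, is_hit in papers:
        totals[label] += 1
        hits[label] += int(is_hit)
    return {label: hits[label] / totals[label] for label in totals}

# Made-up toy sample: 4 papers in N+C+ (1 hit), 6 papers in N-C- (1 hit).
sample = [("N+C+", True), ("N+C+", False), ("N+C+", False), ("N+C+", False),
          ("N-C-", False), ("N-C-", False), ("N-C-", False), ("N-C-", False),
          ("N-C-", False), ("N-C-", True)]

probs = hit_probability_by_bin(sample)
print(probs)
```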
UMSJ ISSUES 
 “Analyses in the supplementary materials (fig. S6) show that these 
empirical regularities for the WOS taken as a whole are largely 
replicated on a field-by-field basis and across time” 
» Across time – YES 
» Across fields or disciplines – NOT REALLY! – UMSJ supplemental results show that 
the N+C+ bin has the highest probability (of the 4 bins) of containing a hit paper for 
only 64% of the 243 subject categories 
 The fact that the N+C+ bin is not ranked first in 36% of subject 
categories is troubling, suggesting potentially large field effects, or even 
individual journal effects 
 The top-5% highly cited papers were not sampled by field 
 Journals may not be the right proxy for “areas of knowledge” 
REPLICATION 
 We used a different, but parallel, methodology to replicate the UMSJ 
distributions and results 
 Scopus data (2001-2010) – 12M articles, 226M references 
 Included conference papers along with articles 
 K50 statistics for co-cited journal pairs rather than Z-scores and Monte 
Carlo simulations 
» K50 has the same conceptual formulation as the Z-score: 
(Nact – Nexp) / Normalization 
» Expected values and normalization are based on row and column sums 
 UMSJ procedures for calculating distributions, etc. were all followed 
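The slide gives only the conceptual form of K50, so the following is a sketch of a K50-like score under stated assumptions rather than the exact published definition. It assumes the expected count is the product of the two journals' row sums divided by the grand total, and that the normalization is the square root of that product. The function name `k50_like` and the counts are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def k50_like(pair_counts):
    """pair_counts: {(journal_a, journal_b): co-citation count} -> {(a, b): score}."""
    row_sum = defaultdict(int)
    for (a, b), n in pair_counts.items():
        row_sum[a] += n  # each pair contributes to both journals' row sums
        row_sum[b] += n
    total = sum(row_sum.values())
    scores = {}
    for (a, b), n_act in pair_counts.items():
        n_exp = row_sum[a] * row_sum[b] / total   # assumed expected value
        norm = sqrt(row_sum[a] * row_sum[b])      # assumed normalization
        scores[(a, b)] = (n_act - n_exp) / norm
    return scores

# Made-up co-citation counts for four journals.
counts = {("A", "B"): 8, ("A", "C"): 1, ("B", "C"): 1,
          ("C", "D"): 8, ("A", "D"): 1, ("B", "D"): 1}

scores = k50_like(counts)
print(scores[("A", "B")], scores[("A", "C")])  # positive: typical pair; negative: atypical pair
```

Unlike the Monte Carlo approach, a score of this form needs only one pass over the co-citation counts.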
REPLICATION 
 For the left tail, we used the 5th 
percentile rather than the 10th 
percentile to more closely 
match UMSJ distributions 
 Indicator distributions for the 
median and left tail percentile 
values are very similar to the 
UMSJ distributions 
» Differences in the tail percentile 
curves have no effect on 
indicators since the fractions of 
articles at the zero point of all 
curves are the same 
[Figure: 2x2 grid, Low/High Convention by High/Low Novelty]
REPLICATION 
 Probabilities of hit papers 2001-2005 (top-5% highly cited) as of 2011 
                                       UMSJ (1990-2000)     This study (2001-2005)
Bin                                    % sample   Prob      % sample   Prob
High Novelty, High Convention (N+C+)   6.7%       0.0911    9.5%       0.0959
High Novelty, Low Convention (N+C-)    26%        0.0533    30.6%      0.0659
Low Novelty, High Convention (N-C+)    44%        0.0582    40.5%      0.0433
Low Novelty, Low Convention (N-C-)     23%        0.0205    19.4%      0.0205
 Our results are similar to the UMSJ results 
» A higher probability for N+C+ (0.0959 vs. 0.0911), coupled with a higher fraction of 
papers within that bin (9.5% vs. 6.7%), suggests that our method may do slightly better 
at locating highly cited papers. 
» High novelty is accentuated overall using our method (N+C- is 0.0659 rather than 
0.0533) 
 Replication was successful, and reproduces the major features of the 
UMSJ study 
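Two quick arithmetic checks on the table above may help. Since hit papers are the top 5%, the share-weighted average of the bin probabilities should come out near 0.05, and dividing a bin's probability by 0.05 gives its lift over the average paper. The dictionary and function names are hypothetical; the numbers are copied from the table.

```python
# (share of sample, hit probability) per bin, copied from the table above.
umsj = {"N+C+": (0.067, 0.0911), "N+C-": (0.26, 0.0533),
        "N-C+": (0.44, 0.0582), "N-C-": (0.23, 0.0205)}
this_study = {"N+C+": (0.095, 0.0959), "N+C-": (0.306, 0.0659),
              "N-C+": (0.405, 0.0433), "N-C-": (0.194, 0.0205)}

def overall_hit_rate(bins):
    """Share-weighted average hit probability across the four bins."""
    return sum(share * prob for share, prob in bins.values())

def lift(bins, label, baseline=0.05):
    """How many times more likely than the average paper a bin is to hold a hit."""
    return bins[label][1] / baseline

for name, bins in (("UMSJ", umsj), ("This study", this_study)):
    print(name, round(overall_hit_rate(bins), 4), round(lift(bins, "N+C+"), 2))
```

Both weighted averages land close to 0.05, and the N+C+ lift is a little under 2 in both columns, which is consistent with the "twice as likely" summary.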
FIELD EFFECTS? 
 2x2 matrix probabilities for the 
top-5% sampled by field were 
compared to the 2x2 matrix 
probabilities using the top-5% 
overall 
 The bins are in the same 
order using top-5% by field, 
but the differences between 
bins are smaller 
» N+C+ (0.0834 by field vs. 0.0959 overall) 
» N-C- (0.0335 by field vs. 0.0205 overall) 
 This suggests that “atypical 
combinations” are influenced 
by field effects 
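The difference between the two sampling schemes can be illustrated with a toy example. This is a sketch with hypothetical function names and made-up citation counts for two made-up fields; ties in citation counts are broken arbitrarily.

```python
from collections import defaultdict

def hits_overall(papers, share=0.05):
    """papers: list of (paper_id, field, citations). Top `share` across all papers."""
    k = max(1, round(len(papers) * share))
    ranked = sorted(papers, key=lambda p: p[2], reverse=True)
    return {p[0] for p in ranked[:k]}

def hits_by_field(papers, share=0.05):
    """Top `share` within each field separately, then pooled."""
    by_field = defaultdict(list)
    for p in papers:
        by_field[p[1]].append(p)
    hits = set()
    for group in by_field.values():
        hits |= hits_overall(group, share)
    return hits

# Made-up data: a high-citation field and a low-citation field, 20 papers each.
papers = ([(f"b{i}", "biomed", 100 + i) for i in range(20)]
          + [(f"m{i}", "math", 1 + i) for i in range(20)])

print(sorted(hits_overall(papers)))   # both hits come from the high-citation field
print(sorted(hits_by_field(papers)))  # one hit from each field
```

With an overall threshold, hits concentrate in fields with high citation rates, so any bin that over-represents those fields inherits a higher hit probability.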
FIELD EFFECTS? 
 Top 20 largest journals (by 
numbers of co-citations) are 
plotted in terms of convention 
and novelty 
» These 20 journals account for 
15.9% of all co-citations 
 Reminder: journals are 
plotted here based on how 
they are co-cited, not on what 
is published in them! 
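One way the two journal-level percentages could be computed is sketched below. This is an assumption about the procedure, not a description of the authors' code: it weights each journal pair by its co-citation count and takes the overall median as an input rather than guessing how it was derived. All names and numbers are hypothetical.

```python
from collections import defaultdict

def journal_profile(pair_stats, overall_median):
    """pair_stats: {(journal_a, journal_b): (score, n_cocitations)}.
    Returns {journal: (% co-citations above overall median, % co-citations below zero)}."""
    total = defaultdict(int)
    above = defaultdict(int)
    below = defaultdict(int)
    for (a, b), (score, n) in pair_stats.items():
        for journal in (a, b):  # a pair counts toward both of its journals
            total[journal] += n
            if score > overall_median:
                above[journal] += n
            if score < 0:
                below[journal] += n
    return {j: (100 * above[j] / total[j], 100 * below[j] / total[j]) for j in total}

# Made-up pair scores and co-citation counts for four placeholder journals.
pair_stats = {("J1", "J2"): (-3.0, 10), ("J1", "J3"): (40.0, 30), ("J2", "J4"): (80.0, 60)}

profile = journal_profile(pair_stats, overall_median=30.0)
print(profile["J1"])
```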
[Figure axes: % co-citations above overall median; % co-citations below zero]
FIELD EFFECTS? 
 Three groups appear 
» PHYSICS (6 journals) – cited as 
conventional, but not novel 
» BIOMED (9 journals) – cited as 
both conventional and novel 
» MULTI (5 journals) – cited as 
novel and not conventional 
 Nature, Science, and PNAS 
account for 9.4% of ALL 
atypical co-citation pairs 
» Multidisciplinary journals are 
obviously not good proxies for 
“areas of knowledge” 
» They contribute the most to the 
notion of “atypical”, suggesting 
that journals are a poor basis for 
this study 
[Figure axes: % co-citations above overall median; % co-citations below zero]
SUMMARY 
 We have replicated the UMSJ study and primary finding that 
» Papers with high convention AND high novelty are twice as likely to be highly cited 
as the average paper 
 This is a real finding! There seems to be something to the notion of 
“atypical combinations” that is meaningful and could be predictive 
 However … 
 Field and journal effects are not insignificant, and given that these 
studies were based on journal co-citation, journals and fields may be 
driving “atypical combinations” 
 Journals are the wrong proxy for “areas of knowledge”; we need an 
alternative proxy for “areas of knowledge” 
 Other potential measurements of “atypical-ness” or “novelty” that are 
relatively independent of field or journal effects should be proposed and 
tested 
QUESTIONS 
Thank you for your attention!
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 

Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

  • 1. SCITECH STRATEGIES Better Maps ● Better Solutions Physics Chemistry Engineering Biology Disease Medicine Computer Earth Brain Health Social Humanities Atypical combinations are confounded by disciplinary effects STI 2014 Leiden, The Netherlands Sept. 3-5, 2014 Kevin W. Boyack & Richard Klavans SciTech Strategies, Inc. www.mapofscience.com
  • 2. BACKGROUND
      - We have long been interested in indicators of innovative research
      - Uzzi et al. (UMSJ) recently published an article correlating high-impact papers (innovation) with “atypical combinations” (novelty) of reference journals
      - The results are intriguing; we decided to replicate the study and then further explore this idea of novelty
  • 3. UZZI STUDY
      - Hypothesis: “The highest-impact science is primarily grounded in exceptionally conventional combinations of prior work yet simultaneously features an intrusion of unusual combinations”
      - Data: 17.9M articles (1950-2000) from WOS, containing 302M references to 15,613 cited journals
      - Method:
          » Journals are used as a proxy for “areas of knowledge”
          » Determine which co-cited journal combinations are “conventional” and which are “unusual” or “novel”
          » Develop indicators of “convention” and “novelty” from co-citation statistics
          » Calculate “convention” and “novelty” for each paper using the indicators
          » Test the indicators to see how they correlate with highly cited papers
      - Finding: Papers with high convention AND high novelty are twice as likely to be highly cited as the average paper
  • 4. UMSJ METHOD (1)
      - To determine which co-cited journal combinations are “conventional” and which are “novel”, UMSJ calculated Z-scores for each co-cited journal pair, where Z is defined as:
            Z = (N_act - N_exp) / N_var
          » N_act is the actual number of journal co-citation counts
          » N_exp is an expected number of journal co-citation counts
          » N_var is the variance of N_exp
      - N_exp and N_var were estimated by calculating ten randomized citation networks in which all citation links were switched using a Monte Carlo technique, keeping citing/cited distributions constant at the paper level
      - A negative Z-score indicates that a journal pair is co-cited less often than expected, and is thus an “atypical combination” of journals
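The Z-score on this slide can be illustrated with a minimal sketch. The function name `pair_z_score` and the sample numbers are hypothetical, and the sketch assumes the usual z-score normalizer (the sample standard deviation of the counts from the randomized networks) for the denominator the slide calls N_var. It does not reproduce the link-switching Monte Carlo step itself.

```python
from statistics import mean, stdev


def pair_z_score(n_act, simulated_counts):
    """Z-score of an observed co-citation count against randomized counts.

    n_act: observed co-citation count for one journal pair.
    simulated_counts: counts for the same pair in each randomized network.
    """
    n_exp = mean(simulated_counts)
    # Assumption: normalize by the sample standard deviation of the simulations.
    spread = stdev(simulated_counts)
    if spread == 0:
        return None  # undefined when every randomized network gives the same count
    return (n_act - n_exp) / spread


# Illustrative numbers only: a pair co-cited 3 times, but about 11 times
# on average across ten randomized networks.
randomized = [10, 12, 9, 11, 13, 10, 12, 8, 11, 14]
z = pair_z_score(3, randomized)
print(z)  # negative, so the pair would count as an atypical combination
```

A pair that is co-cited less often than in the randomized networks gets a negative score, which is the sense in which the slide calls it atypical.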
  • 5. UMSJ METHOD (2)
      - Using the computed Z-scores for each co-cited journal pair, the set of Z-scores can then be located for each paper
      - Two summary statistics were calculated for each paper from its Z-score distribution:
          » Median Z-score, to characterize central tendency or “convention”
          » 10th percentile (left tail) Z-score, to characterize “novelty”
      - Distributions of these summary statistics were analyzed
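The two per-paper summary statistics can be sketched as follows. The names `paper_summary` and `pair_z` are hypothetical. The sketch assumes each paper is reduced to its set of distinct cited journals and that the tail value uses a nearest-rank percentile; the slides do not specify either detail.

```python
import math
import statistics
from itertools import combinations


def paper_summary(cited_journals, pair_z, tail_pct=10):
    """Return (median Z, left-tail percentile Z) for one paper.

    cited_journals: journals appearing in the paper's reference list.
    pair_z: dict mapping an alphabetically sorted journal pair to its Z-score.
    """
    pairs = combinations(sorted(set(cited_journals)), 2)
    zs = sorted(pair_z[p] for p in pairs if p in pair_z)
    if not zs:
        return None  # no scored journal pairs for this paper
    median_z = statistics.median(zs)
    # Assumption: nearest-rank percentile for the left tail.
    rank = max(1, math.ceil(tail_pct / 100 * len(zs)))
    tail_z = zs[rank - 1]
    return median_z, tail_z


# Illustrative Z-scores for three journal pairs.
pair_z = {("A", "B"): 5.0, ("A", "C"): -2.0, ("B", "C"): 1.0}
summary = paper_summary(["A", "B", "C"], pair_z)
print(summary)
```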
  • 6. UMSJ METHOD (3)
      - Distributions of these paper-level summary statistics were analyzed
      - Indicators based on these summary statistics were created
          » Novelty
              HIGH: 10th percentile Z-score < 0
              LOW: 10th percentile Z-score > 0
          » Conventionality
              HIGH: median Z-score > Avg
              LOW: median Z-score < Avg
      - Each paper is classified in terms of convention and novelty
      [Figure: 2x2 grid with labels Low Convention / High Convention and High Novelty / Low Novelty]
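The 2x2 classification above can be written as a small function. This is a sketch under stated assumptions: `classify` and `convention_threshold` are hypothetical names, the threshold stands in for the “Avg” value on the slide and is passed in by the caller, and values exactly at a threshold (which the slide leaves unspecified) are assigned to the LOW side.

```python
def classify(median_z, tail_z, convention_threshold):
    """Assign a paper to one of the four novelty/convention bins."""
    # High novelty: the left-tail Z-score is below zero.
    novelty = "N+" if tail_z < 0 else "N-"
    # High convention: the median Z-score exceeds the supplied threshold.
    convention = "C+" if median_z > convention_threshold else "C-"
    return novelty + convention


print(classify(median_z=1.0, tail_z=-2.0, convention_threshold=0.5))
print(classify(median_z=0.2, tail_z=3.0, convention_threshold=0.5))
```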
  • 7. UMSJ RESULTS
      - “Hit” papers are defined as the top-5% highly cited papers
      - Using the indicators:
          » Probability of a HIGH NOVELTY, HIGH CONVENTION (N+C+) paper being a hit paper is 0.0911
          » Probability of a LOW NOVELTY, LOW CONVENTION (N-C-) paper being a hit paper is 0.0205
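The per-bin hit probability is the share of papers in a bin that fall in the top 5% by citations. A minimal sketch, assuming ties at the cutoff count as hits; the function name `hit_probabilities` and the data are made up for illustration and are not results from either study.

```python
from collections import Counter


def hit_probabilities(papers, top_fraction=0.05):
    """papers: list of (bin_label, citation_count) tuples.

    Returns {bin_label: fraction of that bin's papers that are hits}.
    """
    n_hits = max(1, round(len(papers) * top_fraction))
    # Citation count of the last paper inside the top fraction.
    cutoff = sorted((c for _, c in papers), reverse=True)[n_hits - 1]
    totals, hits = Counter(), Counter()
    for label, cites in papers:
        totals[label] += 1
        if cites >= cutoff:  # assumption: ties at the cutoff count as hits
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Synthetic example: 40 papers in two bins, so the top 5% is 2 papers.
papers = [("N+C+", c) for c in [100, 90] + [1] * 18]
papers += [("N-C-", c) for c in range(1, 21)]
probs = hit_probabilities(papers)
print(probs)
```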
  • 8. UMSJ ISSUES
      - “Analyses in the supplementary materials (fig. S6) show that these empirical regularities for the WOS taken as a whole are largely replicated on a field-by-field basis and across time”
          » Across time: YES
          » Across fields or disciplines: NOT REALLY! UMSJ supplemental results show that the N+C+ bin has the highest probability (of the 4 bins) of containing a hit paper for only 64% of the 243 subject categories
      - The fact that the N+C+ bin is not ranked first in 36% of subject categories is troubling, suggesting potentially large field effects, or even individual journal effects
      - The top-5% highly cited papers were not sampled by field
      - Journals may not be the right proxy for “areas of knowledge”
  • 9. REPLICATION
      - We used a different, but parallel, methodology to replicate the UMSJ distributions and results
      - Scopus data (2001-2010): 12M articles, 226M references
      - Included conference papers along with articles
      - K50 statistics for co-cited journal pairs rather than Z-scores and Monte Carlo simulations
          » K50 has the same conceptual formulation as the Z-score: (N_act - N_exp) / normalization
          » Expected values and normalization are based on row and column sums
      - UMSJ procedures for calculating distributions, etc. were all followed
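The slide gives only the general shape of K50, (N_act - N_exp) / normalization, with both terms derived from row and column sums. The sketch below illustrates that shape and is not the published K50 definition. It assumes an independence-style expectation (row sum times row sum over the grand total) and a square-root-of-row-sums normalizer; the function name `expected_and_score` and the counts are hypothetical.

```python
import math


def expected_and_score(counts, i, j):
    """counts: symmetric dict-of-dicts of journal co-citation counts.

    Returns (expected count, normalized score) for journals i and j.
    """
    row_i = sum(counts[i].values())
    row_j = sum(counts[j].values())
    total = sum(sum(row.values()) for row in counts.values())
    n_act = counts[i].get(j, 0)
    # Assumption: expected count under independence of the two journals.
    n_exp = row_i * row_j / total
    # Assumption: normalize by the geometric mean of the two row sums.
    norm = math.sqrt(row_i * row_j)
    return n_exp, (n_act - n_exp) / norm


# Illustrative symmetric co-citation counts for three journals.
counts = {
    "A": {"B": 9, "C": 1},
    "B": {"A": 9, "C": 3},
    "C": {"A": 1, "B": 3},
}
exp_ab, score_ab = expected_and_score(counts, "A", "B")
exp_ac, score_ac = expected_and_score(counts, "A", "C")
print(score_ab, score_ac)  # A-B above expectation, A-C below
```

Unlike the Monte Carlo approach, a closed-form expectation of this kind needs no randomized networks, which is presumably why it scales to the full co-citation matrix.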
  • 10. REPLICATION
      - For the left tail, we used the 5th percentile rather than the 10th percentile to more closely match the UMSJ distributions
      - Indicator distributions for the median and left-tail percentile values are very similar to the UMSJ distributions
          » Differences in the tail percentile curves have no effect on the indicators, since the fractions of articles at the zero point of all curves are the same
      [Figure: 2x2 grid with labels Low Convention / High Convention and High Novelty / Low Novelty]
  • 11. REPLICATION
      - Probabilities of hit papers 2001-2005 (top-5% highly cited) as of 2011

                                                   UMSJ (1990-2000)      This study (2001-2005)
        Bin                                        % sample   Prob       % sample   Prob
        High Novelty, High Convention (N+C+)       6.7%       0.0911     9.5%       0.0959
        High Novelty, Low Convention (N+C-)        26%        0.0533     30.6%      0.0659
        Low Novelty, High Convention (N-C+)        44%        0.0582     40.5%      0.0433
        Low Novelty, Low Convention (N-C-)         23%        0.0205     19.4%      0.0205

      - Our results are similar to the UMSJ results
          » A higher probability for N+C+ (0.0959 vs. 0.0911), coupled with a higher fraction within that bin (9.5% vs. 6.7%), suggests that our method does even a bit better at locating highly cited papers
          » High novelty is accentuated overall using our method (N+C- is 0.0659 rather than 0.0533)
      - Replication was successful, and reproduces the major features of the UMSJ study
  • 12. FIELD EFFECTS?
      - 2x2 matrix probabilities for the top-5% sampled by field were compared to the 2x2 matrix probabilities using the top-5% overall
      - The bins are in the same order using the top-5% by field, but the differences between bins are smaller
          » N+C+ (0.0834 vs. 0.0959)
          » N-C- (0.0335 vs. 0.0205)
      - This suggests that “atypical combinations” are influenced by field effects
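The difference between taking the top 5% overall and the top 5% within each field can be sketched as below. The function name `flag_hits`, the field names, and the citation counts are all synthetic, and the sketch assumes each paper belongs to exactly one field.

```python
from collections import defaultdict


def flag_hits(papers, top_fraction=0.05, by_field=False):
    """papers: list of dicts with 'id', 'field' and 'cites' keys.

    Returns the set of paper ids counted as hits, ranking either within
    each field (by_field=True) or across the whole sample.
    """
    groups = defaultdict(list)
    for paper in papers:
        groups[paper["field"] if by_field else "all"].append(paper)
    hits = set()
    for members in groups.values():
        n_hits = max(1, round(len(members) * top_fraction))
        ranked = sorted(members, key=lambda p: p["cites"], reverse=True)
        hits.update(p["id"] for p in ranked[:n_hits])
    return hits


# Synthetic sample: a highly cited field and a sparsely cited one.
papers = [{"id": f"bio{i}", "field": "biomed", "cites": 100 - i} for i in range(20)]
papers += [{"id": f"math{i}", "field": "math", "cites": 20 - i} for i in range(20)]

overall_hits = flag_hits(papers)
field_hits = flag_hits(papers, by_field=True)
print(sorted(overall_hits), sorted(field_hits))
```

With overall sampling both hits come from the highly cited field; sampling by field gives each field its own hit, which removes one route by which field citation habits can leak into the hit probabilities.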
  • 13. FIELD EFFECTS?
      - The 20 largest journals (by number of co-citations) are plotted in terms of convention and novelty
          » These 20 journals account for 15.9% of all co-citations
      - Reminder: journals are plotted here based on how they are co-cited, not on what is published in them!
      [Figure: axes are % co-citations above overall median and % co-citations below zero]
  • 14. FIELD EFFECTS?
      - Three groups appear
          » PHYSICS (6 journals): cited as conventional, but not novel
          » BIOMED (9 journals): cited as both conventional and novel
          » MULTI (5 journals): cited as novel and not conventional
      - Nature, Science, and PNAS account for 9.4% of ALL atypical co-citation pairs
          » Multidisciplinary journals are obviously not good proxies for “areas of knowledge”
          » They contribute the most to the notion of “atypical”, suggesting that journals are a poor basis for this study
      [Figure: axes are % co-citations above overall median and % co-citations below zero]
  • 15. SUMMARY
      - We have replicated the UMSJ study and its primary finding that
          » Papers with high convention AND high novelty are twice as likely to be highly cited as the average paper
      - This is a real finding! There seems to be something to the notion of “atypical combinations” that is meaningful and could be predictive
      - However …
      - Field and journal effects are not insignificant, and given that these studies were based on journal co-citation, journals and fields may be driving “atypical combinations”
      - Journals are the wrong proxy for “areas of knowledge”; we need an alternative proxy
      - Other potential measurements of “atypical-ness” or “novelty” that are relatively independent of field or journal effects should be proposed and tested
  • 16. QUESTIONS
      - Thank you for your attention!