Measuring Researcher Diversity
     and its Impact on Awards
Tanu Malik                  Computation Institute
Andrey Rzhetsky             Department of Human Genetics
Ian Foster              Computation Institute
              University of Chicago
           Argonne National Laboratory
History
• Leonardo da Vinci: Renaissance polymath, painter, sculptor, architect,
  musician, mathematician, engineer, inventor, anatomist,
  geologist, cartographer, botanist, and writer
• Bohr
• Darwin
• Einstein
Biological species
• Short Term: Competition
• Long Term: Changing Environments
[Figure: two panels, "Competition" and "Niche Differences"]
Adapted from: Levine, J. M. & HilleRisLambers, J. (2012) The Maintenance of Species Diversity. Nature Education Knowledge 3(10):59
Biological species: Specialist/Generalist
• Short Term: Competition
• Long Term: Changing Environments
[Figure: two panels, "Competition" and "Niche Differences"]
Adapted from: Levine, J. M. & HilleRisLambers, J. (2012) The Maintenance of Species Diversity. Nature Education Knowledge 3(10):59
Science Research
• Short Term: Competition on Topics
• Long Term: Changing Funding Situation
[Figure: two panels, "Competition" and "Niche Differences"]
Why is this important?
• Research articles whose coauthors are in different departments at
  the same university receive more citations than those authored in a
  single department (Katz et al., 1997).
• Multi-university collaborations that include a top-tier university
  were found to produce the highest-impact research articles (Jones
  et al., 2008).
• Scholarly work covering a range of fields, and patents generated by
  larger teams of co-authors, tend to have greater impact over time
  (Wuchty et al., 2007).
• In nanotechnology, authors who have a diverse set of
  collaborators tend to write articles that have higher impact (Rafols
  et al., 2010).
• Finally, diverse groups can, depending on the type of task,
  outperform individual experts or even groups of experts (Page,
  2007).
Individual Focus
• Some mathematicians are birds, others are frogs.
  Birds fly high in the air, frogs live in the mud
  below... (Freeman Dyson, AMS Einstein Lecture,
  2008)
• “Foxes”, individuals who know many little things,
  tend to make better predictions about future
  outcomes than “hedgehogs” who focus on one
  big thing (Tetlock, 2005)
• Individuals’ degree of focus is positively
  correlated with the quality of their contributions
  (Adamic, 2010)
Goals and Problems
• Goal: Quantify the ability of each class of
  researchers (specialists/generalists) to
  compete in the near term and to adapt to
  changing funding requirements in the long
  term.
  A. How to determine specialist and generalist
     researchers?
  B. How to quantify the ability to compete/adapt?
A. Researcher Diversity
• Based on each researcher's publication history,
  determine whether their interests are highly
  varied or focused
• Researcher profiles created from PubMed
Creating Researcher Profiles
• Author Disambiguation
  – Data mining methods
  – Microsoft Academic Search
     • Automated profiles of users
     • Web scraping
     • Person’s organization and domain
     of interest as disambiguating features
  – Harvard Profiles
     • Directly links to PubMed
     • Also takes an input of publications
      claimed by an author.
Researcher’s Interests
• Controlled Vocabulary
• Keywords
• Topic Modeling
Controlled Vocabulary
• Medical Subject Headings (MeSH)
  – poly-hierarchy of 25,186 medical concepts
Researcher Diversity
• Shannon’s Entropy: $H = -\sum_i p_i \log_2 p_i$
  p_i: proportion of an individual’s contributions in
  category i
  – Category = MeSH term
  – Frequency over years
[Figure: example distributions with entropy values 0.00, 0.41, 0.82, 1.00, 1.59, 2.00]
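As a minimal sketch of the diversity measure on this slide, the block below computes Shannon entropy, H = -sum_i p_i log2(p_i), from per-category counts. The base-2 logarithm is an assumption, chosen because it reproduces the example values on the slide (for instance, equal splits over two, three, and four categories give 1.00, 1.59, and 2.00). The function names and the input format (a flat list of MeSH terms drawn from a researcher's papers) are hypothetical, not taken from the talk.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy in bits of a list of non-negative category counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:  # empty categories contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def researcher_entropy(mesh_terms):
    """Entropy of a researcher's interests.

    mesh_terms: hypothetical input, one MeSH term per indexed descriptor
    across all of the researcher's publications.
    """
    return shannon_entropy(list(Counter(mesh_terms).values()))


if __name__ == "__main__":
    focused = ["Genomics"] * 12
    varied = ["Genomics", "Algorithms", "Neoplasms", "Ecology"] * 3
    print(round(researcher_entropy(focused), 2))
    print(round(researcher_entropy(varied), 2))
```

A fully focused profile scores 0, and the score grows as contributions spread more evenly over more terms. Applying the same function to each year's terms separately would give the "frequency over years" view.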
Shannon vs. Stirling
• Variety: how many different areas an
  individual contributes to
• Balance: how evenly their efforts are
  distributed among these areas
• Similarity: how related those areas are
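The three components above can be sketched in a few lines. This is an illustration under stated assumptions, not the measure used in the talk: balance is computed here as Pielou evenness (entropy divided by its maximum), and the combined index uses the Rao-Stirling form, the sum over pairs i ≠ j of p_i * p_j * d_ij, where d_ij is a distance taken as 1 minus similarity. The function names and the distance matrix are hypothetical.

```python
import math


def variety(proportions):
    """Number of areas with a non-zero share of contributions."""
    return sum(1 for p in proportions if p > 0)


def balance(proportions):
    """Pielou evenness: Shannon entropy divided by log2(variety).

    Returns 0.0 when there is at most one area (a convention chosen here,
    since evenness is undefined in that case).
    """
    n = variety(proportions)
    if n <= 1:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in proportions if p > 0)
    return entropy / math.log2(n)


def rao_stirling(proportions, distance):
    """Sum of p_i * p_j * d_ij over all ordered pairs with i != j.

    distance: hypothetical square matrix, d_ij = 1 - similarity(i, j).
    """
    total = 0.0
    for i, p_i in enumerate(proportions):
        for j, p_j in enumerate(proportions):
            if i != j:
                total += p_i * p_j * distance[i][j]
    return total


if __name__ == "__main__":
    shares = [0.5, 0.5]
    unrelated = [[0.0, 1.0], [1.0, 0.0]]  # two maximally distant areas
    related = [[0.0, 0.2], [0.2, 0.0]]    # two closely related areas
    print(variety(shares), balance(shares))
    print(rao_stirling(shares, unrelated), rao_stirling(shares, related))
```

Unlike entropy, which sees only variety and balance, the distance-weighted index gives a lower score to a researcher who splits effort between closely related areas.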
B. Quantifying the Ability to Compete




• Entropy is negatively correlated with measures of impact and productivity,
  viz. the h-index and the g-index.
• This result partly reconfirms Adamic’s finding of a positive correlation
  between specialization and productivity
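For reference, the two impact measures named above can be computed from a researcher's per-paper citation counts as in the sketch below. It uses the standard definitions: the h-index is the largest h such that h papers have at least h citations each, and the g-index is the largest g such that the top g papers together have at least g squared citations. The g-index is capped here at the number of papers, which is one common convention; the function names and sample data are illustrative only.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers have at least g*g citations in total.

    Capped at the number of papers (an assumption; some variants allow
    g to exceed the paper count).
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    running_total = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


if __name__ == "__main__":
    sample = [10, 8, 5, 4, 3]  # made-up citation counts
    print(h_index(sample), g_index(sample))
```

Pairing each researcher's entropy with these indices and computing a rank correlation would be one way to reproduce the kind of comparison reported on this slide.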
Geniuses, Birds, Beavers, Frogs




Geniuses: Dwell on many topics at all times (8-9)
Birds: Dwell on many topics over their research career, but a few topics at a given time
Beavers: Specialists whose focus is interdisciplinary
Frogs: Highly focused
A Bird
A Frog
Impact of Researcher Diversity on Awards
Future Work
• Larger datasets
• Researchers in the long tail are specialists;
  generalists are in the head of the distribution
Summary
• A framework to understand researcher
  diversity
• Quantification of the relationship between researcher
  diversity and productivity and awards
• Diversity correlates negatively with
  productivity and positively with awards
• Use more accurate author disambiguation
  methods

Editor's Notes

  • #13 Poly-hierarchy (i.e., a single concept can exist in more than one place, in different contexts). Within the hierarchy, each meaning, called a concept in MeSH terminology, is represented by its Unique ID, its MeSH heading (default name), and its Tree Numbers (contexts). The tree numbers are the distinct drill paths leading from the root of the hierarchy to the concept. Each tree number is a level and encompasses all levels below it. Every journal article is indexed with about 10-15 descriptors, allowing us to compare researchers with similar numbers of publications. We do, however, eliminate polysemous descriptors (those that occur in multiple tree paths) so as not to include unrelated research areas.
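
The filtering step described in this note can be sketched as follows. This is a minimal illustration, assuming a lookup table from each descriptor to its list of tree numbers is already available; the function name, the table format, and the sample tree numbers are hypothetical and have not been checked against a MeSH release.

```python
def drop_polysemous(descriptor_trees):
    """Keep only descriptors that occur in exactly one tree path.

    descriptor_trees: hypothetical mapping of MeSH heading -> list of
    tree numbers (one per position in the hierarchy).
    """
    return {
        heading: trees
        for heading, trees in descriptor_trees.items()
        if len(trees) == 1
    }


if __name__ == "__main__":
    # Illustrative data only; tree numbers are placeholders.
    sample = {
        "Heading A": ["C04"],
        "Heading B": ["A01.456", "A09.371"],  # appears in two contexts
        "Heading C": ["G05.360"],
    }
    kept = drop_polysemous(sample)
    print(sorted(kept))
```

Descriptors that survive the filter would then serve as the categories over which a researcher's entropy is computed.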