TASK SPECIALIZATION ACROSS
RESEARCH CAREERS
N. Robinson-Garcia, R. Costas, C.R.
Sugimoto, V. Larivière and G.F. Nane
2
UNVEILING THE ECOSYSTEM OF SCIENCE: A CONTEXTUAL PERSPECTIVE ON THE MANY ROLES OF SCIENTISTS
FRAMEWORK OF THE STUDY
https://elifesciences.org/articles/60586 https://zenodo.org/record/3891055
4
1. Can we identify diversity of profiles in science?
• How can we identify it?
• Can diversity in science be described in a systematic way?
2. How does research evaluation affect diversity?
• Are there observable differences in research trajectories by type of profile?
• Are there observable gender differences by type of profile?
• How do they relate to performativity (i.e., publications and citation impact)?
GOALS
MOTIVATION
5
How are research careers assessed in academia?
Scientific leadership VS. Team science
• The number of middle authors is rising (Mongeon et al., 2017)
• Middle authors tend to have shorter career trajectories (Milojević et al., 2018)
The underlying assumptions of these studies are that author order in
publications reflects leadership and that scientists specialise in specific tasks
STARTING POINT
MOTIVATION
6
Author order reflects author contribution to scientific studies
• There is a relation, but it is not always consistent (Sauermann & Haeussler, 2017)
• Contributions do not reflect importance or level of involvement (Sauermann & Haeussler, 2017)
Middle authors conduct technical tasks
• U-shaped relation between author order and conceptual contributions (Larivière et al., 2016)
• Increasing variety of technical contributions (Larivière et al., 2020)
Seniority is related to types of contributions
• Seniority is also reflected in author order and contributorship (Larivière et al., 2016)
These studies look at author-publication combinations, but do not look directly
into individual profiles
UNDERLYING ASSUMPTIONS
MOTIVATION
7
Can we predict the probability of contribution of an author?
• We include two types of variables: individual level and publication level
• We apply the predictive model to the complete publication history of a set of researchers
Can we profile researchers based on their predicted contributorships?
• Career trajectories are defined by first year of publication and divided into four career stages
• For each career stage we apply Robust Archetypal Analysis and assign researchers to archetypes
How do profiles relate to research careers?
• We look into career length, gender differences, productivity and citation impact
STUDY DESIGN
RESEARCH QUESTIONS
8
DATA AND METHODS
PROCESS FLOW
1. SEED DATASET
• Combination of bibliometric variables and contribution statements
2. PREDICTION MODEL
• Bayesian Networks to model data
• Cross-validation of predictions
3. TRAJECTORIES DATASET
• Gender identification
• Break down by career stages
• ≥ 5 publications
4. IDENTIFICATION OF PROFILES
• Robust Archetypal Analysis
• Assignment of researchers to archetypes
5. PERFORMANCE BY PROFILES
• Career length
• Gender
• Productivity and impact
• Author order
9
SEED DATASET
DATA AND METHODS
70,694 publications
PLOS journals
Medical and Life Sciences
Contribution statements from API (Larivière et al., 2016)
Matching
CWTS in-house Web of Science
Only publications with all authors identified
Match by disambiguated author (Caron & van Eck, 2014)
Composition of the dataset
Contribution statements*
Individual level – YE | PU
Publication level – PO | AU | DT | CO | IN
*Two were removed
10
A. Career stages: Junior < 5 y; Early-career ≥ 5 and < 15 y;
Mid-career ≥ 15 and < 30 y; Late-career ≥ 30 y
Declining technical contributorships
over time
WR and CE more stable but decline
in late-career
B. First authors carry the most weight on every contributorship
except CT, where middle authors, who contribute to
technical contributorships, weigh most
Last authors concentrate on WR and CE
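The career-stage cut-offs in A can be read as a simple classification rule. The following is a minimal Python sketch, assuming career age is counted in years since the first publication and that each stage includes its lower bound. The function name `career_stage` and the stage labels are illustrative assumptions, not the authors' code.

```python
def career_stage(first_pub_year: int, year: int) -> str:
    """Return the career stage in `year` for a researcher who first published in `first_pub_year`."""
    age = year - first_pub_year  # career age in years (assumed definition)
    if age < 0:
        raise ValueError("year precedes the first publication")
    if age < 5:
        return "junior"
    if age < 15:
        return "early-career"
    if age < 30:
        return "mid-career"
    return "late-career"
```

For instance, a researcher whose first publication appeared in 2000 would be classed as junior up to 2004 and as early-career from 2005 under these assumptions.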
COMMENTS
DATA AND METHODS
SEED DATASET
11
DATA AND METHODS
BAYESIAN NETWORKS
BN graphically depicts interactions among
dependent multivariate data.
Directed acyclic graph (DAG), nodes represent random
variables and arcs encode direct influences
Max-Min Hill-Climbing (MMHC) algorithm
Combination of score-based and constraint-based
algorithms
Use of a white-list
Directionality of arcs
Robustness checks
Bootstrapping with replacement (50), threshold > 80%
K-fold cross-validation (10 subsets)
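The two robustness checks can be sketched in plain Python. This is an illustration under stated assumptions, not the study's implementation: `learn_structure` is a hypothetical stand-in for the structure learner (MMHC in the slides), arcs are assumed to be `(parent, child)` tuples, and the 80% threshold is assumed to apply to the share of bootstrap replicates in which an arc appears.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def stable_arcs(rows, learn_structure, n_boot=50, threshold=0.8, seed=0):
    """Keep arcs learned in more than `threshold` of the bootstrap replicates.

    `learn_structure` is a hypothetical callable: it takes a list of rows and
    returns an iterable of directed arcs (parent, child).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Count each arc at most once per replicate.
        counts.update(set(learn_structure(bootstrap_sample(rows, rng))))
    return {arc for arc, n in counts.items() if n / n_boot > threshold}


def k_fold_indices(n_rows, k=10):
    """Split row indices 0..n_rows-1 into k disjoint, nearly equal test folds."""
    return [list(range(i, n_rows, k)) for i in range(k)]
```

Each fold returned by `k_fold_indices` would serve once as the test set, with the remaining rows used to fit the network.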
12
DATA AND METHODS
ROBUST ARCHETYPAL ANALYSIS (RAA)
Data aggregation of predicted contributorships
• Median value of contributorships by career stage
Archetypes as extreme observations in a multivariate
dataset
• RAA is less sensitive to outliers
• Archetypes are not exclusive
• RAA is not a clustering technique
Assignment to archetypes
• We use α-scores to assign researchers by career stage
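The aggregation and assignment steps can be illustrated with a short Python sketch. It assumes that predicted contributorships arrive as `(researcher, stage, contribution, probability)` records and that a researcher is assigned to the archetype with the largest α coefficient. The slides do not state the exact assignment rule, so the argmax rule here is an assumption, and fitting the archetypes themselves is not shown. The profile names are taken from the findings slides.

```python
from collections import defaultdict
from statistics import median

# Profile names as used in the findings; their order here is an assumption.
PROFILES = ("Leader", "Specialized", "Supporting")


def median_contributorships(records):
    """Median predicted probability per (researcher, stage, contribution).

    `records` is an iterable of (researcher, stage, contribution, probability).
    """
    grouped = defaultdict(list)
    for researcher, stage, contribution, prob in records:
        grouped[(researcher, stage, contribution)].append(prob)
    return {key: median(values) for key, values in grouped.items()}


def assign_profile(alphas, profiles=PROFILES):
    """Assign the archetype with the largest alpha coefficient (assumed rule)."""
    if len(alphas) != len(profiles):
        raise ValueError("one alpha coefficient per archetype is required")
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return profiles[best]
```

Because archetypes are not exclusive, a threshold on α could be used instead of the argmax so that a researcher may match more than one archetype.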
13
FINDINGS
DIFFERENCES ON PREDICTED CONTRIBUTORSHIPS BY CAREER STAGE
The model seems to discriminate by
career stage
Notable differences by type of
contribution
14
FINDINGS
PARAMETERS OF ARCHETYPES BY CAREER STAGE
• Similarity of profiles between
career stages
• Leader profile defined by highest
values on WR and CE
• Specialized profile as the one
performing the experiments
• Supporting role may have a
different meaning at late-career
stage
COMMENTS
FINDINGS
PROFILES AND CAREER LENGTH
15
FINDINGS
PROFILES AND PERFORMANCE
16
Figure legend: large effect size; medium effect size
FINDINGS
PROFILES AND GENDER
17
• 43% and 77% of men have a leader profile in the early- and
mid-career stages, respectively
• 27% and 65% of women have a leader profile in the early- and
mid-career stages, respectively
• Gender differences for leader and specialized profiles at
these stages have a medium effect size
COMMENTS
FINDINGS
PROFILES AND AUTHOR ORDER
18
• Middle authorships make up the largest share irrespective of
profile, although the share differs by profile
• Specialists have a similar share of first authorships to
leaders, but not of last authorships
• Distributions are similar at the late-career stage
COMMENTS
CONCLUSIONS
IMPLICATIONS
19
• Task specialization seems to affect career prospects
• Leading profiles seem to be more versatile than others
• Bibliometric indicators seem to undermine specific profiles
• Gender differences observed at early-career stages could be related to task
specialization
CONCLUSIONS
CAUTIONARY REMARKS
20
• Representativeness of the sample
• Identification of scientists
• Appropriateness of the contribution taxonomy
• Measuring uncertainty
• Longitudinal analysis of archetypes
CONCLUSIONS
CAUTIONARY REMARKS
21
We do not look into causality, although…
Author response to reviewers
THANK YOU! QUESTIONS?
ALSO FEEL FREE TO CONTACT ME AT:
elrobin@ugr.es
http://nrobinsongarcia.com
@nrobinsongarcia
23
ADDITIONAL NOTES
SEED DATASET JOURNAL DISTRIBUTION
24
ADDITIONAL NOTES
FIELD DELIMITATION
Fields are assigned based on the Dutch NOWT Classification linked to Web of
Science Subject categories
3 levels – 7 broad fields, 14 fields and 34 subjects
Broad field assigned based on the share of referenced journals
More here https://www.cwts.nl/pdf/nowt_classification_sc.pdf
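As an illustration of assigning a broad field from the share of referenced journals, here is a minimal Python sketch. It assumes each referenced journal has already been mapped to one broad field and that the field with the largest share is the one assigned. The function name and the tie-breaking behaviour are illustrative assumptions.

```python
from collections import Counter


def broad_field(referenced_journal_fields):
    """Return (field, share) for the broad field with the largest share of references.

    `referenced_journal_fields` holds one broad-field label per referenced
    journal occurrence. Ties go to the field encountered first (assumption).
    Returns None when there are no references.
    """
    counts = Counter(referenced_journal_fields)
    if not counts:
        return None
    field, n = counts.most_common(1)[0]
    return field, n / sum(counts.values())
```

The returned share can be used to flag cases where no single broad field clearly dominates.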
25
ADDITIONAL NOTES
MIXED CORRELATION MATRIX
