1. Takeaways from a Systematic Study of 75,000 ML Models
Nazneen Rajani | Robustness Research Lead @ Hugging Face | nazneen@hf.co | @nazneenrajani
2. Ecosystem as part of the ML workflow
Collect data → Train model → Evaluate → Deploy
Collect data: >10K datasets
Train model: >75K models
Evaluate: >70 metrics and measurements
Deploy: Spaces/Gradio for demos
18. Model Documentation
Collect data → Train model → Evaluate → Deploy
✔ Dataset
✔ How to use
✔ Intended uses
✔ Evaluation
✔ Limitations
✔ Training
✔ Environmental impact
41. How has model documentation evolved?
Observation: Model documentation has evolved
Goal: Use word embeddings to capture change in content
Steps:
1. Train a word2vec model for each year
2. Align the embeddings so that the same word can be compared across years (Hamilton et al., 2016; Smith et al., 2017)
3. Compare nearest neighbors or pairwise similarity of vectors
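Step 2 is the part that needs care: independently trained word2vec spaces are only defined up to rotation, so the Hamilton et al. (2016) approach aligns one year onto another with orthogonal Procrustes. A minimal numpy sketch, assuming the per-year embedding matrices are already trained and row-aligned on a shared vocabulary (the matrices below are synthetic, not the study's data):

```python
import numpy as np

def align_embeddings(base, other):
    """Align `other` onto `base` via orthogonal Procrustes:
    find the rotation W minimizing ||other @ W - base||_F.
    Rows of both matrices must correspond to the same shared vocabulary."""
    # SVD of the cross-covariance gives the optimal orthogonal map
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy setup: the "2021" space is a random rotation of the "2020" space,
# so raw cross-year cosines are meaningless until the spaces are aligned.
rng = np.random.default_rng(0)
emb_2020 = rng.normal(size=(5, 8))               # 5 shared words, dim 8
rot, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # random orthogonal matrix
emb_2021 = emb_2020 @ rot

aligned_2021 = align_embeddings(emb_2020, emb_2021)
print(cosine(emb_2020[0], aligned_2021[0]))      # ~1.0 after alignment
```

The rotation is recovered exactly here because the toy spaces differ only by an orthogonal map; on real per-year corpora the alignment is approximate, which is exactly why residual similarity shifts are informative.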
42. How has model documentation evolved?
[Figure: nearest neighbors of "data" in the 2020, 2021, and 2022 embedding spaces: language, transformer, instance, datasets, search, rate, total, example, download, arabic, tokenizer, negative, french, present, information, speech, articles, decay, mlm, sentiment, multilingual, unlabeled, results, corpus, checkpoint, representation, tuned, languages, pretraining, repository]
44. How has model documentation evolved?
[Figure: nearest neighbors of "modeling" in the 2020, 2021, and 2022 embedding spaces: language, analysis, outperform, classification, novel, multimodal, commands, optimizer, epochs, hyperparameter, framework, approaches, recognition, proceedings, pretrain, schedule, syntax, learning, detection, augmenting, transfer, reconstruct, removal, generalize, clean data, learning, explanations, downstream, intended]
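Word lists like the ones above come from step 3 of the method: querying the nearest neighbors of a word (here "data" or "modeling") in each year's aligned space. A small self-contained sketch; the vocabulary and vectors are synthetic stand-ins, not the study's embeddings:

```python
import numpy as np

def nearest_neighbors(query_vec, vocab, emb, k=3):
    """Return the k rows of emb closest to query_vec by cosine similarity."""
    emb_n = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb_n @ (query_vec / np.linalg.norm(query_vec))
    top = np.argsort(-sims)[:k]
    return [(vocab[i], round(float(sims[i]), 3)) for i in top]

# Synthetic embeddings: "dataset" is constructed to lie close to "data"
rng = np.random.default_rng(1)
vocab = ["data", "dataset", "training", "arabic", "tokenizer"]
emb = rng.normal(size=(len(vocab), 16))
emb[1] = emb[0] + 0.1 * rng.normal(size=16)

print(nearest_neighbors(emb[0], vocab, emb))
# "data" itself ranks first (similarity 1.0), followed by "dataset"
```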
46. Pairwise similarity between word vectors
Similarity to the word “evaluate”
Word           2020    2021    2022    Spearman correlation (ρ)
fairness      -0.1     0.0     0.99     0.9
biased        -0.01    0.99    0.99     0.86
humans         0.0     0.99    0.99     0.86
statistically  0.99    0.99   -0.02    -0.86
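The ρ column can be read as Spearman's rank correlation between the year sequence and that word's per-year similarity, i.e. whether the association trends monotonically up or down. A minimal sketch; the similarity values below are illustrative (not the table's exact figures), and this implementation assumes no tied values:

```python
import numpy as np

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

years = [2020, 2021, 2022]
sim_fairness = [-0.1, 0.0, 0.99]        # similarity to "evaluate" rises
sim_statistically = [0.99, 0.5, -0.02]  # similarity falls

print(spearman(years, sim_fairness))        # 1.0
print(spearman(years, sim_statistically))   # -1.0
```

With only three yearly points, any strictly monotone trend gives ρ = ±1; the intermediate values in the table (e.g. 0.86) arise when years tie or the trend is not strictly monotone, which a tie-aware rank (average ranks) would handle.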
48. Pairwise similarity between word vectors
Similarity to the word “training”
Word           2020    2021    2022    Spearman correlation (ρ)
harmful       -0.01    0.99    0.99     0.86
early          0.0     0.99    0.99     0.86
reproducible   0.99    0.02    0.99     0.0
statistically  0.99    0.99   -0.04    -0.87
56. Takeaways
Although the number of NLP models is growing exponentially, they are dominated by a few task categories.
The dominant NLP task categories show seasonal patterns.
Model documentation has evolved from model-centric to data-centric*
57. Thanks for listening
Collaborators: James Zou (Stanford), Weixin Liang (Stanford), Xinyu Yang (ZJU), Meg Mitchell (Hugging Face)