1. Takeaways from a Systematic Study of 75,000 ML Models
Nazneen Rajani | Robustness Research Lead @ Hugging Face | nazneen@hf.co | @nazneenrajani
2. Ecosystem as part of the ML workflow
Collect data → Train model → Evaluate → Deploy
Collect data: >10K datasets
Train model: >75K models
Evaluate: >70 metrics and measurements
Deploy: Spaces/Gradio for demos
18. Model Documentation
Collect data → Train model → Evaluate → Deploy
✔ Dataset
✔ How to use
✔ Intended uses
✔ Evaluation
✔ Limitations
✔ Training
✔ Environmental impact
41. How has model documentation evolved?
Observation: Model documentation has evolved
Goal: Use word embeddings to capture change in content
Steps:
1. Train a word2vec model for each year
2. Align the embeddings so that the same word can be compared across years (Hamilton et al., 2016; Smith et al., 2017)
3. Compare nearest neighbors or pairwise similarity of vectors
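Step 2 is the part that needs care: independently trained word2vec spaces are only defined up to rotation, so the Hamilton et al. (2016) approach aligns one year onto another with orthogonal Procrustes. A minimal numpy sketch, assuming the per-year embedding matrices are already trained and row-aligned on a shared vocabulary (the matrices below are synthetic, not the study's data):

```python
import numpy as np

def align_embeddings(base, other):
    """Align `other` onto `base` via orthogonal Procrustes:
    find the rotation W minimizing ||other @ W - base||_F.
    Rows of both matrices must correspond to the same shared vocabulary."""
    # SVD of the cross-covariance gives the optimal orthogonal map
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy setup: the "2021" space is a random rotation of the "2020" space,
# so raw cross-year cosines are meaningless until the spaces are aligned.
rng = np.random.default_rng(0)
emb_2020 = rng.normal(size=(5, 8))               # 5 shared words, dim 8
rot, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # random orthogonal matrix
emb_2021 = emb_2020 @ rot

aligned_2021 = align_embeddings(emb_2020, emb_2021)
print(cosine(emb_2020[0], aligned_2021[0]))      # ~1.0 after alignment
```

The rotation is recovered exactly here because the toy spaces differ only by an orthogonal map; on real per-year corpora the alignment is approximate, which is exactly why residual similarity shifts are informative.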
42. How has model documentation evolved?
[Figure: nearest neighbors of "data" in the 2020, 2021, and 2022 embedding spaces: language, transformer, instance, datasets, search, rate, total, example, download, arabic, tokenizer, negative, french, present, information, speech, articles, decay, mlm, sentiment, multilingual, unlabeled, results, corpus, checkpoint, representation, tuned, languages, pretraining, repository]
44. How has model documentation evolved?
[Figure: nearest neighbors of "modeling" in the 2020, 2021, and 2022 embedding spaces: language, analysis, outperform, classification, novel, multimodal, commands, optimizer, epochs, hyperparameter, framework, approaches, recognition, proceedings, pretrain, schedule, syntax, learning, detection, augmenting, transfer, reconstruct, removal, generalize, clean data, learning, explanations, downstream, intended]
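Word lists like the ones above come from step 3 of the method: querying the nearest neighbors of a word (here "data" or "modeling") in each year's aligned space. A small self-contained sketch; the vocabulary and vectors are synthetic stand-ins, not the study's embeddings:

```python
import numpy as np

def nearest_neighbors(query_vec, vocab, emb, k=3):
    """Return the k rows of emb closest to query_vec by cosine similarity."""
    emb_n = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb_n @ (query_vec / np.linalg.norm(query_vec))
    top = np.argsort(-sims)[:k]
    return [(vocab[i], round(float(sims[i]), 3)) for i in top]

# Synthetic embeddings: "dataset" is constructed to lie close to "data"
rng = np.random.default_rng(1)
vocab = ["data", "dataset", "training", "arabic", "tokenizer"]
emb = rng.normal(size=(len(vocab), 16))
emb[1] = emb[0] + 0.1 * rng.normal(size=16)

print(nearest_neighbors(emb[0], vocab, emb))
# "data" itself ranks first (similarity 1.0), followed by "dataset"
```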
46. Pairwise similarity between word vectors
Similarity to the word “evaluate”
Word           2020    2021    2022    Spearman correlation (ρ)
fairness      -0.1     0.0     0.99     0.9
biased        -0.01    0.99    0.99     0.86
humans         0.0     0.99    0.99     0.86
statistically  0.99    0.99   -0.02    -0.86
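The ρ column can be read as Spearman's rank correlation between the year sequence and that word's per-year similarity, i.e. whether the association trends monotonically up or down. A minimal sketch; the similarity values below are illustrative (not the table's exact figures), and this implementation assumes no tied values:

```python
import numpy as np

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

years = [2020, 2021, 2022]
sim_fairness = [-0.1, 0.0, 0.99]        # similarity to "evaluate" rises
sim_statistically = [0.99, 0.5, -0.02]  # similarity falls

print(spearman(years, sim_fairness))        # 1.0
print(spearman(years, sim_statistically))   # -1.0
```

With only three yearly points, any strictly monotone trend gives ρ = ±1; the intermediate values in the table (e.g. 0.86) arise when years tie or the trend is not strictly monotone, which a tie-aware rank (average ranks) would handle.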
48. Pairwise similarity between word vectors
Similarity to the word “training”
Word           2020    2021    2022    Spearman correlation (ρ)
harmful       -0.01    0.99    0.99     0.86
early          0.0     0.99    0.99     0.86
reproducible   0.99    0.02    0.99     0.0
statistically  0.99    0.99   -0.04    -0.87
56. Takeaways
Although the number of NLP models is growing exponentially, they are dominated by a few task categories.
The dominant NLP task categories show seasonal patterns.
Model documentation has evolved from model-centric to data-centric*
57. Thanks for listening
Collaborators: James Zou (Stanford), Weixin Liang (Stanford), Xinyu Yang (ZJU), Meg Mitchell (Hugging Face)