An introduction to computer vision with Hugging Face

An Introduction to Computer Vision with Hugging Face
Julien Simon, Chief Evangelist, Hugging Face
julsimon@huggingface.co

Computer Vision put Deep Learning on the map
Image classification
Object detection
Semantic segmentation
Instance segmentation
Pose estimation
Depth prediction
Source: GluonCV

1998-2021 : Convolutional Neural Networks
Source: Wikipedia
CNNs extract features with learned filters.
A lot of pixels are discarded along the way.
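As a rough illustration of the two statements above, the sketch below applies a single filter to a small grayscale image and then max-pools the result. It assumes NumPy is installed. The filter values are hand-picked here, whereas a CNN would learn them during training, and the function names are illustrative only.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide a filter over a 2D image (no padding, stride 1).

    Like deep learning frameworks, this computes a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each size x size window."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    cropped = feature_map[:h, :w]
    return cropped.reshape(h // size, size, w // size, size).max(axis=(1, 3))


# Hand-picked vertical-edge filter; a CNN would learn these values.
edge_filter = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])

image = np.zeros((8, 8))
image[:, :4] = 1.0  # bright left half, dark right half

features = conv2d_valid(image, edge_filter)
pooled = max_pool2d(features)
print(image.shape, features.shape, pooled.shape)  # (8, 8) (6, 6) (3, 3)
```

The 8x8 input (64 values) ends up as a 3x3 map (9 values) that only records where the edge is, which is the sense in which pixels are discarded along the way.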

2021 : The Vision Transformer (Google)
"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" https://arxiv.org/abs/2010.11929
ViT breaks an image into patches,
which are flattened and processed
as token sequences.
+ State-of-the-art accuracy
+ 4x less compute required for training
+ Transfer learning
Source: research paper
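The patch step described above can be sketched in a few lines. This is a minimal illustration assuming NumPy and a channels-last image whose sides are multiples of the patch size; the function name is hypothetical. The real model goes further: it applies a learned linear projection to each flattened patch and adds a class token and position embeddings, none of which is shown here.

```python
import numpy as np


def image_to_patch_tokens(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row, left to right.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patch_tokens(image)
print(tokens.shape)  # (196, 768)
```

A 224x224 RGB image therefore becomes a sequence of 14 x 14 = 196 tokens, each holding 16 x 16 x 3 = 768 values.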

Research on CV Transformers: 11x in 2 years

The Hugging Face Hub: The GitHub of Machine Learning
110K models
18K datasets
25+ ML libraries: Keras, spaCy, scikit-learn, fastai, etc.
10K organizations
100K+ users daily
1M+ downloads daily
https://huggingface.co

4,000+ models for Computer Vision
1. PyTorch Image Models (timm)
2. CV Transformers
3. Multi-modal Transformers
4. Generative CV: Diffusers

1. PyTorch Image Models (aka timm)
https://github.com/rwightman/pytorch-image-models
• Models, scripts, pretrained weights
ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
• Now available on the Hugging Face hub
300+ models
https://huggingface.co/timm
https://huggingface.co/docs/hub/timm
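For readers who want to try a timm model, here is a minimal sketch, assuming timm and PyTorch are installed. `timm.create_model` is the library's model factory, and the `hf_hub:` prefix tells it to fetch a checkpoint from the Hugging Face hub. The helper names and the repository id `timm/resnet50.a1_in1k` are illustrative assumptions, not taken from the slides.

```python
HF_HUB_PREFIX = "hf_hub:"


def hub_model_name(repo_id):
    """Build the model name timm expects for a checkpoint stored on the hub."""
    return HF_HUB_PREFIX + repo_id


def load_pretrained(name):
    """Create a timm model with pretrained weights, switched to inference mode.

    Weights are downloaded on first use, so this needs network access.
    """
    import timm  # imported lazily so the helpers above work without timm

    model = timm.create_model(name, pretrained=True)
    return model.eval()


# Assumed repository id, shown only to illustrate the naming scheme.
print(hub_model_name("timm/resnet50.a1_in1k"))  # hf_hub:timm/resnet50.a1_in1k
```

Calling `load_pretrained("resnet50")` would load an architecture from timm's built-in registry, and `load_pretrained(hub_model_name("timm/resnet50.a1_in1k"))` would load a checkpoint from the hub through the same call.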

2. CV Transformers: image and video classification
openai/clip-vit-base-patch32
google/vit-base-patch16-224
https://huggingface.co/spaces/juliensimon/battle_of_image_classifiers
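The two models named above can be queried through the transformers `pipeline` API. The sketch below is one possible way to do so, assuming transformers, PyTorch and Pillow are installed; the wrapper names are hypothetical and the sample predictions at the bottom are made up purely to show the output shape.

```python
def classify(image, model_name="google/vit-base-patch16-224"):
    """Classify an image (file path, URL or PIL image) with a fixed label set.

    Returns a list of {'score': float, 'label': str} dictionaries.
    """
    from transformers import pipeline  # lazy import; downloads the model on first use

    classifier = pipeline("image-classification", model=model_name)
    return classifier(image)


def classify_zero_shot(image, candidate_labels,
                       model_name="openai/clip-vit-base-patch32"):
    """Score an image against labels chosen at call time, using CLIP."""
    from transformers import pipeline

    classifier = pipeline("zero-shot-image-classification", model=model_name)
    return classifier(image, candidate_labels=candidate_labels)


def best_prediction(predictions):
    """Pick the highest-scoring entry from a pipeline result."""
    return max(predictions, key=lambda p: p["score"])


# Made-up values, only to show the result format without downloading a model.
sample = [{"score": 0.12, "label": "tabby cat"},
          {"score": 0.85, "label": "Egyptian cat"}]
print(best_prediction(sample)["label"])  # Egyptian cat
```

The ViT checkpoint predicts among the classes it was trained on, while the CLIP checkpoint ranks whatever `candidate_labels` are passed in, for example `classify_zero_shot("photo.jpg", ["a cat", "a dog"])` with a hypothetical local file.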

2. CV Transformers: detection and segmentation
facebook/maskformer-swin-large-ade
facebook/detr-resnet-101

State-of-the-art prediction with 2 lines of Python
[{'score': 0.9985879063606262, 'label': 'motorcycle',
'box': {'xmin': 240, 'ymin': 185, 'xmax': 890, 'ymax': 593}},
{'score': 0.9886626601219177, 'label': 'backpack',
'box': {'xmin': 453, 'ymin': 87, 'xmax': 570, 'ymax': 220}},
{'score': 0.9997599720954895, 'label': 'person',
'box': {'xmin': 456, 'ymin': 28, 'xmax': 684, 'ymax': 551}}]
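The code behind this output is not part of the extracted text. A result with this shape is what the transformers `object-detection` pipeline returns, so the sketch below shows one plausible way to obtain it, assuming transformers, PyTorch and Pillow are installed (DETR checkpoints may also need timm for the backbone). The wrapper names and the 0.99 threshold are illustrative assumptions.

```python
def detect_objects(image, model_name="facebook/detr-resnet-101"):
    """Run object detection on an image (file path, URL or PIL image).

    Returns a list of {'score', 'label', 'box'} dictionaries.
    """
    from transformers import pipeline  # lazy import; downloads the model on first use

    detector = pipeline("object-detection", model=model_name)
    return detector(image)


def keep_confident(detections, min_score=0.99):
    """Drop detections whose score is below min_score."""
    return [d for d in detections if d["score"] >= min_score]


# Output copied from the slide, so the filtering step runs offline.
slide_output = [
    {"score": 0.9985879063606262, "label": "motorcycle",
     "box": {"xmin": 240, "ymin": 185, "xmax": 890, "ymax": 593}},
    {"score": 0.9886626601219177, "label": "backpack",
     "box": {"xmin": 453, "ymin": 87, "xmax": 570, "ymax": 220}},
    {"score": 0.9997599720954895, "label": "person",
     "box": {"xmin": 456, "ymin": 28, "xmax": 684, "ymax": 551}},
]

for detection in keep_confident(slide_output):
    print(detection["label"], detection["box"])
```

Inside `detect_objects`, creating the pipeline and calling it on the image are the two lines that do the actual work; box coordinates are in pixels of the input image.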

3. Multi-modal CV Transformers
Image captioning
https://huggingface.co/spaces/nielsr/comparing-captioning-models
Zero-shot segmentation with text prompt
https://huggingface.co/spaces/nielsr/CLIPSeg
Audio classification with spectrogram
https://huggingface.co/spaces/juliensimon/keyword-spotting

4. Generative models: text-to-image
https://github.com/huggingface/diffusers/
https://huggingface.co/spaces/stabilityai/stable-diffusion

4. Generative models: image inpainting
https://huggingface.co/spaces/multimodalart/stable-diffusion-inpainting

Training and deploying models with Hugging Face
Model in production
18,000+ datasets on the hub
110,000+ models on the hub
No-code AutoML
Managed Inference on AWS and Azure
Hosted ML applications
HW-accelerated training & inference
Amazon SageMaker
Deploy anywhere
Datasets
Models
Hugging Face Endpoints for Azure
Transformers
Accelerate
Optimum
Diffusers
Evaluate

Getting started
https://huggingface.co/tasks
https://huggingface.co/course
https://huggingface.co/docs/{datasets, transformers, diffusers}
https://github.com/huggingface/{datasets, transformers, diffusers}
https://discuss.huggingface.co/
https://huggingface.co/support
Stay in touch!
@julsimon
julsimon.medium.com
youtube.com/c/juliensimonfr
