1
CNRS & Université Paris-Saclay
BALÁZS KÉGL
DEEP LEARNING
A STORY OF HARDWARE, DATA, AND TECHNIQUES & TRICKS
2
The bumpy 60-year history that
led to the current state of the
art
and an overview of what you
can do with it
3
DEEP LEARNING = THREE INTERTWINING STORIES

era                      techniques / tricks   hardware                    data
1957-69 (dawn)           perceptron            early mainframes            toy linear, small images, XOR
1986-95 (golden age)     early NNs             workstations                MNIST
2006- (deep learning)    deep NNs              GPU, TPU, Intel Xeon Phi    Imagenet
[Slide background: Figure 2 from Krizhevsky et al. (2012), showing the AlexNet CNN architecture split between two GPUs, which communicate only at certain layers; the network's input is 150,528-dimensional.]
• Classification problem y = f(x)
4
DATA-DRIVEN INFERENCE
x → f → y: ‘Stomorhina’
x → f → y: ‘Scaeva’
• Classification problem y = f(x)
• No model to fit, but a large set of (x, y) pairs	

• The source is typically observation + human labeling
• And a loss function L(y, ypred)
5
DATA-DRIVEN INFERENCE
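The setup above can be sketched in a few lines (an illustrative example, not from the slides; the data and the toy classifier are made up):

```python
def zero_one_loss(y, y_pred):
    """L(y, y_pred) = 1 if the prediction is wrong, 0 otherwise."""
    return 0 if y == y_pred else 1

# Hypothetical labeled (x, y) pairs, with string labels as on the slide.
data = [([0.2, 0.9], 'Stomorhina'), ([0.8, 0.1], 'Scaeva')]

def f(x):
    # A hand-made stand-in for a trained classifier.
    return 'Stomorhina' if x[1] > x[0] else 'Scaeva'

# Learning means searching for the f that minimizes this average loss.
empirical_risk = sum(zero_one_loss(y, f(x)) for x, y in data) / len(data)
```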
6
THE PERCEPTRON (ROSENBLATT 1957)
Weights were encoded in potentiometers, and weight
updates during learning were performed by electric motors.
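Rosenblatt's learning rule, which those motors implemented physically, fits in a few lines of Python (a hedged sketch; on linearly separable data such as AND it converges):

```python
def predict(w, b, x):
    # Linear threshold unit: fire if the weighted sum exceeds the threshold.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, epochs=20, lr=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)   # +1, 0, or -1
            # Nudge the weights toward each misclassified example.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# AND is linearly separable, so the rule finds a separating line.
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND)
```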
7
THE PERCEPTRON (ROSENBLATT 1957)
Based on Rosenblatt's
statements, The New York
Times reported the perceptron
to be "the embryo of an
electronic computer that [the
Navy] expects will be able to
walk, talk, see, write, reproduce
itself and be conscious of its
existence."
8
MINSKY-PAPERT (1969)
The XOR function cannot be linearly separated
The first winter: 1969-86
Is it?
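The non-separability claim can be checked numerically (an illustration, not Minsky and Papert's proof): a brute-force search over a grid of weights and biases finds no single linear threshold unit that computes XOR.

```python
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def separates(w1, w2, b):
    # Does this linear threshold unit classify all four XOR points correctly?
    return all((w1 * x[0] + w2 * x[1] + b > 0) == (y == 1) for x, y in XOR)

grid = [i / 2 for i in range(-10, 11)]   # -5.0 ... 5.0 in steps of 0.5
found = any(separates(w1, w2, b) for w1 in grid for w2 in grid for b in grid)
# found stays False: no setting on the grid (and, provably, no setting at
# all) linearly separates XOR.
```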
• We knew in the early seventies that the nonlinearity
problem could be overcome (Grossberg 1972)

• It was also suggested by several authors that error
gradients could be back-propagated through the chain
rule

• So why the winter? Floating-point multiplication was
expensive
9
MULTI-LAYER PERCEPTRON
The XOR net
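One XOR net in full (an illustrative sketch with hand-set weights, not necessarily the slide's exact ones): one hidden unit computes OR, the other AND, and the output fires on "OR and not AND", which is exactly XOR.

```python
def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # OR
    h2 = step(x1 + x2 - 1.5)     # AND
    return step(h1 - h2 - 0.5)   # OR and not AND = XOR
```

Adding the hidden layer is the whole trick: the hidden units re-represent the inputs so that the output unit's problem becomes linearly separable.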
10
BACK PROPAGATION
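Back-propagation in miniature (a hedged, scalar sketch, not the general algorithm): for y = w2 · tanh(w1 · x) under a squared loss, the gradient is obtained by multiplying local derivatives backwards through the chain rule.

```python
import math

def forward_backward(x, target, w1, w2):
    # Forward pass.
    a = w1 * x
    h = math.tanh(a)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2
    # Backward pass: apply the chain rule layer by layer.
    dloss_dy = y - target
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2
    dloss_da = dloss_dh * (1 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
    dloss_dw1 = dloss_da * x
    return loss, dloss_dw1, dloss_dw2
```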
• Convolutional nets	

• The first algorithmic tricks: initialization, weight decay,
early stopping	

• Some limited understanding of the theory	

• First commercial success: AT&T check reader (Bottou,
LeCun, Burges, Nohl, Bengio, Haffner, 1996)
11
THE GOLDEN AGE (1986-95)
12
THE AT&T CHECK READER
• Reading checks is more
than character
recognition	

• If all steps are
differentiable, the whole
system can be trained
end-to-end by backprop

• Does it ring a bell?
(Google’s TensorFlow)
13
THE AT&T CHECK READER
• The mainstream narrative	

• Nonconvexity, local minima, lack of theoretical understanding - BS,
looking for your key where it's lit

• The vanishing gradient - true	

• Lots of nuts and bolts - partially true but would not explain if it was
worth the effort	

• Strong competitors with an order of magnitude less engineering:
the support vector machine, forests, boosting - true
14
THE SECOND WINTER (1996-2006-2012)
• The real story	

• We didn’t have the computational power, the architectures, or
large enough data to train deep nets

• Random forests are way less high-maintenance and they are on par
with single-hidden-layer (shallow) nets, even today	

• We missed some of the tricks due to lack of critical mass in
research: trial and error is expensive	

• NNs didn’t disappear from industry: the check reader processes
20M checks per day, today
15
THE SECOND WINTER (1996-2006-2012)
16
FIRST TAKE-HOME MESSAGE
Before you jump on
the deep learning
bandwagon:
scikit-learn forests +
xgboost get
>90% performance
on >90% of
industrial problems
(a cautious estimate)
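The spirit of this baseline-first advice, in code: ensembles of simple trees vote, and the vote is robust with almost no tuning. This is a hedged, dependency-free stand-in (three hand-made one-feature decision stumps combined by majority vote); in practice the slide means `sklearn.ensemble.RandomForestClassifier` and xgboost, not hand-rolled stumps.

```python
def stump(feature, threshold):
    # A one-feature, one-split decision tree.
    return lambda x: 1 if x[feature] > threshold else 0

def majority(classifiers, x):
    # The ensemble predicts the class with the most votes.
    votes = sum(c(x) for c in classifiers)
    return 1 if votes * 2 > len(classifiers) else 0

forest = [stump(0, 0.5), stump(1, 0.5), stump(0, 0.3)]
```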
• NNs are back on the research agenda
17
2006: A NEW WAVE BEGINS
18
2009: IMAGENET
“We believe that a large-scale ontology of images is a critical
resource for developing advanced, large-scale content-based
image search and image understanding algorithms, as well as for
providing critical training and benchmarking data for such
algorithms.” (Fei-Fei Li et al., CVPR 2009)
• 80K hierarchical categories	

• 80M images of size >100x100	

• labeled by 50K Amazon Mechanical Turk workers
19
2009: IMAGENET
20
GPUS (2004 - )
21
GPUS (2004 - )
• dropout, ReLU, max-pooling, data augmentation, batch normalization,
automatic differentiation, end-to-end training, lots of layers	

• Krizhevsky, Sutskever, Hinton (2012): 1.2M images, 60M parameters,
6 days training on two GPUs
22
TECHNIQUES & TRICKS
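Two of the listed tricks fit in a few lines (an illustrative sketch): ReLU as the activation, and dropout shown with a fixed binary mask. In training the mask is random, and surviving activations are scaled by 1/keep_prob ("inverted dropout") so that expected activations match test time.

```python
def relu(v):
    # max(0, x) elementwise: cheap, and its gradient does not vanish for x > 0.
    return [max(0.0, x) for x in v]

def dropout(v, mask, keep_prob=0.5):
    # mask[i] == 1 keeps unit i; kept units are scaled up by 1 / keep_prob.
    return [x * m / keep_prob for x, m in zip(v, mask)]

h = relu([-1.0, 2.0, 0.5, -0.3])     # [0.0, 2.0, 0.5, 0.0]
h = dropout(h, mask=[1, 0, 1, 1])    # [0.0, 0.0, 1.0, 0.0]
```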
23
IMAGENET COMPETITIONS
24
SECOND TAKE-HOME MESSAGE
To make deep learning shine,
you need huge labeled data sets and time to train
• Imagenet (80M >100x100 color images, 80K classes)	

• Facebook (300M photos/day)

• Google (300h of video/minute)
25
SECOND TAKE-HOME MESSAGE
To make deep learning shine,
you need huge labeled data sets and time to train
• Theano	

• TensorFlow	

• Keras	

• Caffe	

• Torch
26
TODAY: EASY-TO-USE LIBRARIES
27
TODAY: HARDWARE
Google TPU
28
COMMERCIAL APPLICATIONS
29
GOOGLE IMAGE SEARCH
30
FACE RECOGNITION/DETECTION	

A $6B MARKET IN 2020
31
SELF-DRIVING CARS
32
