AIXMOOC 2.2 - Neural networks and machine learning (Valerio Freschi)

THE EXPLOSION OF ARTIFICIAL INTELLIGENCE
2.2 NEURAL NETWORKS AND MACHINE LEARNING
Valerio Freschi
#AIXMOOC
mooc.uniurb.it/aixmooc

Introduction
• A supervised learning model defines a mapping from one or more inputs to one or more outputs
• For example, the input could be the age and the mileage (in kilometres) of a used car, and the output could be an estimate of its value in €
• The model is a mathematical equation; when the inputs are fed into the equation, it computes the output, a process known as inference
[Figure 1.1: Machine learning is an area of artificial intelligence that fits mathematical models to observed data. It can coarsely be divided into supervised learning, unsupervised learning, and reinforcement learning. Deep neural networks contribute to each of these areas. Source: S.J.D. Prince, Understanding Deep Learning]

Models and parameters
• The model equation also contains parameters
• Different parameter values change the result of the computation (the equation describes a family of possible input/output relationships)
• The parameters specify one particular relationship
• y = f(x, W)
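To make the idea of a family of relationships concrete, here is a minimal sketch in Python of a model y = f(x, W) for the used-car example. The linear form, the function name `predict_price` and all parameter values are assumptions chosen for illustration; they do not come from the slides.

```python
def predict_price(age_years, km, W):
    """Hypothetical linear model y = f(x, W): estimated value of a used car in euros."""
    w0, w_age, w_km = W
    return w0 + w_age * age_years + w_km * km


# Two made-up parameter settings: same equation, two different input/output relationships.
W_a = (20000.0, -1000.0, -0.05)
W_b = (15000.0, -500.0, -0.02)

x = (5, 80000)  # a 5-year-old car with 80,000 km

price_a = predict_price(*x, W_a)  # inference with the first set of parameters
price_b = predict_price(*x, W_b)  # inference with the second set of parameters
print(price_a, price_b)
```

The same input gives different outputs under `W_a` and `W_b`: choosing the parameters picks one member of the family.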

DL: a brief history
• 1943: Model of the neuron
• 1952: Stochastic Gradient Descent
• 1958: Perceptron (weight learning)
• 1986: Backpropagation
• 1998: Deep convolutional networks
• 2012: ImageNet (AlexNet)
• 2014: Adversarial examples
• 2015: Deep Reinforcement Learning
• 2017-18: Transformers
• 2020: GPT-3
• 2023: GPT-4
[Figure 1.2: Regression and classification problems. a) This regression model takes a vector of numbers that characterize a property and predicts its price. b) ...]

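Neural networks: the fundamental unit
• Neuron (Perceptron)
[Diagram: inputs $x_1, x_2, \dots, x_N$ plus a constant input 1; weights $w_1, w_2, \dots, w_N$ and bias $w_0$; sum $\Sigma$; nonlinear activation $a(\cdot)$; output $\hat{y}$]
$\hat{y} = a\left(w_0 + \sum_{i=1}^{N} x_i w_i\right)$
Here $\hat{y}$ is the output, $w_0$ is the bias, $w_0 + \sum_{i=1}^{N} x_i w_i$ is a linear combination of the inputs, and $a$ is the nonlinear activation.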

• Neuron (Perceptron), scalar and vector form
$\hat{y} = a\left(w_0 + \sum_{i=1}^{N} x_i w_i\right)$
$\hat{y} = a\left(w_0 + X^T \cdot W\right)$
where $X = [x_1, \dots, x_N]^T$ is the vector of inputs and $W = [w_1, \dots, w_N]^T$ is the vector of weights.

• For example, in this case (a neuron with 3 inputs and a sigmoid activation function):
Inputs: $x_1 = 3$, $x_2 = 5$, $x_3 = -2$
Weights: $w_0 = 0.5$, $w_1 = 0.3$, $w_2 = -1.3$, $w_3 = 1.2$
Activation: $a(z) = \frac{1}{1 + e^{-z}}$
$z = w_0 + w_1 \cdot x_1 + w_2 \cdot x_2 + w_3 \cdot x_3 = 0.5 + 0.3 \cdot 3 + (-1.3) \cdot 5 + 1.2 \cdot (-2) = -7.5$
$\hat{y} = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{7.5}} \approx 0.00055$
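The worked example can be reproduced in a few lines of Python. This is a minimal sketch: the function names `sigmoid` and `neuron` are illustrative choices, while the inputs and weights are the ones given on the slide.

```python
import math


def sigmoid(z):
    """Sigmoid activation a(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, w0, activation=sigmoid):
    """Single neuron: activation applied to the bias plus the weighted sum of the inputs."""
    z = w0 + sum(xi * wi for xi, wi in zip(x, w))
    return z, activation(z)


# Values from the worked example on the slide.
x = [3, 5, -2]
w = [0.3, -1.3, 1.2]
w0 = 0.5

z, y_hat = neuron(x, w, w0)
print(round(z, 6), round(y_hat, 5))
```

Up to floating-point rounding, `z` is -7.5 and `y_hat` is about 0.00055, as in the hand computation.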

Artificial neurons and biological plausibility
• The weights are analogous to the dendrites
• The weighted sum of the inputs is compared with a threshold and, if the activation exceeds the threshold, the neuron "fires"
• This is the analogue of emitting an electrical output (an action potential)
• McCulloch-Pitts neuron model (1943)
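The threshold behaviour described above can be sketched as follows. This is a simplified illustration in the spirit of a threshold unit, not a faithful reproduction of the 1943 model; the function name `threshold_neuron`, the weights and the threshold value are assumptions.

```python
def threshold_neuron(x, w, threshold):
    """Return 1 (the neuron "fires") if the weighted sum of the inputs exceeds the threshold, else 0."""
    activation = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if activation > threshold else 0


# Hypothetical example: with unit weights and threshold 1.5 the unit fires
# only when both binary inputs are active, i.e. it behaves like a logical AND.
and_outputs = [threshold_neuron([a, b], [1, 1], 1.5) for a in (0, 1) for b in (0, 1)]
print(and_outputs)  # [0, 0, 0, 1]
```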
[Figure 13.8: Illustration of two neurons connected together in a "circuit". The output axon of the left neuron makes a synaptic connection with the dendrites of the cell on the right. Electrical charges, in the form of ion flows, allow the cells to communicate. From https://en.wikipedia.org/wiki/Neuron. Used with kind permission of Wikipedia author BruceBlaus.]
Sources: Borhani et al., Machine Learning Refined; https://en.wikipedia.org/wiki/Neuron

Deep learning
• What if we place several neuron units between input and output and, possibly, build layers with several units each?
• We obtain a neural network, possibly a deep one
[Figure: fully connected (dense) neural networks. From the CS231n course lecture notes, Stanford]

Deep learning
• In general, neural networks are built recursively from the following pattern: linear layer, activation function, linear layer, activation function, ...
• Deep neural networks (deep networks) are neural networks that repeat this pattern many times
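The repeated pattern can be written as a short loop. The sketch below uses plain Python lists and assumes a sigmoid as the pointwise nonlinearity; the helper names and all weight values are arbitrary choices for illustration.

```python
import math


def linear(x, weights, biases):
    """Linear layer: one weighted sum plus bias for each output unit."""
    return [b + sum(w * xi for w, xi in zip(row, x)) for row, b in zip(weights, biases)]


def sigmoid_layer(h):
    """Pointwise nonlinearity applied to every unit of the layer."""
    return [1.0 / (1.0 + math.exp(-v)) for v in h]


def forward(x, layers):
    """Repeat the pattern: linear layer, activation, linear layer, activation, ..."""
    for weights, biases in layers:
        x = sigmoid_layer(linear(x, weights, biases))
    return x


# Hypothetical network with 3 inputs, 3 hidden units and 1 output.
layers = [
    ([[0.1, -0.2, 0.3], [0.0, 0.5, -0.5], [0.2, 0.2, 0.2]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]

y_hat = forward([3, 5, -2], layers)
print(y_hat)
```

Adding more `(weights, biases)` pairs to `layers` makes the network deeper without changing `forward`.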
[Figure: deep nets are neural nets that stack the above motif many times (Linear, Non-linearity, ...). Source: 6.S898 MIT Deep Learning]

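Loss function
• The loss function of a network measures the cost of incorrect predictions
[Diagram: network with inputs $x_1, x_2, x_3$, hidden units $z_1, z_2, z_3$ and output $\hat{y}_1$]
Example: $x = [3, 5, -2]$, predicted: 0.1, actual: 1
$\mathcal{L}\left(f(x^{(i)}; W), y^{(i)}\right)$, where $f(x^{(i)}; W)$ is the predicted value and $y^{(i)}$ is the actual value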

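Loss function
• The empirical loss measures the total loss over the dataset (averaged over the $N$ examples)
Example: $x = [3, 5, -2]$, predicted: 0.1, actual: 1
$J(W) = \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\left(f(x^{(i)}; W), y^{(i)}\right)$, where $f(x^{(i)}; W)$ is the predicted value and $y^{(i)}$ is the actual value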

Loss function
• Example: the mean squared error loss
• Used with regression models (which output a continuous value)
Example: $x = [3, 5, -2]$, predicted: 0.1, actual: 1
$J(W) = \frac{1}{N} \sum_{i=1}^{N} \left(y^{(i)} - f(x^{(i)}; W)\right)^2$, where $y^{(i)}$ is the actual value and $f(x^{(i)}; W)$ is the predicted value
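As a small illustration of the formulas above, the following sketch computes the squared error for the slide's example (predicted 0.1, actual 1) and the empirical loss over a tiny dataset. The function names and the two extra data points are hypothetical.

```python
def squared_error(predicted, actual):
    """Per-example loss: squared difference between actual and predicted value."""
    return (actual - predicted) ** 2


def empirical_loss(predictions, targets, loss=squared_error):
    """J(W): average of the per-example losses over the dataset."""
    return sum(loss(p, t) for p, t in zip(predictions, targets)) / len(targets)


single = squared_error(0.1, 1)  # slide example: (1 - 0.1)^2
J = empirical_loss([0.1, 0.8, 0.4], [1, 1, 0])  # last two pairs are made up
print(single, J)
```

The single-example loss is about 0.81; averaging over the three examples gives the mean squared error.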

Training a neural network
• The loss function depends on a set of parameters (weights)
• The goal is to find the network weights that give the lowest loss
• $W^* = \arg\min_W \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\left(f(x^{(i)}; W), y^{(i)}\right) = \arg\min_W J(W)$
• How many variables does $W$ contain?
[Diagram: network with inputs $x_1, x_2, x_3$, hidden units $z_1, z_2, z_3$ and output $\hat{y}_1$; example $x = [3, 5, -2]$, predicted: 0.1, actual: 1; first-layer weights $w^{(1)}_{jk}$ and biases $b^{(1)}_j$, second-layer weights $w^{(2)}_{j1}$ and bias $b^{(2)}_1$]
$W = \{w^{(1)}_{11}, w^{(1)}_{12}, w^{(1)}_{13}, w^{(1)}_{21}, w^{(1)}_{22}, w^{(1)}_{23}, w^{(1)}_{31}, w^{(1)}_{32}, w^{(1)}_{33}, b^{(1)}_1, b^{(1)}_2, b^{(1)}_3, w^{(2)}_{11}, w^{(2)}_{21}, w^{(2)}_{31}, b^{(2)}_1\}$
3 + 3 + 3 + 1 + 1 + 1 + 3 + 1 = 12 + 4 = 16
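The count of 16 can be checked programmatically. The sketch below assumes a fully connected network in which every layer has one weight per input/output pair plus one bias per output unit; the function name `count_parameters` is illustrative.

```python
def count_parameters(layer_sizes):
    """Number of weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 3 inputs, 3 hidden units, 1 output, as in the diagram.
n_params = count_parameters([3, 3, 1])
print(n_params)  # 16
```

The first layer contributes 3 · 3 + 3 = 12 parameters and the second 3 · 1 + 1 = 4.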

Optimization
• The loss function depends on a set of parameters (weights)
• The goal is to find the network weights that give the lowest loss
• Suppose $W$ consists of a single variable (a single weight to be determined)
$y = -2x + e^x - 5$

Optimization and derivatives
$y = -2x + e^x - 5$, $y'(x^*) = 0$
• Real function of a real variable, $f : \mathbb{R} \to \mathbb{R}$
• The derivative expresses how sensitive the value of the function is to a change in the independent variable (its argument)
• It is therefore the instantaneous rate of change
[Plot: points $x_a$, $x^*$, $x_b$ on the curve, with $y'(x_a) = -1.86$, $y'(x^*) = 0$, $y'(x_b) = 18.09$]
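The slopes quoted on the slide can be checked with the analytic derivative $y' = -2 + e^x$. In the sketch below the points $x_a = -2$ and $x_b = 3$ are not stated on the slide: they are inferred because they reproduce the quoted slopes, so treat them as assumptions.

```python
import math


def y(x):
    """Example function y = -2x + e^x - 5."""
    return -2 * x + math.exp(x) - 5


def y_prime(x):
    """Analytic derivative y' = -2 + e^x."""
    return -2 + math.exp(x)


x_star = math.log(2)  # y'(x) = 0 exactly when e^x = 2

# Assumed locations of x_a and x_b (inferred from the slopes on the slide).
slope_a = y_prime(-2)
slope_b = y_prime(3)
print(round(slope_a, 2), round(slope_b, 2), round(x_star, 4))
```

A negative slope at $x_a$ means the minimum lies to the right, a positive slope at $x_b$ means it lies to the left; at $x^* = \ln 2$ the slope vanishes.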


Derivatives
• Real function of a real variable, $f : \mathbb{R} \to \mathbb{R}$
Fundamental derivatives
• Powers of x
$y = k \rightarrow y' = 0$
$y = x \rightarrow y' = 1$
$y = x^{\alpha} \rightarrow y' = \alpha x^{\alpha - 1}$, with $\alpha \in \mathbb{N} - \{0\}$ and $x \in \mathbb{R}$, or $\alpha \in \mathbb{R}$ and $x > 0$
Example: $y = x^2 \rightarrow y' = 2x^{2-1} = 2x$
$y = \sqrt{x} \rightarrow y' = \frac{1}{2\sqrt{x}}$, $x > 0$
• Logarithmic and exponential functions
$y = a^x \rightarrow y' = a^x \ln a$, $a > 0$
$y = e^x \rightarrow y' = e^x$
$y = \log_a x \rightarrow y' = \frac{1}{x} \log_a e$, $x > 0$, $a > 0 \wedge a \neq 1$
$y = \ln x \rightarrow y' = \frac{1}{x}$, $x > 0$
• Trigonometric functions
$y = \sin x \rightarrow y' = \cos x$
$y = \cos x \rightarrow y' = -\sin x$
$y = \tan x \rightarrow y' = \frac{1}{\cos^2 x} = 1 + \tan^2 x$
$y = \cot x \rightarrow y' = -\frac{1}{\sin^2 x} = -(1 + \cot^2 x)$
• Inverse trigonometric functions
$y = \arcsin x \rightarrow y' = \frac{1}{\sqrt{1 - x^2}}$
$y = \arccos x \rightarrow y' = -\frac{1}{\sqrt{1 - x^2}}$
$y = \arctan x \rightarrow y' = \frac{1}{1 + x^2}$
$y = \operatorname{arccot} x \rightarrow y' = -\frac{1}{1 + x^2}$
Differentiation rules
• Product of a constant and a function
$y = k \cdot f(x) \rightarrow y' = k \cdot f'(x)$
Example: $y = 4x \rightarrow y' = 4$
• Sum of functions
$y = f(x) + g(x) \rightarrow y' = f'(x) + g'(x)$
• Product of functions
$y = f(x) \cdot g(x) \rightarrow y' = f'(x) \cdot g(x) + f(x) \cdot g'(x)$
Example: $y = x^3 \sin x \rightarrow y' = 3x^2 \cdot \sin x + x^3 \cdot \cos x$
• Reciprocal of a function
$y = \frac{1}{f(x)} \rightarrow y' = -\frac{f'(x)}{f^2(x)}$
Example: $y = \frac{1}{2x + 3} \rightarrow y' = -\frac{2}{(2x + 3)^2}$, with $x \neq -\frac{3}{2}$
• Quotient of two functions
$y = \frac{f(x)}{g(x)} \rightarrow y' = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{g^2(x)}$
Example: $y = \frac{x + 1}{x^2}$, with $x \neq 0$: $y' = \frac{1 \cdot x^2 - (x + 1) \cdot 2x}{(x^2)^2} = -\frac{x + 2}{x^3}$
• Composite functions
$y = f(g(x)) \rightarrow y' = f'(g(x)) \cdot g'(x)$
Example: $y = (x^3 + 8)^7 \rightarrow y' = 7(x^3 + 8)^6 \cdot 3x^2$, where $g(x) = x^3 + 8$, $f'(g(x)) = 7(x^3 + 8)^6$ and $g'(x) = 3x^2$
Source: https://online.scuola.zanichelli.it/
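A quick way to sanity-check differentiation rules is to compare the analytic derivative with a finite-difference approximation. The sketch below does this for the product, quotient and composite examples from the slide; the helper `numerical_derivative`, the step size and the test point are assumptions chosen for illustration.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# (function, analytic derivative) pairs for three worked examples from the slide.
examples = {
    "product": (lambda x: x**3 * math.sin(x),
                lambda x: 3 * x**2 * math.sin(x) + x**3 * math.cos(x)),
    "quotient": (lambda x: (x + 1) / x**2,
                 lambda x: -(x + 2) / x**3),
    "composite": (lambda x: (x**3 + 8) ** 7,
                  lambda x: 7 * (x**3 + 8) ** 6 * 3 * x**2),
}

x0 = 0.5  # arbitrary test point (nonzero, so the quotient is defined)
for name, (f, f_prime) in examples.items():
    print(name, numerical_derivative(f, x0), f_prime(x0))
```

For each rule the two printed numbers should agree to several significant digits.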

Functions of several variables
• Real function of several real variables, $f : \mathbb{R}^N \to \mathbb{R}$
• The partial derivative is the derivative with respect to one independent variable while the others are held constant

Functions of several variables and the gradient
• Real function of several real variables, $f : \mathbb{R}^N \to \mathbb{R}$
• The gradient is the vector of all the partial derivatives
• It identifies the direction of the greatest rate of change, i.e. the direction of steepest ascent; the opposite direction, $-\nabla f$, is therefore the direction of steepest descent
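The definition of the gradient as a vector of partial derivatives translates directly into code: vary one coordinate at a time while keeping the others fixed. This is a minimal numerical sketch; the function name `numerical_gradient` and the example function are hypothetical.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f: R^N -> R by central differences,
    changing one coordinate at a time while the others stay constant."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def example_f(p):
    """Hypothetical function f(x, y) = x^2 + 3y, whose exact gradient is [2x, 3]."""
    return p[0] ** 2 + 3 * p[1]


grad = numerical_gradient(example_f, [1.0, 2.0])
descent_direction = [-g for g in grad]  # the direction of steepest descent is -grad
print(grad, descent_direction)
```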

Gradient descent algorithm
• Choose an initial value (e.g. at random, $W \sim \mathcal{N}(0, \sigma^2)$)
• Compute the gradient $\frac{\partial J(W)}{\partial W}$
• Take a small step in the direction opposite to the gradient
• Repeat until convergence
$W^* = \arg\min_W J(W)$
[Figure: "Peaks" surface plot of $J(\theta)$ over $\theta_1$ and $\theta_2$ (axes $W_1$, $W_2$), with gradient descent moving toward $\theta^* = \arg\min J(\theta)$. Source: 6.S898 MIT Deep Learning]
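The four steps can be sketched for the single-variable function used earlier in the deck, $J(w) = -2w + e^w - 5$, whose minimum is at $w = \ln 2$. The learning rate, the stopping tolerance, the fixed random seed and the function names are assumptions made for this illustration.

```python
import math
import random


def dJ_dw(w):
    """Gradient of the example loss J(w) = -2w + e^w - 5."""
    return -2 + math.exp(w)


def gradient_descent(grad, w_init, learning_rate=0.1, tol=1e-10, max_steps=10_000):
    """Step against the gradient until it is (almost) zero or max_steps is reached."""
    w = w_init
    for _ in range(max_steps):
        g = grad(w)                # compute the gradient
        if abs(g) < tol:           # convergence check
            break
        w -= learning_rate * g     # small step in the opposite direction
    return w


rng = random.Random(0)             # fixed seed so the run is repeatable
w_init = rng.gauss(0.0, 1.0)       # random initial value, W ~ N(0, 1)
w_star = gradient_descent(dJ_dw, w_init)
print(w_init, w_star, math.log(2))
```

With a learning rate that is too large the iteration can overshoot and diverge, which is why the step is kept small.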

Backpropagation
• Layers can be thought of as modular blocks chained together in a computational graph
• Each layer receives inputs and transforms them into outputs through an operation called the forward pass: $x_{out} = f(x_{in}, W)$
• Recall that the learning problem is to find the parameters $W$ that achieve a desired input/output mapping (by minimizing a cost/loss function)
• The problem is usually solved by gradient descent
[Figure: forward operation of a layer, $x_{in} \rightarrow f(x_{in}, W) \rightarrow x_{out}$, with parameters $W$]

Backpropagation
• How do we compute the gradients?
• The backpropagation algorithm efficiently computes the gradient of the loss function with respect to every parameter of the computational graph
• It uses a special operation called backward which, like the forward operation, can be defined for each layer
$x_{out} = f(x_{in}, W)$
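One possible way to organize this in code is to give every layer a `forward` and a `backward` method. The sketch below is a hand-written illustration with scalar layers, not the API of any particular library; the class names and the numeric values are assumptions.

```python
import math


class Multiply:
    """Layer x_out = w * x_in with a single parameter w."""

    def __init__(self, w):
        self.w = w
        self.grad_w = None

    def forward(self, x_in):
        self.x_in = x_in                      # cache the input for the backward pass
        return self.w * x_in

    def backward(self, grad_out):
        self.grad_w = grad_out * self.x_in    # gradient with respect to the parameter
        return grad_out * self.w              # gradient with respect to the input


class Sigmoid:
    """Parameter-free layer x_out = 1 / (1 + e^(-x_in))."""

    def forward(self, x_in):
        self.out = 1.0 / (1.0 + math.exp(-x_in))
        return self.out

    def backward(self, grad_out):
        return grad_out * self.out * (1.0 - self.out)


layers = [Multiply(0.5), Sigmoid()]

out = 2.0
for layer in layers:               # forward pass, first layer to last
    out = layer.forward(out)

grad = 1.0                         # gradient of the output with respect to itself
for layer in reversed(layers):     # backward pass, last layer to first
    grad = layer.backward(grad)

print(out, layers[0].grad_w, grad)
```

Because each layer only needs its own cached input and the gradient arriving from the layer after it, layers can be chained in any order without rewriting the gradient computation.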

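Backpropagation
• An example: a single neuron (source: https://cs231n.stanford.edu/)
$f(w, x) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$
Computational graph with intermediate nodes a, b, c, d, h, g, k and output f: $a = w_0 x_0$, $b = w_1 x_1$, $c = a + b$, $d = c + w_2$, $h = -d$, $g = e^h$, $k = 1 + g$, $f = \frac{1}{k}$
Backward pass, from the output back to the inputs:
$\frac{\partial f}{\partial f} = 1$
$\frac{\partial f}{\partial k} = -\frac{1}{k^2} \cdot \frac{\partial f}{\partial f} = -0.53 \cdot 1$
$\frac{\partial f}{\partial g} = \frac{\partial f}{\partial k} \frac{\partial k}{\partial g} = -0.53 \cdot 1$
$\frac{\partial f}{\partial h} = \frac{\partial f}{\partial g} \frac{\partial g}{\partial h} = -0.53 \cdot e^{-1}$
$\frac{\partial f}{\partial d} = \frac{\partial f}{\partial h} \frac{\partial h}{\partial d} = -0.20 \cdot (-1)$
$\frac{\partial f}{\partial c} = \frac{\partial f}{\partial d} \frac{\partial d}{\partial c} = 0.20 \cdot 1$
$\frac{\partial f}{\partial w_2} = \frac{\partial f}{\partial d} \frac{\partial d}{\partial w_2} = 0.20 \cdot 1$
$\frac{\partial f}{\partial a} = \frac{\partial f}{\partial c} \frac{\partial c}{\partial a} = 0.20 \cdot 1$
$\frac{\partial f}{\partial b} = \frac{\partial f}{\partial c} \frac{\partial c}{\partial b} = 0.20 \cdot 1$
$\frac{\partial f}{\partial w_0} = \frac{\partial f}{\partial a} \frac{\partial a}{\partial w_0} = 0.20 \cdot (-1)$
$\frac{\partial f}{\partial x_0} = \frac{\partial f}{\partial a} \frac{\partial a}{\partial x_0} = 0.20 \cdot 2$
$\frac{\partial f}{\partial w_1} = \frac{\partial f}{\partial b} \frac{\partial b}{\partial w_1} = 0.20 \cdot (-2)$
$\frac{\partial f}{\partial x_1} = \frac{\partial f}{\partial b} \frac{\partial b}{\partial x_1} = 0.20 \cdot (-3)$
$\nabla f = [-0.2; -0.39; 0.2]$ (gradient with respect to $w_0$, $w_1$, $w_2$; the value $-0.39$ is obtained when the intermediate results are not rounded)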
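The whole example can be replayed in Python, one graph node per line. The input values below are not printed on the slides: they are inferred from the local derivatives ($\partial a / \partial w_0 = x_0 = -1$, $\partial a / \partial x_0 = w_0 = 2$, $\partial b / \partial w_1 = x_1 = -2$, $\partial b / \partial x_1 = w_1 = -3$, and $w_2 = -3$ so that $h = -1$), so treat them as assumptions.

```python
import math

# Assumed inputs and weights, inferred from the local derivatives in the example.
w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0

# Forward pass through the graph nodes.
a = w0 * x0
b = w1 * x1
c = a + b
d = c + w2
h = -d
g = math.exp(h)
k = 1.0 + g
f = 1.0 / k

# Backward pass: multiply the upstream gradient by each local derivative.
df_dk = -1.0 / k**2
df_dg = df_dk * 1.0        # k = 1 + g
df_dh = df_dg * g          # g = e^h, so dg/dh = e^h
df_dd = df_dh * -1.0       # h = -d
df_dc = df_dd * 1.0        # d = c + w2
df_dw2 = df_dd * 1.0
df_da = df_dc * 1.0        # c = a + b
df_db = df_dc * 1.0
df_dw0, df_dx0 = df_da * x0, df_da * w0   # a = w0 * x0
df_dw1, df_dx1 = df_db * x1, df_db * w1   # b = w1 * x1

grad_w = [df_dw0, df_dw1, df_dw2]
print(round(f, 2), [round(v, 2) for v in grad_w])  # 0.73 [-0.2, -0.39, 0.2]
```

Without intermediate rounding, $\partial f / \partial w_1$ comes out as about $-0.393$, which explains the $-0.39$ in the final gradient even though $0.20 \cdot (-2) = -0.40$.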

#AIXMOOC
THANK YOU
mooc.uniurb.it/aixmooc
