Output Units and Cost Functions in FNN
Deep Neural Network
Cost Functions and Output Units
Jiaming Lin
jmlin@arbor.ee.ntu.edu.tw
DATALab@III
NetDBLab@NTU
January 9, 2017
Outline
1 Introduction
2 Output Units and Cost Functions
  Binary
  Multinoulli
3 Deterministic and Generic Model
4 Conclusions and Discussions
Introduction

In neural network learning:
The selection of the output unit depends on the learning problem.
– Classification: sigmoid, softmax, or linear.
– Linear regression: linear.
Determine and analyse the cost function.
– Is the cost function analytic†?
– Can the learning progress well (first-order derivative)?
Deterministic and generic models.
– Data is more complicated in many cases.
Note: †For simplicity, we say a function is analytic if it is
infinitely differentiable on its domain.
Binary

index | x1 | · · · | xn | target
------|----|-------|----|--------
1     | 0  | · · · | 1  | Class A
2     | 1  | · · · | 0  | Class B
3     | 1  | · · · | 1  | Class A
· · · |    |       |    |
m     | 0  | · · · | 0  | Class B
Binary

The output unit is the sigmoid, ŷ = S(z), where
S is the sigmoid function,
z is the input of the output layer,

    z = wᵀh + b    (1)

with w the weight, h the output of the hidden layer, and b the bias.
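As a concrete sketch of eq. (1) and the sigmoid output unit (a minimal NumPy illustration; the vector sizes and values below are made up):

```python
import numpy as np

def sigmoid(z):
    # S(z) = 1 / (1 + e^(-z)), maps a real z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def binary_output(h, w, b):
    # eq. (1): z = w^T h + b, then y_hat = S(z)
    z = np.dot(w, h) + b
    return sigmoid(z)

# hypothetical hidden-layer output and parameters
h = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.2, -0.3])
b = 0.05
y_hat = binary_output(h, w, b)  # a probability in (0, 1)
```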
Cost Function

A cost function can be derived in many ways; we discuss two of
the most common:
Mean Square Error
Cross Entropy

Mean Square Error
Let y^(i) denote the data label and ŷ^(i) = S(z^(i)) the
prediction. We may define the cost function C_mse by

    C_mse = (1/m) Σ_{i=1}^m (ŷ^(i) − y^(i))²    (2)

where m is the data size, and z^(i), ŷ^(i), and y^(i) are real
numbers.
Cross Entropy
Adopting the symbols above, the cost function defined by cross
entropy is

    C_ce = (1/m) Σ_{i=1}^m [ y^(i) ln(ŷ^(i)) + (1 − y^(i)) ln(1 − ŷ^(i)) ]

where m is the data size, and z^(i), ŷ^(i), and y^(i) are real
numbers. (This is the average log-likelihood; in practice one
minimizes its negative.)
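Both costs are straightforward to evaluate; a small NumPy sketch (the toy labels and predictions below are made up):

```python
import numpy as np

def mse_cost(y_hat, y):
    # eq. (2): C_mse = (1/m) * sum_i (y_hat_i - y_i)^2
    return np.mean((y_hat - y) ** 2)

def ce_loglik(y_hat, y):
    # average log-likelihood: (1/m) * sum_i [y_i ln(y_hat_i) + (1 - y_i) ln(1 - y_hat_i)]
    return np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0])       # labels
y_hat = np.array([0.9, 0.2, 0.7])   # sigmoid outputs
c_mse = mse_cost(y_hat, y)
c_ce = ce_loglik(y_hat, y)          # always <= 0; closer to 0 is better
```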
Comparison between MSE and Cross Entropy

Problem: which one is better?
Analyticity (infinitely differentiable)
Learning ability (first-order derivatives)
Comparison between MSE and Cross Entropy

Analyticity:

    C_mse = (1/m) Σ_{i=1}^m (ŷ^(i) − y^(i))²
    C_ce = (1/m) Σ_{i=1}^m [ y^(i) ln(ŷ^(i)) + (1 − y^(i)) ln(1 − ŷ^(i)) ]

Computationally, the value of ŷ^(i) = S(z^(i)) can overflow to 1 or
underflow to 0 when z^(i) is very positive or very negative.
Therefore, given a fixed y^(i) ∈ {0, 1},
C_ce is undefined when ŷ^(i) is 0 or 1.
C_mse is polynomial in ŷ^(i) and thus analytic everywhere.
Comparison between MSE and Cross Entropy

Learning ability: compare the per-example gradients (constant
factors dropped)

    ∂C_mse/∂w = [S(z) − y][1 − S(z)] S(z) h,    (3)
    ∂C_ce/∂w = [y − S(z)] h                     (4)

respectively, where S is the sigmoid and z = wᵀh + b.
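To see the saturation numerically, compare the two gradient factors at an extreme logit (a sketch; h is factored out of eqs. (3) and (4), and the function names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_grad_factor(z, y):
    # [S(z) - y] [1 - S(z)] S(z), eq. (3) without the h factor
    s = sigmoid(z)
    return (s - y) * (1.0 - s) * s

def ce_grad_factor(z, y):
    # [y - S(z)], eq. (4) without the h factor
    return y - sigmoid(z)

# wrong answer: true label y = 1 but z is very negative (y_hat ~ 0)
z, y = -20.0, 1.0
g_mse = mse_grad_factor(z, y)  # vanishes: learning gets stuck
g_ce = ce_grad_factor(z, y)    # stays near 1: learning can progress
```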
Comparison between MSE and Cross Entropy

Case            MSE step:                    Cross Entropy step:
                [S(z) − y][1 − S(z)]S(z)h    [y − S(z)]h
y = 1, ŷ → 1    → 0                          → 0
y = 1, ŷ → 0    → 0                          → 1
y = 0, ŷ → 1    → 0                          → −1
y = 0, ŷ → 0    → 0                          → 0

In the case of Mean Square Error, the progress gets stuck when z
is very positive or very negative.
The Unstable Issue in Cross Entropy

We have mentioned the unstable issue of cross entropy. Precisely,
ŷ = S(z) underflows to 0 when z is very negative,
ŷ = S(z) overflows to 1 when z is very positive.
Therefore, given a fixed y ∈ {0, 1}, the function

    C = y ln ŷ + (1 − y) ln(1 − ŷ)

can be undefined when z is very positive or very negative.
The Unstable Issue in Cross Entropy

Alternatively, regarding z as the variable of the cross entropy,

    C = y ln S(z) + (1 − y) ln(1 − S(z))    (5)
      = −ζ(−z) + z(y − 1),                  (6)

where ζ(x) = ln(1 + eˣ) is the softplus and z is a real number.
We may obtain the analyticity of C from (6): it is a sum and
composition of analytic functions.
The Unstable Issue in Cross Entropy

With C = −ζ(−z) + z(y − 1) as above, and dC/dz = y − S(z):

In the cases of the right answer,
y = 1 and ŷ = S(z) → 1 ⇒ z → ∞, C → 0,
y = 0 and ŷ = S(z) → 0 ⇒ z → −∞, C → 0.
In the cases of the wrong answer, C → −∞ but the gradient stays
bounded,
y = 1 and ŷ = S(z) → 0 ⇒ z → −∞, dC/dz → 1,
y = 0 and ŷ = S(z) → 1 ⇒ z → ∞, dC/dz → −1.
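Writing the cost in terms of z via the softplus, as in eq. (6), is how stable implementations avoid forming S(z) inside a logarithm. A sketch (function names are mine); TensorFlow's `sigmoid_cross_entropy_with_logits`, for example, uses the same logit-space trick:

```python
import numpy as np

def softplus(x):
    # zeta(x) = ln(1 + e^x), computed without overflow for large |x|
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def ce_from_logits(z, y):
    # eq. (6): C = -zeta(-z) + z (y - 1); never evaluates ln(0)
    return -softplus(-z) + z * (y - 1.0)

def ce_naive(z, y):
    # eq. (5) computed literally; breaks when S(z) rounds to 0 or 1
    s = 1.0 / (1.0 + np.exp(-z))
    return y * np.log(s) + (1.0 - y) * np.log(1.0 - s)
```

For moderate z the two agree; for extreme z only the logit-space form stays finite.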
Multinoulli: Output Unit and Cost Function

Generalize the binary case to multiple classes.
Linear output units, with #(output units) = #(classes).
Cost function evaluated by cross entropy.

Cost Function in Multinoulli Problems
Suppose the size of the dataset is m and there are K classes; then
we can obtain the cost function from cross entropy:

    C(w) = − Σ_{i=1}^m Σ_{k=1}^K 1{y^(i) = k} ln [ exp(z_k^(i)) / Σ_{j=1}^K exp(z_j^(i)) ]    (7)

where z_k^(i) = w_kᵀ h^(i) + b_k, and h^(i) is the output of the
hidden layer corresponding to the example x^(i).
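A sketch of eq. (7) in NumPy (array names are mine; the max shift does not change the softmax but keeps the exponentials from overflowing):

```python
import numpy as np

def multinoulli_cost(Z, labels):
    # eq. (7): C = -sum_i ln softmax(z^(i))_{y^(i)}
    # Z has shape (m, K); labels holds y^(i) as integers in {0, ..., K-1}
    Z = Z - Z.max(axis=1, keepdims=True)  # stability shift, softmax unchanged
    log_softmax = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    m = Z.shape[0]
    return -log_softmax[np.arange(m), labels].sum()

Z = np.array([[2.0, 0.5, -1.0],
              [0.0, 3.0, 1.0]])
labels = np.array([0, 1])
cost = multinoulli_cost(Z, labels)  # positive, smaller is better
```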
A Lemma for Cost Function Simplification

Analyticity (infinitely differentiable)
Learning ability (first-order derivatives)

To claim the properties above, we should first show a lemma.

Lemma 1
For the output z = wᵀh + b with z = [z₁, . . . , z_K], we have

    max_j {z_j} ≤ ln Σ_{j=1}^K exp(z_j) ≤ max_j {z_j} + ln K,    (8)

and the upper gap vanishes as the largest z_j dominates the rest.
A Lemma for Cost Function Simplification

Proof.
Without loss of generality, we may assume z₁ > · · · > z_K. The
lower bound is immediate, and for the upper bound

    ln [ e^{z₁} (1 + Σ_{j=2}^K e^{z_j − z₁}) ] = z₁ + ln (1 + Σ_{j=2}^K e^{z_j − z₁}) ≤ z₁ + ln K,

since each e^{z_j − z₁} ≤ 1; when z₁ dominates, the sum tends to 0
and the bound tightens to z₁ + ε for any ε > 0.

Intuitively, ln Σ_{j=1}^K exp(z_j) can be well approximated by
max_j {z_j}.
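Lemma 1 is also the basis of the standard log-sum-exp trick; a quick numerical check (the values are made up):

```python
import numpy as np

def logsumexp(z):
    # ln sum_j exp(z_j), computed with the shift max_j z_j from Lemma 1
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

z = np.array([10.0, 0.0, -5.0])
lse = logsumexp(z)
# Lemma 1: max_j z_j <= lse <= max_j z_j + ln K, and the gap is
# tiny when the largest entry dominates, as it does here
```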
Analyticity

We may rewrite the cost function as

    C(w) = − Σ_{i=1}^m Σ_{k=1}^K 1{y^(i) = k} [ z_k^(i) − ln Σ_{j=1}^K exp(z_j^(i)) ].

Each summand is a difference of analytic functions and thus
analytic, and the term 1{y^(i) = k} is actually a constant. The
total cost is a sum of analytic functions and thus analytic.
Learning Ability

Property 2
By the rule of sums in derivatives, we may simplify (7) and
consider each summand

    C^(i) = Σ_{k=1}^K 1{y^(i) = k} [ z_k − ln Σ_{j=1}^K exp(z_j) ],

the contribution of the example x^(i) to the total cost C (up to
the leading minus sign).
1 If the model gives the right answer, the error is close to 0.
2 If the model gives the wrong answer, the learning can progress
well.
Learning Ability

Proof (The Right Answer).
Suppose the true label is class n. By the assumption, z_n is the
maximum. Then, by Lemma 1,

    −ε ≤ Σ_{k=1}^K 1{y = k} [ z_k − ln Σ_{j=1}^K exp(z_j) ]
       = z_n − ln Σ_{j=1}^K exp(z_j)
       < z_n − max_j {z_j} = 0.

This shows that −ε ≤ C^(i) < 0 for an arbitrarily small ε.
Learning Ability

Proof (The Wrong Answer).
Suppose the true label is class n. By assumption, the prediction
z_n given by the model is not the maximum. On the other hand,
using the fact that

    z_n ≠ max_j {z_j} ⇒ softmax(z)_n ≪ 1,

this implies that there exists a sufficiently large δ > 0 such
that | softmax(z)_n − 1 | > δ.
Learning Ability

Proof (The Wrong Answer, Conti.)
Then

    ∂C^(i)/∂z_n = ∂/∂z_n [ z_n − ln Σ_{j=1}^K e^{z_j} ] = 1 − softmax(z)_n > δ.

This shows the gradient is sufficiently large and also predictable
(bounded by 1); therefore the learning can progress well.
Learning Processes Overview

         Deterministic                 Generic
Step 1   Model function                Probability distribution
         (linear, sigmoid)             (Gaussian, Bernoulli)
Step 2   Design error evaluation       Maximum likelihood estimate
         (MSE, cross entropy)
Step 3   Learn one statistic           Learn the full distribution
         (mean, median)

To describe some complicated data, it is easier to build a model
with the generic method.
Generic Modeling for Binary Classification

Step 1: Use the Bernoulli distribution as the likelihood function,

    p(y | x) = p^y (1 − p)^{1−y} = S(z)^y (1 − S(z))^{1−y}.

Step 2: Minimize the negative log-likelihood, where

    ln p(y | x^(i)) = y ln S(z) + (1 − y) ln(1 − S(z)).

Step 3: We can learn the full distribution: for a new input x′,

    p(y | x′) = S(z′)^y (1 − S(z′))^{1−y},

where we denote z′ = wᵀx′ + b and S is the sigmoid.
Generic Modeling for Linear Regression: Step 1

Given a training feature x, use the Gaussian distribution as the
likelihood function,

    p(y | x) = (1/√(2πσ²)) exp( −(µ − y)² / (2σ²) ),

where, denoting the output of the hidden layer by h_x, the weights
w = [w₁, w₂] and biases b = [b₁, b₂],

    µ = w₁ᵀ h_x + b₁,
    σ = w₂ᵀ h_x + b₂.

Intuitively, µ and σ are two linear output units; they are
functions of x.
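A sketch of the two-output-unit head (names and values are mine; note that the slide's linear σ can go nonpositive, so practical implementations often pass it through a softplus):

```python
import numpy as np

def gaussian_head(h_x, w1, b1, w2, b2):
    # two linear output units on top of the hidden output h_x:
    # mu = w1^T h_x + b1, sigma = w2^T h_x + b2
    mu = np.dot(w1, h_x) + b1
    sigma = np.dot(w2, h_x) + b2
    return mu, sigma

def gaussian_loglik(y, mu, sigma):
    # ln p(y | x) for the Gaussian likelihood above
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - (mu - y)**2 / (2.0 * sigma**2)

h_x = np.array([0.3, -0.7])  # hypothetical hidden output
mu, sigma = gaussian_head(h_x, np.array([1.0, 0.5]), 0.1,
                          np.array([0.2, -0.4]), 1.0)
```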
Generic Modeling for Linear Regression: Step 2

Recall that the maximum likelihood estimate is equivalent to
minimizing the negative log-likelihood, that is,

    (µ̂, σ̂) = arg min_{(µ,σ)} − Σ_x ln p(y | x).

However, for each summand,

    Cx = ln p(y | x) = −(1/2) [ ln(2πσ²) + (µ − y)²/σ² ],
    ∂Cx/∂σ = −σ⁻¹ + (µ − y)² σ⁻³,

the gradients and errors become unstable when σ is close to 0.
Generic Modeling for Linear Regression: Step 2

To prevent the gradients and errors from being unstable, we may
substitute the term 1/(2σ²) with v; then for each summand in the
log-likelihood,

    Cx = −(1/2) ln π + (1/2) ln v − v(µ − y)²,
    ∂Cx/∂µ = −2v(µ − y),
    ∂Cx/∂v = 1/(2v) − (µ − y)².

Note that this substitution is valid only when the variance is not
too large.
Generic Modeling for Linear Regression: Step 2

If the variance σ is fixed and chosen by the user, then by
comparing the negative log-likelihood with MSE, we can see that
minimizing the NLL is equivalent to minimizing the MSE:

    C_mse = (1/m) Σ_{i=1}^m (ŷ^(i) − y^(i))²,
    C_nll = − Σ_{i=1}^m Cx^(i) = (1/2) [ m ln(2πσ²) + Σ_{i=1}^m (µ_{x^(i)} − y^(i))²/σ² ].
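A quick numerical check that the two objectives share the same minimizer when σ is fixed (the synthetic data and grid below are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(3.0, 1.0, size=100)   # synthetic targets
mus = np.linspace(0.0, 6.0, 601)     # candidate constant predictions
sigma = 1.0                          # fixed, user-chosen variance

mse = np.array([np.mean((mu - y) ** 2) for mu in mus])
nll = np.array([0.5 * (len(y) * np.log(2.0 * np.pi * sigma**2)
                       + np.sum((mu - y) ** 2) / sigma**2) for mu in mus])

# the NLL is an increasing affine function of the summed squared
# error, so both curves bottom out at the same mu
```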
Generic Modeling for Linear Regression: Step 3

Full distribution from the generic model: µ and σ in this case.
Single statistic from the deterministic model: µ in this case.

Experiment (ref): generate random data based on the formula

    y = x + 7.0 sin(0.75x) + ε

where ε is Gaussian noise with µ = 0, σ = 1.
Generic Modeling for Linear Regression: Step 3

FNN config: #(hidden layers) = 1, width = 20, and the hidden unit
is tanh.

[Figure: fitted results, generic vs. deterministic]
More Complicated Cases

Complicated data distributions.
In some cases, it is almost impossible to describe the data via
deterministic methods.
Generic methods might perform better in complicated cases.
Mixture Density Network

Generate random data based on the formula

    x = y + 7.0 sin(0.75y) + ε

where ε is Gaussian noise with µ = 0, σ = 1. (Note that x and y
are swapped relative to the previous experiment, so p(y | x) is
multimodal.)
Mixture Density Network

First, just try using MSE to define the cost function, with one
hidden layer of width = 20 and tanh hidden units. The reason is
that minimizing MSE is equivalent to minimizing the negative
log-likelihood for a simple Gaussian.
Mixture Density Network

The mixture density network: the Gaussian mixture with n
components is defined by the conditional probability distribution

    p(y | x) = Σ_{i=1}^n p(c = i | x) N(y; µ^(i)(x), Σ^(i)(x)).    (9)

Network configuration:
1 The number of components n needs to be fine-tuned (trial and
error).
2 3 × n output units.
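A sketch of the mixture negative log-likelihood for one input x (the split of the 3n raw outputs into mixture logits, means, and log-variances is a common parameterization, not fixed by the slide):

```python
import numpy as np

def mdn_nll(params, y):
    # params: the 3n raw (linear) network outputs for one x, split into
    # mixture logits, component means, and component log-variances
    n = len(params) // 3
    logits, mu, log_var = params[:n], params[n:2 * n], params[2 * n:]
    pi = np.exp(logits - logits.max())
    pi = pi / pi.sum()                   # p(c = i | x) via softmax
    var = np.exp(log_var)                # keeps each variance positive
    dens = pi * np.exp(-(y - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return -np.log(dens.sum())           # -ln p(y | x), from eq. (9)

# one component, standard normal: params = [logit, mu, log_var]
params = np.array([0.0, 0.0, 0.0])
nll_at_mode = mdn_nll(params, 0.0)
```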
Mixture Density Network

Experiment (ref):
#(components) = 24,
two hidden layers with width = 24 and tanh activation,
#(output units) = 3 × 24, all linear.
Conclusions and Discussions

In classification problems, cross entropy is naturally better at
evaluating errors than other methods.
A cross-entropy improvement to avoid numerical instability:
– the MNIST example from TensorFlow.
Determine whether the cost function is good or not:
– Is the cost function analytic?
– Can the learning progress well?
Deterministic vs. Generic:
– Deterministic learns a single statistic, while generic learns
the full distribution.
– When the data distribution is not normal (high kurtosis or fat
tails), generic might be better.
– Generic methods are easier to apply to complicated cases.
Thank you.