28. Assumption
If we assume f^1, f^2, f^3 are all linear functions, then a^1 = f^1(w^1 p + b^1) is also a linear function, since w^1 p + b^1 is linear. After applying a linear f^1, each data point just moves within a linear space.
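The claim above can be checked directly: a sketch with made-up scalar weights, showing that composing two linear (affine) functions yields another function of the same form A*p + C.

```python
# Hypothetical scalar "layers": each is f(p) = w*p + b, i.e. linear (affine).
def linear(w, b):
    return lambda p: w * p + b

f1 = linear(2.0, 1.0)    # f1(p) = 2p + 1
f2 = linear(3.0, -1.0)   # f2(p) = 3p - 1

# Composition: f2(f1(p)) = 3*(2p + 1) - 1 = 6p + 2, again of the form A*p + C.
composed = lambda p: f2(f1(p))
flat = linear(6.0, 2.0)

for p in [0.0, 1.5, -4.0]:
    assert composed(p) == flat(p)
```

Stacking linear layers therefore adds no expressive power: the stack collapses into one linear function.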
29. Proof
To make the calculation simpler, let f^1, f^2, f^3 each be the identity function, i.e. f^i(p) = p.
a^3 = f^3(w^3 f^2(w^2 f^1(w^1 p + b^1) + b^2) + b^3)
a^3 = f^3(w^3 f^2(w^2 (w^1 p + b^1) + b^2) + b^3)        since f^1(p) = p
a^3 = f^3(w^3 f^2(w^2 w^1 p + w^2 b^1 + b^2) + b^3)
a^3 = f^3(w^3 (w^2 w^1 p + w^2 b^1 + b^2) + b^3)         since f^2(p) = p
a^3 = f^3(w^3 w^2 w^1 p + w^3 w^2 b^1 + w^3 b^2 + b^3)
a^3 = w^3 w^2 w^1 p + w^3 w^2 b^1 + w^3 b^2 + b^3        since f^3(p) = p
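The collapse above can be verified numerically. This is a minimal sketch with made-up scalar weights (w1..w3, b1..b3 are illustrative values, not from the slides): the layer-by-layer forward pass with identity activations matches the collapsed form A*p + C at every input.

```python
# Hypothetical scalar weights and biases for the three layers.
w1, b1 = 2.0, 0.5
w2, b2 = -1.0, 1.0
w3, b3 = 0.5, -2.0

def forward(p):
    a1 = w1 * p + b1      # f^1 = identity
    a2 = w2 * a1 + b2     # f^2 = identity
    return w3 * a2 + b3   # f^3 = identity

# Collapsed coefficients from the derivation:
A = w3 * w2 * w1
C = w3 * w2 * b1 + w3 * b2 + b3

for p in [0.0, 1.0, -3.0, 7.5]:
    assert abs(forward(p) - (A * p + C)) < 1e-12
```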
30. We can say
a^3 = Ap + C, where A = w^3 w^2 w^1 and C = w^3 w^2 b^1 + w^3 b^2 + b^3
As a result, the final activation a^3 is just a single linear neuron. It can only produce a linear decision boundary (a hyperplane), so it cannot solve classification problems that require a non-linear boundary.
So we should use a non-linear activation function in an MLP.
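The classic example of a problem a purely linear model cannot solve is XOR. The sketch below (a brute-force search over a hypothetical grid of weights, not anything from the slides) checks that no single hyperplane w1*x + w2*y + b separates XOR's two classes:

```python
import itertools

# XOR truth table: inputs -> label.
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Coarse grid of candidate weights/bias (illustrative, not exhaustive;
# XOR is provably not linearly separable, so no grid will succeed).
grid = [i / 4 for i in range(-8, 9)]

def separates(w1, w2, b):
    # Label-1 points must land strictly on the positive side,
    # label-0 points on the non-positive side.
    return all((w1 * x + w2 * y + b > 0) == bool(label)
               for (x, y), label in xor.items())

found = any(separates(w1, w2, b)
            for w1, w2, b in itertools.product(grid, repeat=3))
```

Here `found` stays False: one linear neuron, and hence any stack of them, cannot draw the non-linear boundary XOR needs, while an MLP with a non-linear activation can.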