# Neural Networks

Published in: Technology, Education

### Neural Networks

1. 1. Neural Networks Machine Learning; Thu May 21, 2008
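2. 2. Motivation Problem: With our linear methods, we can train the weights but not the basis functions: $y(\mathbf{x}, \mathbf{w}) = f\left(\sum_j w_j \phi_j(\mathbf{x})\right)$, with activator $f$, fixed basis functions $\phi_j$, and trainable weights $w_j$.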
4. 4. Motivation Problem: With our linear methods, we can train the weights but not the basis functions: Can we learn the basis functions? Maybe reuse ideas from our previous technology? Adjustable weight vectors?
5. 5. Historical notes This motivation is not how the technology came about! Neural networks were developed in an attempt to achieve AI by modeling the brain (but artificial neural networks have very little to do with biological neural networks).
6. 6. Feed-forward neural network Two levels of “linear regression” (or classification): a hidden layer followed by an output layer.
7. 7. Feed-forward neural network
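The two-level structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `feed_forward`, the weight layout (`W1`, `b1`, `W2`, `b2`), and all numeric values are assumptions made for this example, and tanh hidden units with linear outputs are just one possible choice of activators.

```python
import math

def feed_forward(x, W1, b1, W2, b2):
    """Evaluate a two-layer network: tanh hidden units, linear outputs."""
    # Hidden layer: z_j = tanh(sum_i W1[j][i] * x[i] + b1[j])
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # Output layer: y_k = sum_j W2[k][j] * z[j] + b2[k]
    y = [sum(w * zj for w, zj in zip(row, z)) + b
         for row, b in zip(W2, b2)]
    return z, y

# Hypothetical weights: 1 input, 3 hidden units, 1 output.
W1 = [[1.0], [-1.0], [0.5]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 1.0, 2.0]]
b2 = [0.1]

z, y = feed_forward([0.0], W1, b1, W2, b2)
```

The hidden values `z` play the role of learned basis functions: they depend on the trainable first-layer weights instead of being fixed in advance.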
8. 8. Example: tanh activation function for the 3 hidden units; linear activation function for the output layer.
14. 14. Network training If we can give the network a probabilistic interpretation, we can use our “standard” estimation methods for training. Regression problems:
15. 15. Network training If we can give the network a probabilistic interpretation, we can use our “standard” estimation methods for training. Classification problems:
18. 18. Network training If we can give the network a probabilistic interpretation, we can use our “standard” estimation methods for training. Training by minimizing an error function. Typically, we cannot minimize analytically but need numerical optimization algorithms. Notice: There will typically be more than one minimum of the error function (due to symmetries). Notice: At best, we can find local minima.
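The formulas for the regression and classification cases did not survive in this transcript. As a hedged reminder, the textbook maximum-likelihood choices are shown below; the symbols ($y$ for the network output, $t_n$ for the targets, $\beta$ for the noise precision) are assumptions of this sketch and may differ from the notation on the slides.

```latex
% Regression: Gaussian noise around the network output
p(t \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}\!\left(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1}\right)
\quad\Rightarrow\quad
E(\mathbf{w}) = \frac{1}{2} \sum_{n} \left( y(\mathbf{x}_n, \mathbf{w}) - t_n \right)^2

% Two-class classification: Bernoulli target, output y_n in (0, 1)
p(t \mid \mathbf{x}, \mathbf{w}) = y(\mathbf{x}, \mathbf{w})^{t} \left(1 - y(\mathbf{x}, \mathbf{w})\right)^{1 - t}
\quad\Rightarrow\quad
E(\mathbf{w}) = -\sum_{n} \left[ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \right]
```

In both cases the error function is the negative log-likelihood, up to constants that do not depend on the weights.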
20. 20. Parameter optimization Numerical optimization is beyond the scope of this class – you can (should) get away with just using appropriate libraries such as GSL. These can typically find (local) minima of general functions, but the most efficient algorithms also need the gradient of the function to minimize.
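To illustrate why the gradient is useful and why only local minima are found, here is a minimal sketch using plain gradient descent in Python. It is a stand-in for a library optimizer such as GSL, not an example of GSL's API; the test function `f`, the step size, and the iteration count are arbitrary choices for this illustration.

```python
def f(x):
    """A one-dimensional function with two local minima."""
    return x ** 4 - 3 * x ** 2 + x

def grad_f(x):
    """Analytic derivative of f."""
    return 4 * x ** 3 - 6 * x + 1

def gradient_descent(grad, x0, step=0.01, iterations=5000):
    """Repeatedly move against the gradient, starting from x0."""
    x = x0
    for _ in range(iterations):
        x -= step * grad(x)
    return x

# Different starting points end in different local minima.
left = gradient_descent(grad_f, -2.0)
right = gradient_descent(grad_f, 2.0)
```

Both runs converge to a point where the gradient vanishes, but only the run started at -2.0 reaches the lower of the two minima. Which minimum is found depends on the starting point.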
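23. 23. Back propagation Back propagation is an efficient algorithm for computing both the error function and the gradient. For iid data the error function is a sum of per-observation terms, $E(\mathbf{w}) = \sum_n E_n(\mathbf{w})$, so we just focus on a single term $E_n$.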
24. 24. Back propagation After running the feed-forward algorithm, we know the values $z_i$ and $y_k$. We need to compute the $\delta$s (called errors).
25. 25. Back propagation The choice of error function and output activator typically makes it simple to deal with the output layer
26. 26. Back propagation For hidden layers, we apply the chain rule
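The equations on these slides were lost in extraction. As a hedged reconstruction in standard textbook notation (an assumption, not necessarily the slides' exact symbols), with $a_j$ the weighted input to unit $j$, $h$ the hidden activation function, and $z_i$ the value feeding weight $w_{ji}$, the back-propagation relations for a sum-of-squares error with linear outputs read:

```latex
% Output layer: errors are simply prediction minus target
\delta_k = y_k - t_k

% Hidden layer: chain rule through the units that unit j feeds into
\delta_j = h'(a_j) \sum_k w_{kj} \, \delta_k

% Gradient of the per-observation error with respect to any weight
\frac{\partial E_n}{\partial w_{ji}} = \delta_j \, z_i
```

For tanh hidden units, $h'(a_j) = 1 - z_j^2$, so the derivative can be computed from values already available after the feed-forward pass.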
27. 27. Calculating error and gradient. Algorithm: (1) Apply feed forward to calculate the hidden and output variables, and then the error. (2) Calculate the output errors and propagate the errors back to get the gradient. (3) Iterate ad lib.
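The algorithm above can be sketched for a single observation as follows. This is a minimal illustration under stated assumptions: tanh hidden units, linear outputs, a sum-of-squares error, and no bias terms. The function names and the weight values are hypothetical, and the finite-difference helper is included only to check the back-propagated gradient.

```python
import math

def forward(x, W1, W2):
    """Feed forward: tanh hidden units, linear outputs (biases omitted)."""
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sum(w * zj for w, zj in zip(row, z)) for row in W2]
    return z, y

def error(x, t, W1, W2):
    """Sum-of-squares error for one observation (x, t)."""
    _, y = forward(x, W1, W2)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def backprop(x, t, W1, W2):
    """Return the gradients of the error with respect to W1 and W2."""
    # Step 1: feed forward to get hidden values z and outputs y.
    z, y = forward(x, W1, W2)
    # Step 2a: output errors for linear outputs and squared error.
    d_out = [yk - tk for yk, tk in zip(y, t)]
    # Step 2b: propagate back; the derivative of tanh is 1 - z^2.
    d_hid = [(1.0 - zj ** 2) * sum(W2[k][j] * d_out[k] for k in range(len(W2)))
             for j, zj in enumerate(z)]
    # Gradient entries: error of the receiving unit times the value fed in.
    grad_W1 = [[dj * xi for xi in x] for dj in d_hid]
    grad_W2 = [[dk * zj for zj in z] for dk in d_out]
    return grad_W1, grad_W2

def numeric_grad_W1(x, t, W1, W2, j, i, eps=1e-6):
    """Central finite difference for dE/dW1[j][i], used only as a check."""
    plus = [row[:] for row in W1]
    minus = [row[:] for row in W1]
    plus[j][i] += eps
    minus[j][i] -= eps
    return (error(x, t, plus, W2) - error(x, t, minus, W2)) / (2 * eps)

# Hypothetical data and weights: 2 inputs, 3 hidden units, 1 output.
x, t = [0.5, -1.0], [0.3]
W1 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
W2 = [[0.3, -0.1, 0.2]]
grad_W1, grad_W2 = backprop(x, t, W1, W2)
```

Comparing `grad_W1[j][i]` with `numeric_grad_W1(x, t, W1, W2, j, i)` is a cheap sanity check. Back propagation obtains all entries from one forward and one backward pass, whereas finite differences need two error evaluations per weight.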
35. 35. Overfitting The complexity of the network is determined by the hidden layer – and overfitting is still a problem. Use e.g. test datasets to alleviate this problem.
36. 36. Early stopping: ...or just stop the training when overfitting starts to occur. (Figure panels: training data; test data.)
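One common way to decide when overfitting "starts to occur" is to track the error on held-out data after each training pass and stop once it has not improved for a few passes. The sketch below assumes that approach; the function name `early_stopping_epoch`, the `patience` parameter, and the error values are made up for illustration and do not come from the slides.

```python
def early_stopping_epoch(test_errors, patience=2):
    """Return the index of the epoch with the best held-out error,
    stopping the scan once `patience` epochs pass without improvement."""
    best_epoch, best_error = 0, float("inf")
    for epoch, err in enumerate(test_errors):
        if err < best_error:
            # Held-out error improved: remember this epoch.
            best_epoch, best_error = epoch, err
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return best_epoch

# Made-up held-out errors: they fall at first, then rise as overfitting sets in.
errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
stop = early_stopping_epoch(errors)
```

In a real training loop the weights from the best epoch would be saved and restored at the end, since training continues for `patience` more passes before stopping.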
37. 37. Summary: neural networks as general regression/classification models; feed-forward evaluation; maximum-likelihood framework; error back-propagation to obtain the gradient.