# Master Defense Slides (translated)

These are the slides from my master's defense, 17 April 2003.
Subject: "High capacity neural network optimization problems: study & solutions exploration"

### Master Defense Slides (translated)

1. High capacity neural network optimization problems: study & solutions exploration. Francis Piéraut, eng., M.A.Sc, [email_address], http://fraka6.blogspot.com/
2. Plan
   - Context: learning a language model with a NN
   - High capacity NN optimization inefficiency (error count & CPU time)
   - Is this normal?
   - Various optimization problems
   - Some solutions & results
   - Contributions
   - Future work
   - Conclusion
3. Learning algorithm: Neural Network
   - Problem: find P(c_i | x_1, x_2, ...) from samples of P(x_1, x_2, ... | c_i)
   - No a priori distribution
   - Complex (non-linear) relationships
   - A solution: a neural network
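A network of this kind can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: a one-hidden-layer network whose softmax outputs can be read as estimates of P(c_i | x); the weights below are arbitrary example values.

```python
import math

def mlp_posteriors(x, W1, b1, W2, b2):
    """One-hidden-layer network: tanh hidden units, softmax output,
    so the outputs can be interpreted as estimates of P(c_i | x)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    z = [sum(w * hi for w, hi in zip(row, h)) + b
         for row, b in zip(W2, b2)]
    m = max(z)                              # subtract max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

# Tiny example: 2 inputs, 3 hidden units, 2 classes (weights are arbitrary).
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -0.5, 0.2], [-1.0, 0.5, -0.2]]
b2 = [0.0, 0.0]
p = mlp_posteriors([1.0, 2.0], W1, b1, W2, b2)
```

The outputs are positive and sum to one, which is what lets them play the role of class posteriors.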
4. Neural networks and capacity (diagram: inputs x_1 ... x_D, hidden units y_1 ... y_N, outputs z_1 ... z_k estimating P(c_i | x), targets t_1 ... t_k, weights w_ij and w_kj)
6. High/huge capacity neural network (diagram: hidden units y_1, y_2, ...)
7. Constraints
   - First-order stochastic gradient
   - Standard architecture
   - One learning rate
   - Overfitting is neglected
   - Database: "Letters" (26 classes / 16 inputs / 20,000 examples)
8. Errors: optimization inefficiency of high capacity neural networks
9. CPU time: optimization inefficiency of high capacity neural networks
10. Is this inefficiency normal?
   - Hypothesis: no. The inefficiency is created by the increase in optimization problems inherent to stochastic backpropagation
     - Linear solutions vs. non-linear solutions
     - Solution space
   - Solution: reduce or eliminate the problems related to backpropagation
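The first two constraints, first-order stochastic gradient with a single global learning rate, amount to the plain update rule below (a sketch with made-up numbers, not the experimental setup):

```python
def sgd_step(params, grads, lr=0.01):
    """First-order stochastic gradient update with one global
    learning rate, matching the constraints on the slide."""
    return [p - lr * g for p, g in zip(params, grads)]

# One update with an illustrative gradient.
params = [1.0, -2.0, 0.5]
grads = [0.5, -0.5, 0.0]
new = sgd_step(params, grads, lr=0.1)
```

Every parameter is moved by the same learning rate, which is exactly the restriction the later slides work around.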
11. Neural networks and equations (diagram: inputs x_1 ... x_D, hidden units y_1 ... y_N, outputs z_1 ... z_k, targets t_1 ... t_k, weights w_ij and w_kj)
12. The learning process slows down for non-linear relationships
13. Solution space (diagram: solution space of an N+K-neuron network vs. solution space of an N-neuron network)
14. Similar solutions (diagram: initial state example, 5 iterations vs. 3 iterations)
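In the slide's notation, a standard formulation of the forward pass, cost, and output-layer gradient is the following (the choice of squared-error cost and the activation functions f and g are assumptions here, not stated on the slide):

```latex
\begin{align}
y_j &= f\Big(\sum_{i=1}^{D} w_{ij}\, x_i\Big), &
z_k &= g\Big(\sum_{j=1}^{N} w_{kj}\, y_j\Big), \\
C &= \tfrac{1}{2}\sum_{k} (z_k - t_k)^2, &
\frac{\partial C}{\partial w_{kj}} &= (z_k - t_k)\, g'\!\Big(\sum_j w_{kj} y_j\Big)\, y_j .
\end{align}
```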
15. Optimization problems
   - Moving target problem
   - Attenuation and gradient dilution
   - No specialization mechanism (e.g. boosting)
   - Opposite gradients (classification)
   - Symmetry problem
16. Neural network optimization problems (diagram: inputs x_1 ... x_D, hidden units y_1 ... y_N, outputs z_1 ... z_k, targets t_1 ... t_k, weights w_ij and w_jk)
   - Moving target problem
   - Attenuation and gradient dilution
   - No specialization mechanism (e.g. boosting)
   - Opposite gradients (classification)
   - Symmetry problem
17. Explored solutions
   - Incremental neural networks
   - Uncoupled architecture
   - Neural network with partial parameter optimization
   - Etc.
18. Incremental neural networks: first approach
19. Incremental neural networks: first approach (fixed-weight optimization)
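The attenuation problem in this list can be seen numerically. Assuming sigmoid units (an assumption for illustration), each layer the error signal crosses multiplies it by the sigmoid derivative, which is at most 1/4, so the gradient reaching early layers shrinks geometrically:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def attenuation(n_layers, a=0.0):
    """Factor by which a unit error signal is scaled after being
    backpropagated through n_layers sigmoid units (unit weights
    assumed): each layer contributes sigma'(a) = sigma(a)(1-sigma(a)),
    which never exceeds 0.25."""
    g = 1.0
    for _ in range(n_layers):
        s = sigmoid(a)
        g *= s * (1.0 - s)
    return g

g1 = attenuation(1)   # 0.25 at a = 0
g5 = attenuation(5)   # 0.25**5, already under 0.1% of the original signal
```

This is one mechanism behind the slow learning of deep or high-capacity stacks discussed earlier.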
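The "fixed-weight" idea can be sketched as follows. This is an illustrative data structure, not the thesis code: hidden units are added one at a time, and the weights of previously trained units are frozen so only the newest unit is optimized.

```python
import random

class IncrementalNet:
    """Grows one hidden unit at a time; weights of earlier units
    are frozen ("fixed-weight optimization"), so each new unit
    only has to fit what the frozen ones do not explain."""

    def __init__(self, n_in):
        self.n_in = n_in
        self.hidden = []  # list of (weight_vector, is_frozen)

    def add_unit(self):
        # Freeze everything trained so far, then append a fresh unit.
        self.hidden = [(w, True) for w, _ in self.hidden]
        w_new = [random.uniform(-0.1, 0.1) for _ in range(self.n_in)]
        self.hidden.append((w_new, False))

    def trainable(self):
        return [w for w, frozen in self.hidden if not frozen]

net = IncrementalNet(n_in=16)   # 16 inputs, as in the "Letters" dataset
net.add_unit()
net.add_unit()
```

After any number of additions, exactly one unit remains trainable, which is what keeps the optimization problem small at each step.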
20. Hypothesis: incremental NN (table marking which problems it addresses: moving target, opposite gradients, specialization mechanism, gradient dilution, symmetry)
21. Incremental neural networks (first approach): results
22. Why doesn't it work? (critical points)
25. Incremental neural network: second approach (adding hidden layers; diagram: inputs x_1, x_2, hidden units y_1 ... y_4, outputs z_1, z_2)
26. Cost function curve shape
27. Hypothesis: incremental NN (adding layers) (table marking which problems it addresses: moving target, opposite gradients, specialization mechanism, gradient dilution, symmetry)
28. Incremental neural network (second approach): results
29. Uncoupled architecture
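One way to read the uncoupled (decoupled) architecture is as one independent sub-network per class, each trained on its own one-vs-rest target, so classes no longer push opposite gradients through shared weights. A minimal sketch, with single sigmoid units standing in for the per-class sub-networks and hand-picked weights:

```python
import math

def score(w, b, x):
    """One independent per-class detector (here a single sigmoid unit)."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def decoupled_predict(models, x):
    """Each class scored by its own sub-network; predict the argmax."""
    scores = [score(w, b, x) for (w, b) in models]
    return max(range(len(scores)), key=lambda k: scores[k])

models = [([2.0, 0.0], 0.0),   # detector for class 0
          ([0.0, 2.0], 0.0)]   # detector for class 1
pred = decoupled_predict(models, [1.0, -1.0])
```

Because the detectors share no parameters, updating one class's model cannot undo progress on another, which removes the opposite-gradients problem by construction.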
30. Hypothesis: decoupled architecture (table marking which problems it addresses or removes: moving target, opposite gradients, specialization mechanism, gradient dilution, symmetry)
31. Inefficiency of high capacity neural networks (CPU time)
32. Efficiency of high capacity neural networks: decoupled architecture
33. Hypothesis: partial parameter optimization (table marking which problems it addresses: moving target, opposite gradients, specialization mechanism, gradient dilution, symmetry)
34. Neural networks with partial parameter optimization: results (full parameter optimization vs. max-sensitivity optimization)
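A plausible reading of "max sensitivity" optimization, sketched here as an assumption rather than the thesis's exact criterion: per step, update only the k parameters whose gradient magnitude is largest and leave the rest untouched.

```python
def max_sensitivity_step(params, grads, k, lr=0.1):
    """Update only the k parameters with the largest gradient
    magnitude (the most "sensitive" ones); others are frozen
    for this step."""
    order = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    selected = set(order[:k])
    return [p - lr * g if i in selected else p
            for i, (p, g) in enumerate(zip(params, grads))]

# Illustrative step: only indices 3 and 0 carry the largest |gradient|.
params = [1.0, 1.0, 1.0, 1.0]
grads = [0.9, -0.1, 0.05, -2.0]
new = max_sensitivity_step(params, grads, k=2)
```

The unselected parameters come back unchanged, so each iteration touches a small, high-impact subset of the weights.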
35. Why predict parameters? (observation; plot: parameter values vs. epoch)
36. Hypothesis: parameter prediction. Benefit: reduce the number of iterations by predicting values based on history (table marking which problems it addresses: moving target, opposite gradients, specialization mechanism, gradient dilution, symmetry)
37. Prediction: quadratic extrapolation
38. Prediction: learning rate increase
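Quadratic extrapolation of a parameter's trajectory can be done from its last three recorded values: fit the unique quadratic through them and evaluate one step ahead (the use of exactly three equally spaced points is an assumption for this sketch).

```python
def quadratic_extrapolate(v0, v1, v2):
    """Predict the next value from three consecutive values via
    Newton forward differences: p(t) = v0 + d1*t + d2*t(t-1)/2
    with d1 = v1 - v0 and d2 = v2 - 2*v1 + v0, evaluated at t = 3."""
    d1 = v1 - v0
    d2 = v2 - 2 * v1 + v0
    return v0 + 3 * d1 + 3 * d2

# On an exactly quadratic trajectory the prediction is exact:
history = [t * t for t in range(3)]   # 0, 1, 4
pred = quadratic_extrapolate(*history)   # next value of t*t, i.e. 9
```

When a weight's learning curve is locally smooth, jumping to the extrapolated value can save iterations; when it is noisy, the prediction can overshoot, which is presumably why the slide pairs it with other schemes such as learning rate increase.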
39. Contributions
   - Experimental indication of optimization problems for high capacity NNs
   - Same capacity (adding a hidden layer):
     - Speeds up learning
     - Better error rate
   - Presentation of a solution whose speed does not degrade as capacity increases (decoupled architecture / opposite gradients)
40. Future work
   - Can the high capacity optimization inefficiency be generalized? (more datasets)
   - In classification tasks, is the decoupled architecture a better choice from a generalization point of view?
   - Is the critical-point hypothesis applicable in the context of incremental neural networks?
   - Adding hidden layers: why doesn't it work for successive layers?
   - Partial parameter optimization:
     - Better understanding of the results
     - Which parameter-selection algorithm is best?
   - Is there an efficient parameter prediction technique?
41. Conclusion
   - Partially documented, experimentally, the inefficiency of high capacity neural networks (CPU time / error count)
   - Various problems
   - Explored solutions:
     - Incremental approach
     - Decoupled architecture
     - Partial parameter optimization
     - Parameter prediction
     - ...
42. Any questions?
43. Example: linear solution
44. Example: highly non-linear solution
45. Selection of the connections that most influence the cost
46. Selection of the connections that most influence the error (diagram: T = 1, S = 0; T = 0, S = 1; T = 0, S = 0.1; T = 0, S = 0.1)
47. Observation: idealized behavior of the time ratio