
Master Defense Slides (translated)

These are the slides from my master's defense, 17 April 2003.
Subject: "High capacity neural network optimization problems: study & solutions exploration"

  1. High capacity neural network optimization problems: study & solutions exploration. Francis Piéraut, eng., M.A.Sc, [email_address], http://fraka6.blogspot.com/
  2. Plan
     - Context: learning a language model with a neural network
     - Optimization inefficiency of high-capacity NNs (error count & CPU time)
     - Is this normal?
     - Various optimization problems
     - Some solutions & results
     - Contributions
     - Future work
     - Conclusion
  3. Learning algorithm: neural network
     - Problem: find P(c_i | x_1, x_2, …) from samples of P(x_1, x_2, … | c_i)
     - No a priori assumption on the distribution
     - Complex (non-linear) relationships
     - A solution: the neural network
  4. Neural networks and capacity
     [Diagram: feed-forward network with inputs x_1 … x_D, hidden units y_1 … y_N, outputs z_1 … z_k estimating P(c_i | x), targets t_1 … t_k, weights w_ij and w_kj]
  5. High/huge-capacity neural network
     [Diagram: the same network with a much wider hidden layer]
  6. Constraints
     - First-order stochastic gradient
     - Standard architecture
     - One learning rate
     - Overfitting is neglected
     - Database: "Letters" (26 classes / 16 inputs / 20,000 examples)
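A minimal sketch of this setup: a one-hidden-layer MLP trained by plain first-order stochastic gradient descent with a single global learning rate. The synthetic data below only mimics the shape of the "Letters" set (16 inputs, 26 classes, 20,000 examples); the hidden-layer size, learning rate, and random labels are illustrative assumptions, not the thesis' values.

```python
# One-hidden-layer MLP + plain first-order SGD, one global learning rate.
import numpy as np

rng = np.random.default_rng(0)
D, H, K, N = 16, 40, 26, 20000          # inputs, hidden units, classes, examples
X = rng.normal(size=(N, D))             # stand-in for the real "Letters" data
y = rng.integers(K, size=N)             # placeholder labels

W1 = rng.normal(scale=0.1, size=(D, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, K)); b2 = np.zeros(K)
lr = 0.01                               # the single learning rate

def forward(x):
    h = np.tanh(x @ W1 + b1)            # hidden activations y_j
    s = h @ W2 + b2                     # output pre-activations z_k
    p = np.exp(s - s.max()); p /= p.sum()   # softmax -> estimate of P(c_i | x)
    return h, p

for epoch in range(5):
    errors = 0
    for i in rng.permutation(N):        # stochastic: one example at a time
        h, p = forward(X[i])
        t = np.zeros(K); t[y[i]] = 1.0  # one-hot target
        errors += p.argmax() != y[i]
        dz = p - t                      # first-order cross-entropy gradient
        dh = (W2 @ dz) * (1 - h**2)     # backpropagation through tanh
        W2 -= lr * np.outer(h, dz); b2 -= lr * dz
        W1 -= lr * np.outer(X[i], dh); b1 -= lr * dh
    print(f"epoch {epoch}: {errors} training errors")
```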
  7. Errors: optimization inefficiency of high-capacity neural networks
  8. CPU time: optimization inefficiency of high-capacity neural networks
  9. Is this inefficiency normal?
     - Hypothesis: no
     - The inefficiency comes from optimization problems inherent to stochastic backpropagation, which grow with capacity:
       - Linear vs. non-linear solutions
       - Solution spaces
     - Solution: reduce or eliminate the problems related to backpropagation
  10. Neural networks and equations
     [Diagram: the same feed-forward network, annotated with inputs x_1 … x_D, hidden units y_1 … y_N, outputs z_1 … z_k, targets t_1 … t_k, weights w_ij and w_kj]
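The equations on this slide are lost in the transcript. In the diagram's notation, the standard feed-forward pass, quadratic cost, and first-order stochastic-gradient updates such a slide would typically show are (the tanh activation and squared-error cost are assumptions):

```latex
% Forward pass, cost, and stochastic-gradient updates for the diagram's weights:
y_j = \tanh\!\Big(\sum_{i=1}^{D} w_{ij}\, x_i\Big), \qquad
z_k = \sum_{j=1}^{N} w_{kj}\, y_j, \qquad
C = \tfrac{1}{2} \sum_{k} (z_k - t_k)^2

\frac{\partial C}{\partial w_{kj}} = (z_k - t_k)\, y_j, \qquad
\frac{\partial C}{\partial w_{ij}} = x_i\,(1 - y_j^2) \sum_{k} (z_k - t_k)\, w_{kj}, \qquad
w \leftarrow w - \eta\, \frac{\partial C}{\partial w}
```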
  11. The learning process slows down for non-linear relationships
  12. Solution spaces
     [Diagram: the solution space of an N-neuron network nested inside the solution space of an (N+K)-neuron network]
  13. Similar solutions
     [Diagram: example where, from the same initial state, similar solutions are reached in 5 iterations vs. 3 iterations]
  14. Optimization problems
     - Moving-target problem
     - Gradient attenuation and dilution
     - No specialization mechanism (e.g., boosting)
     - Opposite gradients (classification)
     - Symmetry problem
  15. Neural network optimization problems
     [Diagram: the five problems above mapped onto the network diagram (inputs x_1 … x_D, hidden units y_1 … y_N, outputs z_1 … z_k, targets t_1 … t_k, weights w_ij and w_jk)]
  16. Explored solutions
     - Incremental neural networks
     - Decoupled architecture
     - Neural network with partial parameter optimization
     - Etc.
  17. Incremental neural networks: first approach
  18. Incremental neural networks: first approach (optimization with the existing weights fixed)
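A sketch of this idea as I read it from the slides: grow the hidden layer a few units at a time and freeze the weights already trained, so each new unit only has to fit the residual error. The class name, sizes, and squared-error loss are illustrative assumptions, not the thesis implementation.

```python
# Incrementally widen the hidden layer, freezing already-trained weights.
import numpy as np

rng = np.random.default_rng(1)
D, K = 16, 26

class IncrementalMLP:
    def __init__(self):
        self.W1 = np.empty((D, 0)); self.W2 = np.empty((0, K))
        self.frozen = 0                      # hidden units whose weights are fixed

    def grow(self, k):
        """Freeze the current hidden units, then add k fresh trainable ones."""
        self.frozen = self.W1.shape[1]
        self.W1 = np.hstack([self.W1, rng.normal(scale=0.1, size=(D, k))])
        self.W2 = np.vstack([self.W2, rng.normal(scale=0.1, size=(k, K))])

    def train_step(self, x, t, lr=0.01):
        h = np.tanh(x @ self.W1)
        z = h @ self.W2
        dz = z - t                           # squared-error gradient
        dh = (self.W2 @ dz) * (1 - h**2)
        f = self.frozen                      # only the new units are updated
        self.W2[f:] -= lr * np.outer(h[f:], dz)
        self.W1[:, f:] -= lr * np.outer(x, dh[f:])

net = IncrementalMLP()
net.grow(5)                                  # start with 5 hidden units
net.train_step(rng.normal(size=D), np.eye(K)[3])
net.grow(5)                                  # add 5 more; the first 5 stay fixed
```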
  19. Hypothesis: incremental NN
     [Table: the five problems (moving target, opposite gradients, specialization mechanism, gradient dilution, symmetry) vs. the incremental-NN solution; "OK" marks the problems it is expected to address]
  20. Incremental neural networks (1): results
  21. Why doesn't it work? (critical points)
  22. Incremental neural network: second approach (add hidden layers)
     [Diagram: a direct network x_1, x_2 → z_1, z_2 deepened into x_1, x_2 → y_1 … y_4 → z_1, z_2]
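A sketch of one plausible reading of the diagram: insert a new hidden layer into an already-trained network, initialized near the identity so the learned mapping is roughly preserved (exactly so if the inserted layer is linear). The thesis' actual initialization scheme is not recoverable from the transcript.

```python
# Deepen a trained shallow network by inserting a near-identity hidden layer.
import numpy as np

rng = np.random.default_rng(2)
D, K, H = 2, 2, 4            # matches the diagram: x1, x2 -> y1..y4 -> z1, z2

W = rng.normal(scale=0.5, size=(D, K))        # trained shallow net: z = x W

W1 = np.zeros((D, H)); W1[:, :D] = np.eye(D)  # first D hidden units copy the input
W1 += rng.normal(scale=0.01, size=(D, H))     # small noise breaks the symmetry
W2 = np.zeros((H, K)); W2[:D, :] = W          # old weights move up one layer

x = rng.normal(size=D)
print(x @ W)                 # output of the shallow net
print((x @ W1) @ W2)         # identical up to the small noise with a linear
                             # hidden layer; with tanh it is only approximate
```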
  23. Shape of the cost-function curve
  24. Hypothesis: incremental NN (add layers)
     [Table: the same five problems vs. the incremental NN with added layers; "OK" marks the problems it is expected to address]
  25. Incremental neural network (2): results
  26. Decoupled architecture
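The transcript does not spell this architecture out; a standard way to decouple is one independent subnetwork per class, trained one-vs-all, sketched below. Because no hidden unit is shared across classes, the opposite-gradients problem disappears by construction. All names and sizes are illustrative.

```python
# One independent subnetwork per class: no shared hidden units, so gradients
# from different classes never pull on the same weights in opposite directions.
import numpy as np

rng = np.random.default_rng(3)
D, H, K = 16, 4, 26

class ClassSubnet:
    """A tiny MLP that outputs a single score for 'its' class."""
    def __init__(self):
        self.W1 = rng.normal(scale=0.1, size=(D, H))
        self.w2 = rng.normal(scale=0.1, size=H)

    def score(self, x):
        return np.tanh(x @ self.W1) @ self.w2

    def train_step(self, x, target, lr=0.01):   # target is 1.0 or 0.0
        h = np.tanh(x @ self.W1)
        err = h @ self.w2 - target          # squared-error gradient
        dh = err * self.w2 * (1 - h**2)     # backprop through tanh
        self.w2 -= lr * err * h
        self.W1 -= lr * np.outer(x, dh)

subnets = [ClassSubnet() for _ in range(K)]

def predict(x):
    return int(np.argmax([net.score(x) for net in subnets]))

x, label = rng.normal(size=D), 7
for k, net in enumerate(subnets):           # one-vs-all training signal
    net.train_step(x, 1.0 if k == label else 0.0)
print(predict(x))
```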
  27. Hypothesis: decoupled architecture
     [Table: the same five problems vs. the decoupled architecture; "OK"/"Removed" mark the problems it addresses or eliminates]
  28. Inefficiency of high-capacity neural networks (CPU time)
  29. Efficiency of high-capacity neural networks: decoupled architecture
  30. Hypothesis: partial parameter optimization
     [Table: the same five problems vs. partial parameter optimization; "OK" marks the problems it is expected to address]
  31. Neural networks with partial parameter optimization: results
     [Plot: curves comparing optimization of all parameters vs. max-sensitivity optimization]
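A sketch of the max-sensitivity variant, under the assumption that sensitivity is measured by gradient magnitude: compute the full gradient but update only the fraction of weights the cost is most sensitive to. The selection rule and the fraction are assumptions, not the thesis' exact choices.

```python
# Partial (max-sensitivity) optimization: update only the top fraction of
# parameters ranked by |gradient| at each step.
import numpy as np

rng = np.random.default_rng(4)

def masked_sgd_step(w, grad, lr=0.01, frac=0.1):
    """Update only the top `frac` of parameters ranked by |gradient|."""
    k = max(1, int(frac * w.size))
    flat = np.abs(grad).ravel()
    threshold = np.partition(flat, -k)[-k]       # k-th largest |gradient|
    mask = np.abs(grad) >= threshold             # the most sensitive weights
    return w - lr * grad * mask

w = rng.normal(size=(16, 26))
grad = rng.normal(size=(16, 26))                 # stand-in for a real gradient
w_new = masked_sgd_step(w, grad)
print((w_new != w).sum(), "of", w.size, "weights updated")
```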
  32. Why predict parameters? (observation)
     [Plot: parameter values vs. epochs]
  33. Hypothesis: parameter prediction
     - Expected benefit: reduce the number of iterations by predicting parameter values from their history
     [Table: the same five problems vs. parameter prediction]
  34. Prediction: quadratic extrapolation
  35. Prediction: learning-rate increase
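A sketch of the quadratic-extrapolation predictor: fit a quadratic to each weight's recent trajectory over epochs and jump ahead to the extrapolated value, hoping to skip iterations (the learning-rate-increase variant instead lengthens the step along the current direction). The window size and jump length are illustrative.

```python
# Predict future weight values by fitting a quadratic to their history.
import numpy as np

def predict_quadratic(history, steps_ahead=3):
    """history: (epochs, n_params) array of past weight values."""
    epochs = np.arange(history.shape[0])
    # One quadratic fit per parameter (polyfit handles the columns at once).
    coeffs = np.polyfit(epochs, history, deg=2)          # shape (3, n_params)
    future = epochs[-1] + steps_ahead
    return coeffs[0] * future**2 + coeffs[1] * future + coeffs[2]

# A weight whose trajectory happens to be exactly quadratic, so the
# extrapolation recovers the future value exactly:
t = np.arange(6)
history = (2 - (t - 8) ** 2 / 64.0).reshape(-1, 1)       # peaks at epoch 8
print(predict_quadratic(history, steps_ahead=3))         # -> [2.0], the peak
```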
  36. Contributions
     - Experimental indication of an optimization problem for high-capacity NNs
     - At equal capacity, adding a hidden layer:
       - Speeds up learning
       - Gives a better error rate
     - A solution whose speed does not degrade as capacity increases (decoupled architecture / opposite gradients)
  37. Future work
     - Can the high-capacity optimization inefficiency be generalized (more datasets)?
     - For classification, is the decoupled architecture a better choice from a generalization point of view?
     - Does the critical-point hypothesis apply in the context of incremental neural networks?
     - Adding hidden layers: why doesn't it work for successive layers?
     - Partial parameter optimization:
       - Better understanding of the results
       - Which parameter-selection algorithm is best?
     - Is there an efficient parameter-prediction technique?
  38. Conclusion
     - Partially documented, experimentally, the inefficiency of high-capacity neural networks (CPU time / error count)
     - Various problems
     - Explored solutions:
       - Incremental approach
       - Decoupled architecture
       - Partial parameter optimization
       - Parameter prediction
       - …
  39. Any questions?
  40. Example: linear solution
  41. Example: highly non-linear solution
  42. Selecting the connections with the greatest influence on the cost
  43. Selecting the connections with the greatest influence on the error
     [Diagram: target/output pairs, e.g. T = 1, S = 0; T = 0, S = 1; T = 0, S = 0.1; T = 0, S = 0.1]
  44. Observation: idealized behavior of the time ratio
