(Artificial) Neural Network


  1. 1. (Artificial) Neural Network. Putri Wikie Novianti. Reading Group, July 11, 2012.
  2. 2. Analogy with human brain
  3. 3. A single neuron: inputs X1, X2, ..., XN are weighted by W1, W2, ..., WN and summed together with a bias b to form the net input Y_in; an activation function F then maps Y_in to the output Y_out. Activation functions: 1. Binary step function 2. Bipolar step function 3. Sigmoid function 4. Linear function.
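As a companion to slide 3, here is a minimal Python sketch of a single neuron with these four activation functions; the function names, the use of NumPy, and the threshold at 0 for the step functions are my assumptions rather than details from the slide.

```python
import numpy as np

def binary_step(y_in):
    # 1 if the net input is non-negative, else 0 (threshold at 0 assumed)
    return np.where(y_in >= 0, 1, 0)

def bipolar_step(y_in):
    # 1 if the net input is non-negative, else -1
    return np.where(y_in >= 0, 1, -1)

def sigmoid(y_in):
    # smooth squashing function with outputs in (0, 1)
    return 1.0 / (1.0 + np.exp(-y_in))

def linear(y_in):
    # identity activation
    return y_in

def neuron(x, w, b, activation=sigmoid):
    # net input Y_in = w . x + b, output Y_out = F(Y_in)
    y_in = np.dot(w, x) + b
    return activation(y_in)

# Usage with N = 3 inputs
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron(x, w, b=0.05, activation=binary_step))
```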
  4. 4. Perceptron: inputs X1, X2 are connected through weights W1, W2 and a bias b to the net input Y_in, and the activation function F gives the output Y_out (outputs Y1, Y2 in the multi-output case). Initialization: weights (wi) and bias, learning rate (α), maximum epoch. Stopping criteria: no more weight changes (error = 0), or maximum epoch reached. Weight and bias update whenever an output differs from its target: Wij = Wij + α * tj * Xki (the bias is updated the same way, with the input fixed at 1).
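A minimal sketch of the perceptron training loop described on slide 4, under the stated initialization and stopping criteria; the bipolar step activation, the AND example data, and the zero initial weights are illustrative assumptions, not taken from the slide.

```python
import numpy as np

def train_perceptron(X, t, alpha=0.1, max_epoch=100):
    """Perceptron learning rule: w <- w + alpha * t_k * x_k on misclassified patterns."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)   # starting weights (zeros here; small random values are also common)
    b = 0.0                    # bias
    for epoch in range(max_epoch):
        changed = False
        for x_k, t_k in zip(X, t):
            y_in = np.dot(w, x_k) + b
            y_out = 1 if y_in >= 0 else -1     # bipolar step activation (assumed)
            if y_out != t_k:                   # update only when the output misses the target
                w += alpha * t_k * x_k
                b += alpha * t_k
                changed = True
        if not changed:        # stopping criterion: no weight changes (error = 0)
            break
    return w, b

# Usage: the AND function with bipolar inputs and targets
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
t = np.array([1, -1, -1, -1])
w, b = train_perceptron(X, t)
```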
  5. 5. Backpropagation
  6. 6. Backpropagation
  7. 7. Backpropagation
  8. 8. Backpropagation
  9. 9. Backpropagation
  10. 10. Backpropagation
  11. 11. Backpropagation
  12. 12. Backpropagation
  13. 13. Backpropagation
  14. 14. Backpropagation
  15. 15. Backpropagation
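Slides 5–15 present the backpropagation derivation as figures that are not transcribed here. As a rough companion, below is a minimal sketch of one backpropagation update for a network with one hidden layer, sigmoid units, and squared-error loss; the architecture, the learning rate, and all variable names are assumptions for illustration, not content from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, b1, W2, b2, alpha=0.1):
    """One gradient-descent update for a 1-hidden-layer network with squared-error loss."""
    # Forward pass
    z_hidden = W1 @ x + b1
    a_hidden = sigmoid(z_hidden)
    z_out = W2 @ a_hidden + b2
    y = sigmoid(z_out)

    # Backward pass: propagate the error signal from the output back to the hidden layer
    delta_out = (y - t) * y * (1 - y)                             # dLoss/dz_out
    delta_hidden = (W2.T @ delta_out) * a_hidden * (1 - a_hidden) # dLoss/dz_hidden

    # Gradient-descent updates of weights and biases
    W2 -= alpha * np.outer(delta_out, a_hidden)
    b2 -= alpha * delta_out
    W1 -= alpha * np.outer(delta_hidden, x)
    b1 -= alpha * delta_hidden
    return W1, b1, W2, b2
```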
  16. 16. Some Issues in Training NN. 1. Starting values: starting values for the weights are usually chosen to be random values near zero, so the model starts out nearly linear and becomes nonlinear as the weights increase. Using exactly zero weights leads to zero derivatives and perfect symmetry, and the algorithm never moves. 2. Overfitting: typically we do not want the global minimizer of R(θ), as this is likely to be an overfit solution; instead, some regularization is needed, achieved directly through a penalty term or indirectly by early stopping. Here R(θ) is the error as a function of the complete set of weights θ.
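A small sketch of the two points on slide 16: starting weights drawn near zero, and regularization of R(θ) through a penalty term; the scale 0.1 and the weight-decay form of the penalty are common choices assumed here, not values from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Starting values: small random weights near zero (exact zeros would give
#    zero derivatives and perfect symmetry, so the algorithm would never move).
def init_weights(n_in, n_out, scale=0.1):
    return rng.uniform(-scale, scale, size=(n_out, n_in))

# 2. Overfitting: add a penalty term to R(theta) instead of seeking its global minimizer.
def penalized_error(residuals, weights, lam=1e-3):
    r_theta = np.sum(residuals ** 2)                       # R(theta): error over the full weight set theta
    penalty = lam * sum(np.sum(W ** 2) for W in weights)   # weight-decay style penalty (assumed form)
    return r_theta + penalty
```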
  17. 17. Some Issues in Training NN
  18. 18. Some Issues in Training NN
  19. 19. Some Issues in Training NN. 3. Scaling the input: since the scaling of the inputs determines the effective scaling of the weights in the bottom layer, it can have a large effect on the final model. It is better to standardize all inputs (mean = 0 and std. dev. = 1). This ensures all inputs are treated equally in the regularization process and allows one to choose a meaningful range for the random starting weights. Typical starting weights for standardized inputs: random uniform weights over the range [-0.7, 0.7].
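A minimal sketch of the preprocessing on slide 19: standardize each input to mean 0 and standard deviation 1, then draw starting weights uniformly from [-0.7, 0.7]; the array shapes and example data are illustrative assumptions.

```python
import numpy as np

def standardize(X):
    # Scale each input column to mean 0 and standard deviation 1
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std

def init_uniform_weights(n_in, n_hidden, low=-0.7, high=0.7, seed=0):
    # Typical starting weights for standardized inputs: uniform over [-0.7, 0.7]
    rng = np.random.default_rng(seed)
    return rng.uniform(low, high, size=(n_hidden, n_in))

# Usage: raw inputs on very different scales
X = np.random.rand(200, 5) * 100.0
X_std, mu, sigma = standardize(X)
W1 = init_uniform_weights(n_in=5, n_hidden=10)
```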
  20. 20. Some Issues in Training NN. 4. Number of hidden units and layers: it is better to have too many hidden units than too few. Too few hidden units: the model lacks flexibility (hard to capture the nonlinearity). Too many hidden units: the extra weights can be shrunk towards zero when appropriate regularization is used. Typical number of hidden units: 5 to 100. 5. Multiple minima: the error function R(θ) is non-convex, possessing many local minima; as a result, the final solution depends strongly on the choice of starting weights. Solutions: averaging the predictions over a collection of networks as the final prediction; averaging the weights; bagging, i.e., averaging the predictions of networks trained on randomly perturbed versions of the training data.
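A hedged sketch of the first remedy on slide 20, averaging predictions over a collection of networks trained from different random starting weights; train_network and predict stand in for whatever training and prediction routines are used and are assumptions, not functions from the slides.

```python
import numpy as np

def average_predictions(X_train, y_train, X_test, train_network, predict, n_nets=10):
    """Train n_nets networks from different random starting weights and
    average their predictions on X_test to smooth over local minima."""
    predictions = []
    for seed in range(n_nets):
        params = train_network(X_train, y_train, seed=seed)  # different random start per seed
        predictions.append(predict(params, X_test))
    return np.mean(predictions, axis=0)
```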
  21. 21. Example: Simulated Data
  22. 22. Example: Simulated Data
  23. 23. Example: Simulated Data
  24. 24. Example: ZIP Code Data
  25. 25. Example: ZIP Code Data
  26. 26. Example: ZIP Code Data
  27. 27. Bayesian Neural Nets: a classification challenge was held in 2003 at the Neural Information Processing Systems (NIPS) workshop. The winners, Neal and Zhang (2006), used a series of preprocessing and feature-selection steps, followed by Bayesian neural networks, Dirichlet diffusion trees, and combinations of these methods.
  28. 28. Bayesian Neural Nets. Bayesian approach review:
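The review itself appears on slides 28–31 as figures. As a reminder of the standard setup (my summary, not a transcription of the slides), the Bayesian approach places a prior on the network weights and predicts by averaging over their posterior:

```latex
% Posterior over network weights theta given training data Z
% (standard Bayesian setup; not transcribed from the slides)
\[
  p(\theta \mid Z) \;\propto\; p(Z \mid \theta)\, p(\theta)
\]
% Prediction at a new input x: average the network output over the posterior
\[
  \hat{f}(x) \;=\; \mathbb{E}\big[f(x;\theta) \mid Z\big]
            \;=\; \int f(x;\theta)\, p(\theta \mid Z)\, d\theta
\]
```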
  29. 29. Bayesian Neural Nets
  30. 30. Bayesian Neural Nets
  31. 31. Bayesian Neural Nets
  32. 32. Bagging and Boosting Neural Nets
  33. 33. Computational consideration
  34. 34. References: [1] Zhang, X. Support Vector Machine. Lecture slides, Data Mining course, Fall 2010. KAUST, Saudi Arabia. [2] Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning, 2nd edition. New York: Springer, 2009.
