Neural Networks


  1. 1. Neural Networks An Overview <ul><li>Brad Morantz PhD </li></ul><ul><li>Viral Immunology Center </li></ul><ul><li>Georgia State University </li></ul><ul><li>[email_address] </li></ul><ul><li>www.geocities.com/bradscientist </li></ul>
  2. 2. Neural Networks I think, therefore I am
  3. 3. Contents <ul><li>Introduction </li></ul><ul><li>Expert Systems </li></ul><ul><li>Neural Networks </li></ul><ul><li>Prediction </li></ul><ul><li>Classification </li></ul><ul><li>Pattern Recognition </li></ul><ul><li>Overview </li></ul><ul><li>Comparisons </li></ul><ul><li>How a NN works </li></ul><ul><li>Examples </li></ul><ul><li>Applications </li></ul><ul><li>Future </li></ul><ul><li>Resources </li></ul><ul><li>Appendices </li></ul><ul><ul><li>Degrees of freedom </li></ul></ul><ul><ul><li>Machine Learning </li></ul></ul><ul><ul><li>Quantitative Methods </li></ul></ul>
  4. 4. The flight to Dallas is cancelled That’s Wonderful!
  5. 5. Why is it wonderful? <ul><li>Either there is something wrong with: </li></ul><ul><ul><li>The airplane </li></ul></ul><ul><ul><li>The pilot </li></ul></ul><ul><ul><li>Dallas </li></ul></ul>If there is something wrong with any of these, I don’t want to be on the airplane. This example is courtesy of Zig Ziglar.
  6. 6. Has Anyone been turned down on a credit card transaction? <ul><li>For reasons other than late payments? </li></ul>
  7. 7. I was ID’d buying a VCR <ul><li>This is what I normally do: </li></ul><ul><ul><li>I normally buy gas once a week </li></ul></ul><ul><ul><li>I usually spend about $10 </li></ul></ul><ul><ul><li>I buy a book once a month </li></ul></ul><ul><li>This is MY pattern </li></ul>
  8. 8. I need to see your ID That’s Wonderful! Scene at the Store
  9. 9. Why did the Salesperson ask for ID? <ul><li>The credit card machine told her to </li></ul><ul><li>Why did the machine ask for it? </li></ul><ul><ul><li>They have a pattern of my normal activity </li></ul></ul><ul><ul><li>Buying a VCR was not part of that pattern </li></ul></ul><ul><ul><li>Expensive purchases were not normal for me </li></ul></ul><ul><ul><li>It wanted to be sure that it was not a stolen card </li></ul></ul><ul><ul><li>Criminals with stolen cards often buy expensive electronic items like VCRs </li></ul></ul><ul><ul><li>It is a pattern for a stolen credit card </li></ul></ul>
  10. 10. How did the credit card machine do this? <ul><li>It used Artificial Intelligence </li></ul><ul><li>Computer program </li></ul><ul><ul><li>Expert System </li></ul></ul><ul><ul><li>Neural Network </li></ul></ul><ul><ul><li>Hybrid </li></ul></ul>
  11. 11. Prediction <ul><li>Last week, 4 apples cost $2.00 </li></ul><ul><li>Yesterday, 6 apples cost $3.00 </li></ul><ul><li>Today, 10 apples cost $5.00 </li></ul><ul><li>How much will 5 apples cost tomorrow? </li></ul>$2.50 This is linear, but an ANN can handle nonlinear equally well
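The apple example can be worked as a least-squares line fit. The sketch below is plain Python with a hypothetical helper, `fit_line`. It is an ordinary linear fit, not a neural network, and only illustrates the kind of input-to-output relationship a network would be asked to learn.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Observations from the slide: number of apples and what they cost.
apples = [4, 6, 10]
cost = [2.00, 3.00, 5.00]

slope, intercept = fit_line(apples, cost)
predicted = slope * 5 + intercept
print(f"5 apples: ${predicted:.2f}")
```

The fitted slope is the price per apple, so the prediction for 5 apples agrees with the $2.50 on the slide.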
  12. 12. Classification Problem <ul><li># wheels | Length | Noise | # seats | What is it? </li></ul><ul><li>4 | short | quiet | 4 | car </li></ul><ul><li>2 | short | loud | 1 | motorcycle </li></ul><ul><li>6 | long | loud | lots | bus </li></ul><ul><li>18 | very long | loud | 2 | semi </li></ul>With a few inputs, we classified the vehicle. Notice that the answer is a class, not a specific vehicle such as a 300ZX.
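A rough way to see the vehicle table as a classification task is to match a new vehicle against the known examples. The sketch below assumes a simple "most matching features wins" rule and a hypothetical `classify` helper. A trained network would learn the mapping from data rather than look it up, so this is only a stand-in for the idea of sorting inputs into classes.

```python
# Training examples from the slide: (wheels, length, noise, seats) -> class.
EXAMPLES = [
    ((4, "short", "quiet", "4"), "car"),
    ((2, "short", "loud", "1"), "motorcycle"),
    ((6, "long", "loud", "lots"), "bus"),
    ((18, "very long", "loud", "2"), "semi"),
]

def classify(features):
    """Return the class of the example that shares the most feature values."""
    def matches(example):
        known_features, _label = example
        return sum(a == b for a, b in zip(known_features, features))
    return max(EXAMPLES, key=matches)[1]

print(classify((4, "short", "quiet", "4")))   # exact match with the car row
print(classify((2, "short", "loud", "2")))    # closest to the motorcycle row
```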
  13. 13. Pattern Recognition <ul><li>Your Friend Mary George Washington </li></ul>These are very specific answers
  14. 14. What do these examples have in common? <ul><li>Learned </li></ul><ul><li>From experience </li></ul><ul><li>From historical data </li></ul><ul><li>By example </li></ul><ul><li>By organization </li></ul>
  15. 15. What is an Expert System <ul><li>Knowledge based system </li></ul><ul><li>Having an expert that you can ask questions </li></ul><ul><li>Computer program </li></ul><ul><ul><li>Has set of rules or heuristics </li></ul></ul><ul><ul><li>From a human expert </li></ul></ul><ul><ul><li>Behaves like human expert </li></ul></ul>
  16. 16. Quick overview of expert system <ul><li>Requires Expert </li></ul><ul><li>Requires Knowledge Engineer </li></ul><ul><li>Stores knowledge in set of rules </li></ul><ul><li>Must have rule for all possibilities </li></ul><ul><li>Contains bias, opinions, and emotions of expert </li></ul><ul><li>May not be current </li></ul><ul><li>Can ask it reason for its answer </li></ul><ul><li>MYCIN, TEIRESIAS, etc. (Gordon & Shortliffe) </li></ul>
  17. 17. What is a Neural Network? <ul><li>A human Brain </li></ul><ul><li>A porpoise brain </li></ul><ul><li>The brain in a living creature </li></ul><ul><li>A computer program </li></ul><ul><ul><li>Emulates biological brain </li></ul></ul><ul><ul><li>Limited connections </li></ul></ul><ul><li>Specialized computer chip </li></ul>
  18. 18. Three types of Functions <ul><li>Prediction and Time Series Forecasting </li></ul><ul><ul><li>Like regression, but not constrained to linear </li></ul></ul><ul><li>Classification </li></ul><ul><ul><li>Sort into a class, like cluster analysis </li></ul></ul><ul><li>Pattern Recognition </li></ul><ul><ul><li>Fine-tuned classification </li></ul></ul><ul><li>Not constrained to linear or Gaussian (normal) distribution </li></ul>
  19. 19. Advantages of Neural Network <ul><li>No Expert needed </li></ul><ul><li>No Knowledge Engineer needed </li></ul><ul><li>Does not have bias of expert </li></ul><ul><li>Can interpolate for all cases </li></ul><ul><li>Learns from facts </li></ul><ul><li>Can resolve conflicts </li></ul><ul><li>Variables can be correlated (multicollinearity) </li></ul>
  20. 20. More Advantages <ul><li>Learns relationships </li></ul><ul><li>Can make good model with noisy or incomplete data </li></ul><ul><li>Can handle nonlinear or discontinuous data </li></ul><ul><li>Can Handle data of unknown or undefined distribution </li></ul><ul><li>Data Driven </li></ul>
  21. 21. Disadvantages of Neural Net <ul><li>Black Box </li></ul><ul><ul><li>don’t know why or how </li></ul></ul><ul><ul><li>not sure of what it is looking at </li></ul></ul><ul><li>Operator dependent </li></ul><ul><li>Don’t have knowledge in hand </li></ul><ul><li>* Many of these disadvantages are being overcome </li></ul>
  22. 22. Black Box <ul><li>What happens inside the box is unknown </li></ul><ul><li>We can’t see into the box </li></ul><ul><li>We don’t know what it knows </li></ul>input output
  23. 23. Key to Neural Networks <ul><li>Learns input to output relationship </li></ul>
  24. 24. Biological Neural Network <ul><li>Human brain has $4 \times 10^{10}$ to $10^{11}$ neurons </li></ul><ul><li>Each can have 10,000 connections* </li></ul><ul><li>Human baby makes 1 million connections per second until age 2 </li></ul><ul><li>Speed of synapse is 1 kHz, much slower than computer (3.0 GHz) </li></ul><ul><li>Massively parallel structure </li></ul><ul><li>* Some estimates are much greater </li></ul>
  25. 25. How does a neuron work? <ul><li>It sums the weighted inputs </li></ul><ul><ul><li>If it is enough, then neuron fires </li></ul></ul><ul><ul><li>There can be as many as 10,000 or more inputs </li></ul></ul>
  26. 26. Computer Neural Network <ul><li>Von Neumann architecture </li></ul><ul><li>Serial machine with inherently parallel process </li></ul><ul><li>Series of mathematical equations </li></ul><ul><li>Simulates relatively small brain </li></ul><ul><li>Limited connectivity </li></ul><ul><li>Closely approximates complex nonlinear functions </li></ul>
  27. 27. Computer Emulation of NN <ul><li>Called Artificial Neural Network (ANN) </li></ul><ul><li>Two uses </li></ul><ul><ul><li>Applications as shown </li></ul></ul><ul><ul><ul><li>Forecasting/Prediction </li></ul></ul></ul><ul><ul><ul><li>Classification </li></ul></ul></ul><ul><ul><ul><li>Pattern recognition </li></ul></ul></ul><ul><ul><li>Use model to help understand biological neural network </li></ul></ul>
  28. 28. Neuron Activation <ul><li>Weights can be positive or negative </li></ul><ul><li>Negative weight inhibits neuron firing </li></ul><ul><li>$\text{Sum} = W_1 N_1 + W_2 N_2 + \dots + W_n N_n$ </li></ul><ul><li>If sum is negative, neuron does not fire </li></ul><ul><li>If sum is positive, neuron fires </li></ul><ul><li>Fire means an output from neuron </li></ul><ul><li>Non-linear function </li></ul><ul><li>Some models include a threshold </li></ul>
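The firing rule can be sketched in a few lines. The helper name `neuron_fires`, the input values, and the weights below are assumptions chosen for illustration. The `threshold` argument defaults to 0, which matches the "positive sum fires" rule, and can be raised for models that include a threshold.

```python
def neuron_fires(inputs, weights, threshold=0.0):
    """Sum the weighted inputs and report whether the neuron fires."""
    total = sum(w * n for w, n in zip(weights, inputs))
    return total > threshold, total

inputs = [1.0, 1.0, 1.0]

# One negative (inhibiting) weight, but the sum is still positive.
fired, total = neuron_fires(inputs, [0.5, 0.8, -0.4])
print(fired, round(total, 3))

# Two inhibiting weights push the sum below zero, so the neuron stays quiet.
fired, total = neuron_fires(inputs, [0.5, -0.8, -0.4])
print(fired, round(total, 3))
```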
  29. 29. Neuron Activation <ul><li>Linear </li></ul><ul><li>Sigmoidal </li></ul><ul><ul><li>$1.0 / (1.0 + e^{-s})$ where $s = \sum$ inputs </li></ul></ul><ul><ul><li>Result between 0 and +1 </li></ul></ul><ul><li>Hyperbolic Tangent </li></ul><ul><ul><li>$(e^{s} - e^{-s}) / (e^{s} + e^{-s})$ where $s = \sum$ inputs </li></ul></ul><ul><ul><li>Result between -1 and +1 </li></ul></ul><ul><li>Also called squashing or clamping function </li></ul>
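Both squashing functions are easy to evaluate directly. This is a minimal sketch using only the standard `math` module. The function names are illustrative, and the hyperbolic tangent is written out from the exponential form rather than calling `math.tanh` so that it mirrors the formula on the slide.

```python
import math

def sigmoid(s):
    """Squash s into the range between 0 and +1."""
    return 1.0 / (1.0 + math.exp(-s))

def hyperbolic_tangent(s):
    """Squash s into the range between -1 and +1."""
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

for s in (-5.0, 0.0, 5.0):
    print(f"s={s:+.1f}  sigmoid={sigmoid(s):.4f}  tanh={hyperbolic_tangent(s):.4f}")
```

Large negative sums land near the lower limit and large positive sums near the upper limit, which is why these are called squashing functions.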
  30. 30. What does the network look like? <ul><li>This is a computer model, not biological </li></ul><ul><li>Left has 11 neurons, sea slug has 100 </li></ul>[Figure: two network diagrams, labeled "Feed Forward" and "Recurrent or Feedback"]
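  31. 31. Small Neural Network [Figure: network diagram with three input nodes ($I_1$, $I_2$, $I_3$), three hidden nodes, and two output nodes ($\text{Out}_1$, $\text{Out}_2$). Activation functions are labeled $F_{11}$, $F_{21}$, $F_{31}$ (input nodes), $F_{12}$, $F_{22}$, $F_{32}$ (hidden nodes), and $F_{13}$, $F_{23}$ (output nodes). Weights are labeled $W_{111}$ through $W_{313}$ (input to hidden) and $W_{121}$ through $W_{322}$ (hidden to output).]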
  32. 32. Mathematical Equations <ul><li>Input to Hidden$_{12}$ $= H_1$ </li></ul><ul><li>$H_1 = [(I_1 \cdot F_{11}) \cdot W_{111}] + [(I_2 \cdot F_{21}) \cdot W_{211}] + [(I_3 \cdot F_{31}) \cdot W_{311}]$ </li></ul><ul><li>$H_2 = \dots$ </li></ul><ul><li>$H_3 = \dots$ </li></ul><ul><li>$\text{Out}_1 = [(H_1 \cdot F_{12}) \cdot W_{121}] + [(H_2 \cdot F_{22}) \cdot W_{221}] + [(H_3 \cdot F_{32}) \cdot W_{321}]$ </li></ul>
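The equations can be traced with a small forward pass through a network of 3 input, 3 hidden, and 2 output nodes. This sketch makes several assumptions: each F is read as an activation function applied to a node's value before it is weighted, the input nodes use the identity function, the hidden nodes use a sigmoid, and every weight value is made up for illustration. Like the slide's equations, it stops at the weighted sums arriving at the output nodes.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def identity(s):
    return s

def layer_inputs(values, activation, weights):
    """Weighted sums arriving at the next layer.

    weights[i][j] links node i of this layer to node j of the next layer.
    """
    activated = [activation(v) for v in values]
    n_next = len(weights[0])
    return [sum(a * weights[i][j] for i, a in enumerate(activated))
            for j in range(n_next)]

# Hypothetical weights: 3x3 input-to-hidden and 3x2 hidden-to-output.
W_INPUT_HIDDEN = [[0.2, -0.5, 0.1],
                  [0.4, 0.3, -0.2],
                  [-0.1, 0.6, 0.5]]
W_HIDDEN_OUTPUT = [[0.7, -0.3],
                   [-0.4, 0.8],
                   [0.5, 0.2]]

inputs = [1.0, 0.5, -1.0]
hidden = layer_inputs(inputs, identity, W_INPUT_HIDDEN)    # H1, H2, H3
outputs = layer_inputs(hidden, sigmoid, W_HIDDEN_OUTPUT)   # Out1, Out2

n_weights = (sum(len(row) for row in W_INPUT_HIDDEN)
             + sum(len(row) for row in W_HIDDEN_OUTPUT))
print("hidden sums:", [round(h, 3) for h in hidden])
print("output sums:", [round(o, 3) for o in outputs])
print("number of weights:", n_weights)
```

Counting the entries of the two weight tables gives 9 + 6 = 15 weights for this layout.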
  33. 33. Comparison to Regression <ul><li>OLS with 3 independent and 1 dependent would have a maximum of 3 variables, 3 coefficients, and 1 intercept </li></ul><ul><li>With 2 dependent variables, it would require Canonical Correlation (general linear model) and the same number of coefficients </li></ul><ul><li>ANN has 15 coefficients (weights) and activation functions can be nonlinear </li></ul><ul><li>Multicollinearity and VIF are not a problem in ANN </li></ul>
  34. 34. Macro View of Training <ul><li>Setting all of the weights </li></ul><ul><li>To create optimal performance </li></ul><ul><li>Optimal adherence to training data </li></ul><ul><li>Really an optimization problem </li></ul><ul><ul><li>Optimal method depends on many variables </li></ul></ul><ul><ul><li>See optimization lecture </li></ul></ul><ul><li>Need objective function </li></ul><ul><li>Beware of local minima! </li></ul>
  35. 35. Optimization Methods <ul><li>Back Propagation (most popular) </li></ul><ul><li>Gradient Descent </li></ul><ul><li>Generalized reduced gradient (GRG) </li></ul><ul><li>Simulated Annealing </li></ul><ul><li>Genetic Algorithm </li></ul><ul><li>Two or more output nodes </li></ul><ul><ul><li>Multi objective optimization </li></ul></ul><ul><li>Many more methods </li></ul>
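Gradient descent and the warning about local minima can be seen on a one-variable objective. The function below is a hypothetical curve with two valleys, chosen only for illustration. A real training run would minimize an error measure over all of the weights instead.

```python
def f(x):
    """Hypothetical objective function with two valleys."""
    return x**4 - 3 * x**2 + x

def gradient(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def descend(x, rate=0.01, steps=2000):
    """Plain gradient descent: keep stepping downhill from x."""
    for _ in range(steps):
        x -= rate * gradient(x)
    return x

x_right = descend(2.0)    # starts on the right-hand slope
x_left = descend(-2.0)    # starts on the left-hand slope
print(f"start  2.0 -> x = {x_right:.3f}, f(x) = {f(x_right):.3f}")
print(f"start -2.0 -> x = {x_left:.3f}, f(x) = {f(x_left):.3f}")
```

The two starting points settle in different valleys, and only the left one reaches the lower value. The same method can therefore return a local minimum depending on where it starts.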
  36. 36. Training <ul><li>Supervised </li></ul><ul><ul><li>Data given, along with answer </li></ul></ul><ul><ul><li>Compare to prediction and go try again </li></ul></ul><ul><ul><li>Historical data and/or rules </li></ul></ul><ul><li>Unsupervised </li></ul><ul><ul><li>No teacher or feedback </li></ul></ul><ul><ul><li>Self organizing </li></ul></ul>
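The supervised loop of "compare to prediction and go try again" can be sketched with a single threshold neuron. The example below uses the classic perceptron update rule on the logical AND truth table. Both the rule and the data are assumptions chosen for illustration, not a method named in the slides.

```python
# Truth table for logical AND: (inputs, correct answer).
SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(samples, epochs=20, rate=1.0):
    """Adjust the weights a little after every wrong answer."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, answer in samples:
            error = answer - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

weights, bias = train(SAMPLES)
for inputs, answer in SAMPLES:
    print(inputs, "->", predict(weights, bias, inputs), "expected", answer)
```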
  37. 37. Training Data Set <ul><li>Need more observations than weights </li></ul><ul><ul><li>Positive number of degrees of freedom </li></ul></ul><ul><li>More observations are usually better </li></ul><ul><ul><li>Lower variance </li></ul></ul><ul><ul><li>More knowledge </li></ul></ul><ul><li>Watch aging of data </li></ul>
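A quick check of observations against weights can be written as below. The helper `degrees_of_freedom` is hypothetical, and it assumes a fully connected network with no bias weights, counting one weight per connection between adjacent layers.

```python
def degrees_of_freedom(n_observations, layer_sizes):
    """Observations minus weights for a fully connected network."""
    n_weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    return n_observations - n_weights

# A network with 3 input, 3 hidden, and 2 output nodes has 3*3 + 3*2 weights.
print(degrees_of_freedom(100, (3, 3, 2)))   # plenty of data
print(degrees_of_freedom(10, (3, 3, 2)))    # negative: too few observations
```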
  38. 38. Data Window <ul><li>Rolling window </li></ul><ul><ul><li>Keep all data </li></ul></ul><ul><ul><li>Add new data as available </li></ul></ul><ul><li>Moving window </li></ul><ul><ul><li>Train with fixed amount of data </li></ul></ul><ul><ul><li>As new observation is added, oldest is deleted </li></ul></ul><ul><ul><li>Necessary when underlying factors change </li></ul></ul>
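The two window types differ only in what happens to old observations. This minimal sketch uses a plain list for the rolling window and `collections.deque` with `maxlen` for the moving window. The observation values and the window size of 3 are made up for illustration.

```python
from collections import deque

rolling = []                 # rolling window: keeps every observation
moving = deque(maxlen=3)     # moving window: keeps only the 3 newest

for observation in [10, 11, 13, 12, 15]:
    rolling.append(observation)
    moving.append(observation)   # the oldest value drops out once full

print("rolling:", rolling)
print("moving: ", list(moving))
```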
  39. 39. Data Window Continued <ul><li>Weighted Window </li></ul><ul><ul><li>Morantz, Whalen, & Zhang </li></ul></ul><ul><ul><li>Superset of rolling & moving window </li></ul></ul><ul><ul><li>Oldest data is reduced in importance </li></ul></ul><ul><ul><li>Has reduced residual by as much as 50% </li></ul></ul><ul><ul><li>Multi factor ANOVA shows results significant in majority of applications with real world data </li></ul></ul>
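One way to picture a weighted window is as a list of importance weights, one per observation. The slides do not give the actual weighting scheme, so the linear taper below and the helper `window_weights` are assumptions made only to show how such a scheme can contain the rolling and moving windows as special cases.

```python
def window_weights(n, full, taper):
    """Importance weights for n observations, ordered oldest to newest.

    The newest `full` observations get weight 1.0, the `taper` observations
    before them ramp linearly down toward 0, and anything older gets 0.
    """
    weights = []
    for age in range(n - 1, -1, -1):        # age 0 is the newest observation
        if age < full:
            weights.append(1.0)
        elif age < full + taper:
            weights.append(1.0 - (age - full + 1) / (taper + 1))
        else:
            weights.append(0.0)
    return weights

print(window_weights(6, full=2, taper=3))   # older data fades out gradually
print(window_weights(4, full=4, taper=0))   # all ones: a rolling window
print(window_weights(5, full=3, taper=0))   # hard cutoff: a moving window
```

During training, each observation's error would be multiplied by its weight, so faded observations pull less on the fitted model.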
  40. 40. Dynamic Learning <ul><li>Continuous learning </li></ul><ul><ul><li>From mistakes and successes </li></ul></ul><ul><ul><li>From new information </li></ul></ul><ul><li>Shooting baskets example </li></ul><ul><ul><li>Too low. Learned: throw harder </li></ul></ul><ul><ul><li>Too high. Learned: throw softer, but not as soft as before </li></ul></ul><ul><ul><li>Basket! Learned: correct amount of “push” </li></ul></ul><ul><li>Loaning $10 example </li></ul>
  41. 41. inputs <ul><li>One per input node </li></ul><ul><li>Ratio </li></ul><ul><li>Logical </li></ul><ul><li>Dummy </li></ul><ul><li>Categorical </li></ul><ul><li>Ordinal </li></ul>
  42. 42. Outputs <ul><li>One for single dependent variable </li></ul><ul><li>Multiple </li></ul><ul><ul><li>Prediction </li></ul></ul><ul><ul><li>Classification </li></ul></ul><ul><ul><li>Pattern recognition </li></ul></ul>
  43. 43. Hidden Layer(s) <ul><li>Increase complexity </li></ul><ul><li>Can increase accuracy </li></ul><ul><li>Can reduce degrees of freedom </li></ul><ul><ul><li>Need larger data set </li></ul></ul><ul><li>Presently architecture up to programmer </li></ul><ul><li>Source for error </li></ul><ul><li>In future will be more automatic </li></ul>
  44. 44. Methodology <ul><li>Split training data </li></ul><ul><ul><li>80/20 is most common </li></ul></ul><ul><ul><li>Other methods depending on # observations </li></ul></ul><ul><li>Train on the 80% </li></ul><ul><li>Validate on the 20% </li></ul><ul><li>Never test on what you used for training </li></ul>
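The 80/20 split can be sketched with the standard `random` module. The helper `split_data` and the fixed seed are assumptions for illustration. Shuffling before cutting suits data with no time order, while time series data would usually be cut chronologically instead.

```python
import random

def split_data(observations, train_fraction=0.8, seed=0):
    """Shuffle a copy of the data and cut it into training and validation sets."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train_set, validation_set = split_data(data)
print(len(train_set), "for training,", len(validation_set), "for validation")
```

Because the two sets share no observations, the validation score is never measured on data the network was trained on.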
  45. 45. Hybrids <ul><li>Use advantages of both Expert and ANN </li></ul><ul><li>Uses more knowledge than just one type </li></ul><ul><li>Can seed system with expert knowledge and then update with data </li></ul><ul><li>Sometimes hard to get all parts to work together </li></ul><ul><li>Harder to validate model </li></ul>
  46. 46. Example <ul><li>You go some place that you have never been before, and get “bad vibes” </li></ul><ul><ul><li>Atmosphere, temperature, lighting, smell, coloring, numerous things </li></ul></ul><ul><li>For some reason, brain associates these together, possibly some past experience </li></ul><ul><li>Gives you “bad feeling” </li></ul>
  47. 47. Applications <ul><li>Credit card fraud </li></ul><ul><li>Military: submarine, tank, sniper detect, or target recognition </li></ul><ul><li>Security </li></ul><ul><li>Classify stars & planets </li></ul><ul><li>Data mining </li></ul><ul><li>Natural language recognition </li></ul>
  48. 48. Future <ul><li>Rule extraction </li></ul><ul><li>Hybrids </li></ul><ul><li>Dynamic learning </li></ul><ul><li>Parallel processing </li></ul><ul><li>Dedicated chips </li></ul><ul><li>Bigger & more automatic </li></ul><ul><li>Machine Learning </li></ul>
  49. 49. Contacts <ul><li>IEEE Transactions on Neural Networks </li></ul><ul><li>IEEE Intelligent Systems Journal </li></ul><ul><li>IEEE Neural Network Society </li></ul><ul><li>AAAI American Association for Artificial Intelligence </li></ul><ul><li>www.ieee.org </li></ul><ul><li>www.zsolutions.com </li></ul><ul><li>Internet </li></ul><ul><li>www.geocities.com/bradscientist </li></ul>
  50. 50. Appendix - degrees of freedom <ul><li>Number of values free to vary </li></ul><ul><ul><li>Complex math definition </li></ul></ul><ul><li>Must watch, not less than zero </li></ul><ul><li>e.g. 3X + 4Y = Z </li></ul><ul><ul><li>3 unknowns; if Z is known, only one other can vary </li></ul></ul><ul><ul><li>With 3 equations, the values cannot vary </li></ul></ul><ul><li>Large ANN needs big sample data </li></ul>
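The 3X + 4Y = Z example can be written out as a short worked case. The value Z = 24 is an assumed number chosen only for illustration.

```latex
3X + 4Y = Z, \qquad Z = 24 \;\Rightarrow\; Y = \frac{24 - 3X}{4}
% X = 4 gives Y = 3, and X = 0 gives Y = 6.
% Once Z is fixed, choosing X determines Y, so only one value is free to vary.
```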
  51. 51. Appendix - Machine Learning <ul><li>Machine learns from experience </li></ul><ul><li>From experience and data it improves its knowledge (rule base or weights) </li></ul><ul><li>Shooting baskets example </li></ul>
  52. 52. Appendix- Quantitative Methods <ul><li>OLS Regression </li></ul><ul><li>Canonical Correlation- General Linear Model </li></ul><ul><li>Classification </li></ul><ul><li>Cluster Analysis </li></ul>
