Machine Learning

1. Machine Learning - Lucy Hederman
2. KBS Development
   Stage 1: analysis of the problem that produces a representation of the problem that can be manipulated by the reasoning system - this representation is often a set of attribute values.
   Stage 2: developing the reasoning mechanism that manipulates the problem representation to produce a solution.
3. Stage 2
   - Knowledge engineering - manually
     - rule development for a rule-based ES
   - Learning - similarity-based
     - generalise from examples (training data)
   - Learning - explanation-based
     - build on prior knowledge
     - use a small number of canonical examples
     - incorporate explanations, analogy, ...
4. Risk Assessment Example
   - An expert might develop rules like
     - if collateral is adequate and credit history is good then risk is low.
   - Alternatively, build a system which learns from existing data on loan application decisions (see attached).
     - Similarity-based learning
5. Classifying apples and pears
   To what class does this belong?
6. Supervised Learning
   - Supervised learning
     - training data classified already
   - Unsupervised learning
     - acquire useful(?) knowledge without correctly classified training data
     - category formation
     - scientific discovery
   - We look at supervised learning only.
7. Learnability
   - Induction depends on there being useful generalisations possible in the representation language used.
   - Learnability of concepts in a representation language is the ability to express the concept concisely.
   - Random classifications are not learnable.
8. Similarity-based learning
   - Decision tree (rule) induction
     - induce a decision tree (set of rules) from the training data.
   - k-nearest neighbour classification
     - classify a new problem based on the k most similar cases in the training data.
   - Artificial Neural Networks
     - adjust weights in an ANN to reduce errors on training data.
9. Decision Tree Induction
   - Aim to induce a tree which
     - correctly classifies all training data
     - will correctly classify unseen cases
   - The ID3 algorithm assumes that the simplest tree that covers all the training examples is the best at unseen problems.
     - Leaving out extraneous tests should be good for generalising.
10. ID3
   - Top-down construction
     - add selected tests under nodes
     - each test further partitions the samples
     - continue till each partition is homogeneous
   - Information-theoretic test selection
     - maximise information gain
   - ID3 works surprisingly well. Variations and alternatives exist.
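To make the test-selection step concrete, here is a minimal Python sketch (not from the slides) of entropy and information gain; the small loan dataset is an invented stand-in for the risk-assessment example. ID3 would pick the highest-gain feature at each node, split on it, and recurse until the partitions are homogeneous.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, feature):
    """Reduction in entropy from partitioning the examples on one feature.
    Each example is a (feature_dict, class_label) pair."""
    base = entropy([label for _, label in examples])
    partitions = {}
    for features, label in examples:
        partitions.setdefault(features[feature], []).append(label)
    remainder = sum(len(part) / len(examples) * entropy(part)
                    for part in partitions.values())
    return base - remainder

# Invented loan-risk examples, only to make the sketch runnable.
loans = [({"collateral": "adequate", "history": "good"}, "low"),
         ({"collateral": "adequate", "history": "bad"},  "high"),
         ({"collateral": "none",     "history": "good"}, "moderate"),
         ({"collateral": "none",     "history": "bad"},  "high")]
best = max(["collateral", "history"], key=lambda f: information_gain(loans, f))
print(best)   # the feature ID3 would test first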
11. k-Nearest Neighbour Classification
   - A database of previously classified cases is kept throughout.
   - The category of a target case is decided by the category of its k nearest neighbours.
   - No inducing or training of a model.
   - "Lazy" learning
     - work deferred to runtime
     - compare with neural networks - eager learners
12. "Nearest" - distance/similarity
   For query q and training set X (described by features F), compute d(x, q) for each x ∈ X, where

      d(x, q) = Σ_{f ∈ F} w_f · δ(x_f, q_f)

   and where δ(x_f, q_f) = |x_f - q_f| for (normalised) numeric features, and δ(x_f, q_f) = 0 if x_f = q_f, 1 otherwise, for symbolic features.
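A minimal Python rendering of this weighted per-feature distance, assuming a mix of symbolic and normalised numeric features; the feature names, weights and cases below are invented for illustration.

```python
def distance(x, q, features, weights=None, numeric=()):
    """Weighted per-feature distance between case x and query q.
    Symbolic features contribute 0 on a match and 1 on a mismatch;
    numeric features contribute the absolute difference (assumed
    already normalised to the range 0-1)."""
    weights = weights or {f: 1.0 for f in features}
    total = 0.0
    for f in features:
        if f in numeric:
            diff = abs(x[f] - q[f])
        else:
            diff = 0.0 if x[f] == q[f] else 1.0
        total += weights[f] * diff
    return total

case  = {"collateral": "adequate", "history": "good", "income": 0.7}
query = {"collateral": "adequate", "history": "bad",  "income": 0.5}
print(distance(case, query, ["collateral", "history", "income"],
               numeric={"income"}))   # 0.0 + 1.0 + 0.2 = 1.2
```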
13. k-NN and Noise
   - 1-NN is easy to implement
     - but susceptible to noise
       - a misclassification every time a noisy pattern is retrieved
   - k-NN with k ≥ 3 will overcome this
   - Either
     - straight voting between the k examples, or
     - votes weighted by the "nearness" of each example.
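A sketch of both voting schemes, again as an illustration rather than lecture code; the default mismatch-count distance is just a placeholder for the weighted distance above.

```python
from collections import Counter

def mismatches(x, q):
    """Placeholder distance: number of features on which two cases differ."""
    return sum(1 for f in q if x.get(f) != q[f])

def knn_classify(query, training, k=3, dist=mismatches, weighted=False):
    """Classify query by its k nearest neighbours.
    training is a list of (case, class_label) pairs; dist can be any
    case-distance function, e.g. the weighted one sketched earlier."""
    neighbours = sorted(training, key=lambda item: dist(item[0], query))[:k]
    if not weighted:
        # straight voting between the k examples
        return Counter(label for _, label in neighbours).most_common(1)[0][0]
    # votes weighted by the nearness of each example
    votes = Counter()
    for case, label in neighbours:
        votes[label] += 1.0 / (dist(case, query) + 1e-6)
    return votes.most_common(1)[0][0]
```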
14. k-NN vs. Decision Trees
   - Decision trees test features serially.
     - If two cases don't match on the first feature tried, they don't match at all.
   - k-NN considers all features in parallel.
   - For some tasks serial testing is OK; for others it is not.
15. Dimension reduction in k-NN
   - Not all features required
     - noisy features a hindrance
   - Some examples redundant
     - retrieval time depends on the number of examples
   (Diagram: feature selection reduces the p original features to the q best features; condensed NN reduces the m training examples to n covering examples.)
16. Condensed NN
   (Figure: 100 examples in 2 categories, and the different condensed-NN solutions obtained from them.)
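The slides do not spell the condensation procedure out; the sketch below follows the classic Hart-style condensed nearest neighbour scheme, which keeps only those examples needed for 1-NN to classify the rest of the training data correctly. The dist argument is assumed to be a case-distance function like the one earlier.

```python
def condense(training, dist):
    """Hart-style condensed nearest neighbour.
    training is a list of (case, label) pairs; returns a covering subset."""
    store = [training[0]]                      # seed with one example
    changed = True
    while changed:
        changed = False
        for case, label in training:
            # classify each example with 1-NN against the current store
            nearest = min(store, key=lambda item: dist(item[0], case))
            if nearest[1] != label:            # misclassified: keep it
                store.append((case, label))
                changed = True
    return store
```

Different seed examples or presentation orders give different (equally valid) condensed sets, which is what the figure above illustrates.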
17. Feature weighting
   - Feature weights
     - modify the effect of large continuous distance values
     - allow some features to be treated as more important than others
       - pull cases with important features in common closer together.
18. Feature weighting
   - Introspective learning - test the training data on itself.
     - For a correct retrieval
       - increase the weight of matching features (pull)
       - decrease the weight of un-matching features (pull)
     - For an incorrect retrieval
       - decrease the weight of matching features (push)
       - increase the weight of un-matching features (push)
   (Diagram: correct retrievals pull the retrieved case towards the query; incorrect retrievals push it away.)
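A rough sketch of one such introspective update rule; the multiplicative adjustment and learning rate are assumptions for illustration, not taken from the slides.

```python
def update_weights(weights, query, retrieved, correct, rate=0.1):
    """Adjust feature weights after testing the training data on itself.
    For a correct retrieval, matching features are strengthened and
    un-matching ones weakened (pull); for an incorrect retrieval the
    adjustments are reversed (push)."""
    for f in weights:
        matches = (query[f] == retrieved[f])
        if correct:
            weights[f] *= (1 + rate) if matches else (1 - rate)
        else:
            weights[f] *= (1 - rate) if matches else (1 + rate)
    return weights
```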
19. (Artificial) Neural Networks
   - Decision tree induction builds a symbolic "causal" model from training data.
   - k-NN builds no model.
   - A neural network is a sub-symbolic, non-causal, distributed, "black box" model built from training data.
   - ANN output is continuous, whereas k-NN classifies into discrete classes.
20. NN Prediction of Malignancy
   - The paper by A. Tailor and colleagues describes a neural network which computes a probability of malignancy from age, morphological features, and sonographic data.
   - It describes the design and testing of the NN.
   - Note the introduction to NNs in the appendix.
21. ANN Advantages
   - Particularly suited to pattern recognition
     - character, speech, image
   - Suited to domains where there is no domain theory or model.
   - Robust - handle noisy and incomplete data well.
   - Potentially fast - parallel processing.
   - Flexible and easy to maintain.
22. ANN Problems
   - Lack explanation.
   - Currently implemented mostly in software.
   - Training times can be tedious.
   - Need lots of training and test data.
     - True of similarity-based learning in general.
23. ANN Processing Element (PE)
   - Summation - gives the PE's activation level.
   - Transfer function - modifies the activation level to produce a reasonable output value (e.g. 0-1).
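For illustration, a single PE with a sigmoid transfer function might look like this in Python; the particular transfer function, weights and inputs are assumptions, not lecture material.

```python
import math

def processing_element(inputs, weights, bias=0.0):
    """One PE: weighted summation gives the activation level, and a
    sigmoid transfer function squashes it into the range 0-1."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

print(processing_element([0.5, 1.0, 0.2], [0.4, -0.6, 0.9]))  # about 0.45
```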
24. Typical ANN Structure
   - There may be
     - additional hidden layers
     - different topologies
     - different connectivity
   - Choosing the ANN structure
     - is based on the problem and
     - requires some expertise.
   (Diagram: PEs arranged in an input layer, a hidden layer and an output layer.)
25. Learning/Training
   - Aim to obtain the desired outputs for each training example.
   - Backpropagation is the most popular learning algorithm:
     - Initialise all weights associated with the inputs to each PE.
     - Present sample inputs to the ANN.
     - Compare the ANN outputs with the desired outputs.
     - Alter the weights to reduce the mean square error, and repeat until the error is within some tolerance.
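A minimal, self-contained sketch of that loop for a single hidden layer, using a sigmoid transfer function and a toy XOR dataset; the data, network size and learning rate are all assumptions made only to keep the example runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy training set (XOR), purely to make the loop runnable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Initialise all weights (2 inputs -> 4 hidden PEs -> 1 output PE).
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
rate, tolerance = 0.5, 0.01

for epoch in range(20000):
    # Present the sample inputs to the ANN (forward pass).
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    # Compare the ANN outputs with the desired outputs.
    error = y - output
    if np.mean(error ** 2) < tolerance:        # stop once within tolerance
        break
    # Alter the weights to reduce the mean square error (backpropagation).
    delta_out = error * output * (1 - output)
    delta_hid = (delta_out @ W2.T) * hidden * (1 - hidden)
    W2 += rate * hidden.T @ delta_out
    b2 += rate * delta_out.sum(axis=0, keepdims=True)
    W1 += rate * X.T @ delta_hid
    b1 += rate * delta_hid.sum(axis=0, keepdims=True)

print(np.round(output, 2))   # outputs after training
```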
26. Overfitting
   (Graph: error against training time; the in-sample error keeps falling while the generalisation error eventually starts to rise.)
   Too much training will result in a (k-NN or ANN) model that makes minimal errors on the training data (memorises), but no longer generalises well. Beware.
27. ANN Development
   1. Collect data (loop back and get more/better data if needed).
   2. Separate into training and test sets (re-separate if needed).
   3. Define a network structure (redefine the structure if needed).
   4. Select a learning algorithm (or select another).
   5. Set parameters, values, weights (reset if needed).
   6. Transform the data to network inputs.
   7. Start training; revise the weights.
   8. Stop and test.
   9. Use the network for new cases.
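For comparison, a compressed version of this cycle using scikit-learn; the toolkit, dataset and network size are assumptions rather than anything prescribed in the lecture.

```python
# Collect/split/structure/train/test cycle; toolkit and dataset are assumed.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)                 # collect data
X_train, X_test, y_train, y_test = train_test_split(       # separate into
    X, y, test_size=0.3, random_state=0)                    # training/test sets
net = make_pipeline(StandardScaler(),                       # transform data to inputs
                    MLPClassifier(hidden_layer_sizes=(10,), # define structure
                                  max_iter=1000))
net.fit(X_train, y_train)                                   # start training
print(net.score(X_test, y_test))                            # stop and test
```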
