Confusion Matrices for Improving Performance of Feature Pattern Classifier Systems


Ignas Kukenys, Will N. Browne, Mengjie Zhang. "Confusion Matrices for Improving Performance of Feature Pattern Classifier Systems". IWLCS, 2011


  1. Evolutionary Computation Research Group
     Feature Pattern Classifier System: Handwritten Digit Classification with LCS
     Ignas Kukenys, Victoria University of Wellington (now University of Otago)
     Will N. Browne, Victoria University of Wellington
     Mengjie Zhang, Victoria University of Wellington
  2. Context
     • Machine learning for robotics:
       • Needs to be reinforcement-based and online
       • Preferably also adaptive and transparent
     • Learning from visual input is hard:
       • High dimensionality vs. sparseness of data
     • Why Learning Classifier Systems?
       • Robust reinforcement learning
       • Limited applications for visual input so far
  3. Goals
     • Adapt LCS to learn from image data
       • Use image features that enable generalisation
       • Tweak the evolutionary process
       • Use a well-known vision problem for evaluation
     • Build a classifier system for handwritten digit classification
  4. Learning Classifier Systems
     • LCS model an agent interacting with an unknown environment:
       • The agent observes a state of the environment
       • The agent performs an action
       • The environment provides a reward
     • This contract constrains learning:
       • Online: one problem instance at a time
       • Ground truth not available (non-supervised)
  5. Learning Classifier Systems (figure only)
  6. Learning Classifier Systems (figure only)
  7. Basics of LCS
     • LCS evolve a population of rules: if condition(s) then action
     • Each rule also has associated properties:
       • Predicted reward for the advocated action
       • Accuracy based on prediction error
       • Fitness based on relative accuracy
  8. Simple rule conditions
     • Traditionally LCS use a don't-care (#) encoding:
       • e.g. condition #1# matches states 010, 011, 110 and 111
     • Enables rules to generalise over multiple states
     • Varying levels of generalisation:
       • ### matches all possible states
       • 010 matches a single specific state
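The don't-care matching described above can be sketched in a few lines (a minimal illustration, not code from the paper):

```python
# Minimal sketch of ternary condition matching: '#' is the don't-care
# symbol; any other symbol must equal the corresponding state bit.
def matches(condition: str, state: str) -> bool:
    return all(c == '#' or c == s for c, s in zip(condition, state))

print(matches("#1#", "010"))  # True (middle bit is 1)
print(matches("#1#", "000"))  # False
print(matches("###", "101"))  # True (matches everything)
```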
  9. Naïve image classification
     • Consider binary 3x3 pixel patterns
     • How to separate them into two classes based on the colour of the centre pixel?
  10. Naïve image classification
      • Environment states: 9-bit messages
        • e.g. 011100001 and 100010101
      • Two actions represent the two classes: 0 and 1
      • Two rules are sufficient to solve the problem:
        [### #0# ###] → 0
        [### #1# ###] → 1
  11. Naïve image classification
      • Example 2: how to classify 3x3 patterns that have "a horizontal line of 3 white pixels"?
        [111 ### ###] → 1
        [### 111 ###] → 1
        [### ### 111] → 1
      • Example 3: how to deal with 3x3 patterns with "at least one 0 on every row"?
        • 27 unique rules are needed to fully describe the problem
  12. Naïve image classification
      • The number of rules explodes for complex patterns
      • Consider 256 pixel values for grey-scale images, …
      • Very limited generalisation with such conditions
      • Photographic and other "real world" images:
        • Significantly different at the pixel level
        • Need more flexible conditions
  13. Haar-like features (figure only)
  14. Haar-like features
      • Compute differences between pixel sums in rectangular regions of the image
      • Very efficient with the use of an "integral image"
      • Widely used in computer vision
        • e.g. the state-of-the-art Viola & Jones face detector
      • Can be flexibly placed at different scales and positions in the image
      • Enable varying levels of generalisation
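A two-rectangle Haar-like feature via an integral image can be sketched as below; this is an assumed implementation (function names and the feature type are illustrative, not the authors' code). The integral image makes any rectangle sum cost four lookups, regardless of its size:

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[:y, :x]; a padded zero row/column simplifies lookups
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of the w-by-h rectangle with top-left corner (x, y), in four lookups
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_horizontal(ii, x, y, w, h):
    # Left half minus right half: one of several classic Haar-like feature types
    return rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)

img = np.array([[255.0, 0.0], [255.0, 0.0]])
ii = integral_image(img)
print(haar_two_rect_horizontal(ii, 0, 0, 2, 2))  # 510.0
```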
  15. Haar-like feature rules
      • To obtain LCS-like rules, feature outputs need to be thresholded:
        if (feature(type, position, scale) > threshold) then action
      • Flexible direction of comparison: < and >
      • Range conditions: t_low < feature < t_high
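A range-thresholded feature rule like the one above might look as follows; the class name and fields here are hypothetical, chosen only to mirror the slide's t_low < feature < t_high form:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class FeatureRule:
    # A feature bound to a specific type, position and scale, returning a scalar
    feature: Callable[[Any], float]
    t_low: float
    t_high: float
    action: int

    def matches(self, image) -> bool:
        # Range condition from the slide: t_low < feature(image) < t_high
        return self.t_low < self.feature(image) < self.t_high

# A constant "feature" purely for demonstration:
rule = FeatureRule(feature=lambda img: 42.0, t_low=0.0, t_high=100.0, action=1)
print(rule.matches(None))  # True
```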
  16. "Messy" encoding
      • Multiple features form stronger rules:
        if (feature_1 && feature_2 && feature_3 ...) then action
      • There appears to be a limit to the useful number of features per rule
  17. MNIST digits dataset
      • Well-known handwritten digits dataset
      • 60,000 training examples, 10 classes
      • Examples from 250 subjects
      • 28x28 pixel grey-scale (0..255) images
      • 10,000 evaluation examples (test set, from different subjects)
  18. MNIST results (figure only)
  19. MNIST results
      • Performance:
        • Training set: 92% after 4M observations
        • Evaluation set: 91%
      • Supervised and offline methods reach 99%
      • An encouraging initial result for reinforcement learning
  20. Adaptive learning (figure only)
  21. Why not 100% performance? (figure only)
  22. Improving the FPCS
      • Tournament selection
        • Performs better than proportional roulette-wheel selection
      • Crossover only at the feature level
        • Rules swap whole features, not individual attributes
      • Features start at the "best" position, then mutate
        • Instead of a random position, place the feature where its output is highest
      • With all these fixes, performance is still at 94%
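Tournament selection, the first improvement listed above, can be sketched as follows; the rule representation (dicts with a "fitness" field) and the tournament fraction tau are assumptions, not the paper's parameters:

```python
import random

def tournament_select(rules, tau=0.4, rng=random):
    # Each rule enters the tournament independently with probability tau;
    # the fittest entrant wins. Fall back to one random rule if none entered.
    pool = [r for r in rules if rng.random() < tau]
    if not pool:
        pool = [rng.choice(rules)]
    return max(pool, key=lambda r: r["fitness"])

rules = [{"id": i, "fitness": f} for i, f in enumerate([0.2, 0.9, 0.5])]
print(tournament_select(rules, tau=1.0)["id"])  # 1 (everyone enters, fittest wins)
```

Unlike fitness-proportional roulette-wheel selection, tournament selection depends only on fitness *rank*, so it stays effective when fitness values are close together.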
  23. Why not 100% performance?
      • Online reinforcement learning
        • Cannot adapt rules based on known ground truth
      • Forms a complete map of all states to all actions to their rewards, e.g. learns "not a 3"
        • Rather than just the correct state:action mapping
      • Only uses Haar-like features
        • Could use an ensemble of different feature types
  24. Future work
      • Inner confusion matrix to "guide" learning towards "hard" areas of the problem
      • Test with a supervised-learning LCS, e.g. UCS
      • Only learn accurate positive rules, rather than a complete mapping
      • How to deal with outliers?
      • Testing on harder image problems will likely reveal further challenges
  25. Confusion matrix (figure only)
  26. Confusion matrix (figure only)
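The confusion matrix the slides build on is the standard one: rows are actual classes, columns are predicted classes. A minimal sketch (assuming 10 digit classes; not the paper's code):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    # cm[t, p] counts examples of actual class t predicted as class p;
    # diagonal entries are correct classifications.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

cm = confusion_matrix([3, 3, 5, 8], [3, 5, 5, 3])
print(cm[3, 3], cm[3, 5], cm[8, 3])  # 1 1 1
```

Large off-diagonal entries mark the "hard" class pairs that an inner confusion matrix could steer learning towards, as proposed in the future work slide.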
  27. Conclusions
      • LCS can successfully work with image data
      • They autonomously learn the number, type, scale and threshold of features to use, in a transparent manner
      • Challenges remain to bridge the 5% gap to supervised learning performance
  28. Demo
      • Handwritten digit classification with FPCS
  29. Questions?
  30. Basics of LCS
      • For an observed state s, all rule conditions are tested
      • Matching rules form the match set [M]
      • For every action, a reward is predicted
      • An action a is chosen (random exploration vs. best prediction)
      • Rules in [M] advocating a form the action set [A]
      • [A] is updated according to the reward received
      • Rule discovery, e.g. a GA, is performed in [A] to evolve better rules
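The cycle on this slide can be sketched as one step of an XCS-style loop; the rule encoding (dicts), field names, and the learning rate beta are illustrative assumptions, not the paper's implementation:

```python
import random

def lcs_step(population, state, reward_fn, beta=0.2, explore=0.5, rng=random):
    # Match set [M]: rules whose condition accepts the state
    match_set = [r for r in population if r["condition"](state)]
    # Fitness-weighted reward prediction for each advocated action
    preds = {}
    for a in {r["action"] for r in match_set}:
        rs = [r for r in match_set if r["action"] == a]
        total_fit = sum(r["fitness"] for r in rs)
        preds[a] = sum(r["prediction"] * r["fitness"] for r in rs) / total_fit
    # Explore (random action) vs. exploit (best predicted reward)
    if rng.random() < explore:
        action = rng.choice(sorted(preds))
    else:
        action = max(preds, key=preds.get)
    # Action set [A]: matching rules that advocated the chosen action
    action_set = [r for r in match_set if r["action"] == action]
    reward = reward_fn(state, action)
    # Widrow-Hoff style update of each rule's prediction toward the reward
    for r in action_set:
        r["prediction"] += beta * (reward - r["prediction"])
    return action, reward
```

Rule discovery (e.g. a GA running in [A]) is omitted here; this sketch covers only the match/predict/act/update part of the cycle.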