Lecture7 Ml Machines That Can Learn


  1. Machines That Can Learn
     - Reading: Chapter 9 from Marakas
     - Additionally: Chapter 2 from Leake, “Case-Based Reasoning” (in the library short loan collection)
  2. What is Machine Learning?
     - Seeks to mimic the operating principles of humans
     - Neural nets purport to solve ill-structured business problems
       - They simulate the human brain and its process of learning
     - Other methods
       - Genetic algorithms (biological evolution)
       - Case-based reasoning (memory and adaptation)
     - But human reasoning is fraught with vagueness, ambiguity and fuzzy descriptions
  3. Fuzzy Logic and Linguistic Ambiguity
     - Our language is replete with vague and imprecise concepts, and conveys meaning through semantic approximations.
     - These approximations are useful to humans, but do not readily lend themselves to the rule-based reasoning done on computers.
     - Fuzzy logic is how computers handle this ambiguity.
     - It works with gradation (degree of membership) rather than precision.
  4. The Basics of Fuzzy Logic
     - In a “pure” (Boolean) logical comparison, the result is either false (0) or true (1) and can be stored in binary form.
     - The result of a fuzzy logic operation can be any value from 0 (absolutely false) to 1 (absolutely true), including the values in between.
     - These operations use functions that assign a degree of “membership” in a set.
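To make the crisp-versus-fuzzy contrast concrete, here is a minimal Python sketch of fuzzy AND, OR and NOT. It assumes the common Zadeh operators (AND = minimum, OR = maximum, NOT = 1 minus the degree), which the slide does not name; other operator families exist. The membership degrees used are hypothetical.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Zadeh fuzzy AND: the minimum of two membership degrees."""
    return min(a, b)

def fuzzy_or(a: float, b: float) -> float:
    """Zadeh fuzzy OR: the maximum of two membership degrees."""
    return max(a, b)

def fuzzy_not(a: float) -> float:
    """Fuzzy complement: 1 minus the membership degree."""
    return 1.0 - a

# With crisp inputs (0 or 1) the operators behave like ordinary Boolean logic.
crisp = [(a, b, fuzzy_and(a, b), fuzzy_or(a, b)) for a in (0, 1) for b in (0, 1)]

# With intermediate degrees the results fall anywhere between 0 and 1.
tall, heavy = 0.5, 0.75  # hypothetical membership degrees
print(fuzzy_and(tall, heavy), fuzzy_or(tall, heavy), fuzzy_not(tall))  # 0.5 0.75 0.5
```

The crisp table shows that fuzzy logic generalizes binary logic rather than replacing it: restricted to 0 and 1, the min/max operators give exactly the Boolean AND/OR results.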
  5. A Simple Membership Function Example
     - The “tallness” function takes a person’s height and converts it to a numerical scale from 0 to 1.
     - Here the statement “He is tall” is absolutely false for heights below 5 feet and absolutely true for heights above 7 feet.
     - For a height x between 5 and 7 feet inclusive, the degree of tallness is (x - 5)/2.
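The tallness function can be written as a short piecewise function. A minimal Python sketch, assuming heights are given in feet as on the slide (the function name `tallness` is illustrative):

```python
def tallness(height_ft: float) -> float:
    """Degree of membership in the fuzzy set 'tall' (height in feet)."""
    if height_ft < 5:
        return 0.0              # "He is tall" is absolutely false
    if height_ft > 7:
        return 1.0              # "He is tall" is absolutely true
    return (height_ft - 5) / 2  # linear ramp from 0 at 5 ft to 1 at 7 ft

for h in (4.5, 5.0, 5.5, 6.0, 7.0, 7.5):
    print(h, tallness(h))
```

A height of 6 feet gives 0.5, the degree of tallness quoted when contrasting fuzziness with probability. The two pieces join continuously at 5 and 7 feet, so there is no sudden jump between “not tall” and “tall”.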
  6. Fuzziness Versus Probability
     - There are some subtle differences:
     - Probability deals with the likelihood that something has a particular property.
       - It estimates whether the property is present.
     - Fuzzy logic deals with the degree to which the property is present. For example, a person 6 feet tall has a 0.5 degree of tallness.
       - It assumes the property is present.
  7. Advantages and Limitations of Fuzzy Logic
     - Advantages:
       - Allows contradiction to be modelled and included in a knowledge base.
         - e.g. a given height can belong to the “tall” set and to other sets at the same time
       - Increases system autonomy (the rules in the knowledge base function independently of each other).
         - Compensatory, unlike crisp rule-based systems, where a single rule can bring about undesirable outcomes
       - Eliminates borderline cases
       - Widely used in microprocessor-based appliances
     - Disadvantages:
       - In a highly complex system, fuzzy logic may become an obstacle to verifying system reliability.
         - It is not easy to know which rules are firing, or to simulate sensitive rules.
       - Fuzzy reasoning mechanisms cannot learn from their mistakes.
     - Hence the need for systems that can learn (or learn to forget)!
  8. Artificial Neural Networks
     - First proposed in the 1940s as an attempt to simulate the human brain’s cognitive learning processes.
     - They have the ability to model complex yet poorly understood problems.
       - e.g. in business and finance
     - ANNs are relatively simple computer programs that model a problem space by trial and error.
  9. Learning From Experience
     - The process is:
       - A piece of data is presented to a neural net, and the ANN “guesses” an output.
       - The prediction is compared with the actual or correct value. If the guess was correct, no action is taken.
       - An incorrect guess causes the ANN to examine itself to determine which parameters (weights) to adjust.
       - Another piece of data is presented and the process is repeated.
     - ANNs therefore have the ability to learn.
       - e.g. a predictor that adjusts over time as it converges on the most accurate model
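The guess, compare, adjust loop above closely matches the classic perceptron learning rule. The slide does not give an update formula, so the following is a minimal Python sketch under stated assumptions: a single threshold neuron, a learning rate of 0.1, and the logical OR function as hypothetical training data.

```python
def predict(weights, bias, inputs):
    """The net's 'guess': 1 if the weighted sum exceeds zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(samples, learning_rate=0.1, epochs=20):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            guess = predict(weights, bias, inputs)
            error = target - guess
            if error == 0:
                continue  # correct guess: no action is taken
            # incorrect guess: nudge each weight in the direction that reduces the error
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical training data: logical OR
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train(samples)
print([predict(weights, bias, x) for x, _ in samples])  # [0, 1, 1, 1]
```

Only weights attached to non-zero inputs change on a wrong guess, which is one simple answer to “which parameters to adjust”. A single neuron like this can only learn linearly separable patterns; the hidden layer described on the following slides removes that limit.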
  10. Fundamentals of Neural Computing
     - The basic processing element in the human nervous system is the neuron. Networks of these interconnected cells receive information from sensors in the eye, ear, etc.
     - Information received by a neuron will either excite it (so it passes a message along the network) or inhibit it (suppressing information flow).
     - Sensitivity can change with the passage of time or the gaining of experience.
       - i.e. connections learn if used, or forget if not
  11. A Neuron
     [Figure: a neuron, with its inputs and a synapse labelled]
     - Increasing the pulse at a synaptic connection results in learning; decreasing it results in forgetting.
  12. Putting a Brain in a Box
     - An ANN is composed of three basic layers:
       - The input layer receives the data.
       - The internal, or hidden, layer processes the data.
       - The output layer relays the final result of the net.
     - The connections between layers carry weights (analogous to synapses).
  13. Inside the Neurode
     - The neurode usually has multiple inputs, each with its own weight (importance).
     - A bias input can be used to adjust the output after learning has taken place.
     - The state function consolidates the weighted inputs into a single value.
     - The transfer function processes this state value and produces the output (it resembles a “dimmer” switch).
     [Figure: a neurode: weighted inputs and a bias input (normally = 1) feed the state function, whose value passes through the transfer function to the output layer]
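A single neurode can be sketched in a few lines. This minimal Python version assumes a weighted sum as the state function and a logistic sigmoid as the transfer function (one common choice for the “dimmer switch”); the bias input is fixed at 1 with its own weight, as on the slide. All weights are hypothetical.

```python
import math

def state_function(inputs, weights, bias_weight, bias_input=1.0):
    """Consolidate the weighted inputs (plus the bias) into one state value."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias_input * bias_weight

def transfer_function(state):
    """Logistic sigmoid: maps any state value smoothly onto (0, 1), like a dimmer."""
    return 1.0 / (1.0 + math.exp(-state))

def neurode(inputs, weights, bias_weight):
    return transfer_function(state_function(inputs, weights, bias_weight))

# Hypothetical weights: the first input matters most, the second inhibits.
weights = [2.0, -1.0, 0.5]
bias_weight = -0.5
print(neurode([1.0, 0.0, 1.0], weights, bias_weight))  # state 2.0 gives about 0.88
```

Negative weights play the inhibiting role described for biological neurons, and positive weights the exciting role; the sigmoid turns a low, moderate or high state into a correspondingly low, moderate or high output rather than a hard on/off switch.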
  14. Training the Artificial Neural Network
     - Present data with known outputs to the net and let it “guess”.
     - If the calculation is wrong, the weights are adjusted.
     - Each neurode is tested for sensitivity.
  15. ANN Learning Paradigms
     - The goal is to find the weight settings that make the predicted classification match the desired outcome.
     - In an unsupervised learning paradigm:
       - the ANN receives input data but no feedback about the desired results. It develops clusters of the training records based on similarities in the data.
     - In a supervised learning paradigm:
       - the ANN compares its guess with feedback containing the desired results.
       - It calculates the error between the “guess” and the desired outcome.
       - The error is combined with a “learning rate” in an algorithm called backpropagation**.
     ** For the mathematically minded, the algorithm is given in the chapter appendix.
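The chapter appendix gives the book’s own statement of backpropagation; the following is only a generic textbook-style sketch, not necessarily the same formulation. Assumptions: a tiny 2-2-1 network of sigmoid neurodes, squared error, one weight update per training record, a learning rate of 0.5, arbitrary fixed starting weights, and the logical AND function as hypothetical training data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical training set: logical AND (inputs -> desired output)
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# Fixed, asymmetric starting weights so the run is repeatable.
# w_hidden[j] = [weight from input 0, weight from input 1, bias] for hidden neurode j
w_hidden = [[0.5, -0.4, 0.1], [-0.3, 0.6, -0.2]]
# w_out = [weight from hidden 0, weight from hidden 1, bias]
w_out = [0.4, -0.5, 0.3]

def forward(x):
    """Input layer -> hidden layer -> output layer."""
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    y = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + w_out[2])
    return h, y

def mse():
    """Mean squared error between the net's guesses and the desired outputs."""
    return sum((t - forward(x)[1]) ** 2 for x, t in DATA) / len(DATA)

LEARNING_RATE = 0.5
initial_error = mse()
for _ in range(5000):
    for x, target in DATA:
        h, y = forward(x)
        # Output error term: (desired - guess) scaled by the sigmoid's slope
        delta_out = (target - y) * y * (1 - y)
        # Propagate the error back to each hidden neurode through its outgoing weight
        delta_hidden = [delta_out * w_out[j] * h[j] * (1 - h[j]) for j in range(2)]
        # Adjust weights: learning rate x error term x the input that fed the weight
        for j in range(2):
            w_out[j] += LEARNING_RATE * delta_out * h[j]
        w_out[2] += LEARNING_RATE * delta_out
        for j in range(2):
            w_hidden[j][0] += LEARNING_RATE * delta_hidden[j] * x[0]
            w_hidden[j][1] += LEARNING_RATE * delta_hidden[j] * x[1]
            w_hidden[j][2] += LEARNING_RATE * delta_hidden[j]

print(f"error before: {initial_error:.3f}, after: {mse():.4f}")
print([round(forward(x)[1]) for x, _ in DATA])
```

After training, rounding the outputs should reproduce the AND truth table. The hidden-layer error terms are computed before the output weights change, so every update in a step uses the same snapshot of the network. A larger learning rate speeds up early training but can overshoot, which is why the learning rate appears as a separate tuning knob.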
  16. Benefits Associated with Neural Computing
     - Benefits:
       - Avoidance of explicit programming
       - Reduced need for direct input from experts
       - ANNs are adaptable to changed inputs
       - ANNs are dynamic and improve with use
       - Able to process erroneous or incomplete data
       - Allow generalization from specific information
     - Limitations:
       - ANNs cannot “explain” their inferences
       - Their “black box” nature makes accountability and reliability difficult to establish
       - The repetitive training process is time-consuming
       - Highly skilled machine learning analysts and designers are still a scarce resource
       - ANN technology pushes the limits of current hardware
       - ANNs require users to place “faith” in the output
  17. Genetic Algorithms and Genetically Evolved Networks
     - If a problem has any solution, that suggests there is an optimal solution somewhere (remember bounded rationality?).
       - In a routing problem, 25 cities offer about 10^23 possible solutions (routes).
       - Checking them all would take around a billion years!
     - The field of management science has been able to tackle increasingly complex problems and find optimal solutions.
     - This success leads us to tackle even more complicated problems, creating a need for more innovative solution methods.
     - One such method is the genetic algorithm.
       - It resembles the heuristic approach to problem solving.
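The 10^23 figure can be checked with a little arithmetic. For n cities, fixing the starting city and ignoring direction leaves (n - 1)!/2 distinct round trips. The sketch below computes this for 25 cities; the speed of ten million tours evaluated per second is purely an assumption, chosen to show how a figure of roughly a billion years arises.

```python
import math

CITIES = 25
# Distinct round trips: fix the start city, order the other 24, and
# halve the count because a tour and its reverse have the same length.
tours = math.factorial(CITIES - 1) // 2
print(f"{tours:.2e} possible tours")  # 3.10e+23

# Hypothetical speed: evaluate ten million tours per second.
TOURS_PER_SECOND = 10_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = tours / TOURS_PER_SECOND / SECONDS_PER_YEAR
print(f"about {years:.1e} years")  # about 9.8e+08 years
```

Adding just one more city multiplies the count by 25, so faster hardware barely dents the problem; this is the motivation for heuristic methods such as genetic algorithms.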
  18. Introduction to Genetic Algorithms
     - Like neural nets, genetic algorithms (GAs) are based on biological theory.
     - Here, however, GAs find their roots in the evolutionary theories of natural selection and adaptation.
     - The power of a GA results from the mating of two population members to produce offspring that are sometimes better than the parents.
       - i.e. take the best from both!
  19. Basic Components of a Genetic Algorithm
     - Uses the idea of adapting to the environment
     - The smallest units of information are called genes, which combine into chromosomes.
       - e.g. the number of shares (a gene) in an investment portfolio decision
       - or a route chromosome {Miami Newark Chicago … Dallas}
     - After a GA is initialized, it uses a “fitness function” to evaluate each chromosome.
     - The GA then experiments by combining the fittest chromosomes.
     - Next, in the crossover phase, two “good” chromosomes exchange gene information.
     - Mutation then randomly alters some genes, and the resulting chromosomes join the pool.
  20. Basic Process Flow of a Genetic Algorithm
     [Figure: flowchart of the basic GA process]
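The basic flow (evaluate fitness, select the fittest, cross over, mutate, rejoin the pool, repeat) can be sketched as a short loop. This is a minimal Python illustration, not the textbook’s procedure: the “OneMax” fitness function (count the 1-genes), tournament selection, elitism and all the numeric settings are assumptions chosen for brevity.

```python
import random

GENES = 20            # chromosome length (bits)
POP_SIZE = 30
GENERATIONS = 40
MUTATION_RATE = 0.02  # per-gene chance of flipping

def fitness(chromosome):
    """Hypothetical fitness function: number of 1-genes (the 'OneMax' toy problem)."""
    return sum(chromosome)

def select(population, rng):
    """Tournament selection: the fitter of two random members becomes a parent."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2, rng):
    """Single-point crossover: the parents exchange genes after a random cut."""
    cut = rng.randint(1, GENES - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chromosome, rng):
    """Flip each gene with a small probability."""
    return [1 - g if rng.random() < MUTATION_RATE else g for g in chromosome]

def run_ga(seed=42):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
    history = []  # best fitness in each generation
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = population[:2]  # elitism: carry the two best forward unchanged
        while len(next_gen) < POP_SIZE:
            c1, c2 = crossover(select(population, rng), select(population, rng), rng)
            next_gen.extend([mutate(c1, rng), mutate(c2, rng)])
        population = next_gen[:POP_SIZE]
    best = max(population, key=fitness)
    return best, history

best, history = run_ga()
print("best fitness per generation:", history)
print("final best:", fitness(best), "of", GENES)
```

Because the two best chromosomes are always carried forward, the best fitness can never drop from one generation to the next; without elitism, an unlucky crossover or mutation could lose the best solution found so far. The mutation rate illustrates the “not too frequently, not too sparingly” trade-off: too high and the search becomes random, too low and the population stagnates.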
  21. Decoding, Crossover, Mutation
     - City coding: Miami = 000, Atlanta = 001, Chicago = 010
     - Distance coding: Miami = 00000000, Atlanta = 10001111, Chicago = 10101011
     - Time coding: Miami = 010, Atlanta = 100, Chicago = 100
     - Example chromosome (Chicago): 010 10101011 100
     - Crossover: before 11000101 and 10101011; after 11101011 and 10000101 (genes exchanged after the second bit)
     - Mutation: before 11000101; after 11100101 (third bit flipped)
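The slide’s encoding and operators can be reproduced directly on bit strings. A minimal Python sketch: the gene lengths (3-bit city, 8-bit distance, 3-bit time) and codes come from the slide, the crossover point after the second bit and the mutation at the third bit are inferred from its before/after strings, and the distance and time codes are treated as opaque bit strings because their numeric meaning is not stated.

```python
CITY = {"Miami": "000", "Atlanta": "001", "Chicago": "010"}
DISTANCE = {"Miami": "00000000", "Atlanta": "10001111", "Chicago": "10101011"}
TIME = {"Miami": "010", "Atlanta": "100", "Chicago": "100"}

def encode(city):
    """Concatenate the city, distance and time genes into one 14-bit chromosome."""
    return CITY[city] + DISTANCE[city] + TIME[city]

def decode(chromosome):
    """Split a chromosome back into its city name, distance bits and time bits."""
    city_bits, distance_bits, time_bits = chromosome[:3], chromosome[3:11], chromosome[11:]
    city = next(name for name, bits in CITY.items() if bits == city_bits)
    return city, distance_bits, time_bits

def crossover(parent_a, parent_b, cut):
    """Single-point crossover: the parents swap everything after position `cut`."""
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def mutate(chromosome, position):
    """Bit-flip mutation at a single position."""
    flipped = "1" if chromosome[position] == "0" else "0"
    return chromosome[:position] + flipped + chromosome[position + 1:]

print(encode("Chicago"))                     # 01010101011100
print(decode("01010101011100"))              # ('Chicago', '10101011', '100')
print(crossover("11000101", "10101011", 2))  # ('11101011', '10000101')
print(mutate("11000101", 2))                 # 11100101
```

Fixed-width genes are what make decoding possible: each field always occupies the same bit positions, so a chromosome can be split without separators.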
  22. Benefits and Limitations Associated With GAs
     - Population size is a critical factor in the speed of finding a solution, but at least this speed is relatively easy to predict.
     - Crossover and mutation are interesting ideas, but they should not be used too frequently (or too sparingly, either).
     - One advantage is that a GA always produces a candidate solution, usually at least a “reasonable” one.
     - We can also apply them to problems that we have no idea how to solve.
       - i.e. no training and no explicit knowledge of a solution method are required
     - Finally, their power comes from simple concepts, not from a complicated algorithmic procedure.
  23. Applications of ANNs and GAs
     - Prediction
       - Stock markets, customer behaviour
       - Sales, horse racing!
     - Diagnosis of disease
     - Tutoring (student learning)
     - Others
       - Detection of unusual patterns
       - Testing beer!
       - And many more (see text)
  24. Case-Based Reasoning
     - Reasoning based on previous cases or experience:
       - finds cases similar to a new problem
       - suggests a solution, or an adaptation of one
       - or warns of possible failure
     - e.g. planning a meal for friends, some of whom are vegetarian
       - One is allergic to dairy products.
       - You previously made a dish with mozzarella, but it is no good given the allergy.
       - Maybe fish would be better?
       - You remember that your vegetarian friends eat fish.
       - But then you remember that one of them almost fainted when you served a fish with its head still on!
       - So you decide that something like swordfish or tuna would be a good solution.
  25. Application areas?
     - Cases in law
       - Old cases used to justify an argument in a new one
     - Cases of fault diagnosis
       - Mechanical fixes at a garage
     - Medical diagnosis
     - Politicians quoting cases
     - Teaching of business, law, etc.
     - Managerial decision-making
       - Using past experience to make a new decision
  26. Cases and Indexes
     - A case is a piece of contextualised knowledge representing an experience that teaches a lesson fundamental to achieving the goals of the reasoner.
       - It consists of a problem description, a solution and an outcome.
     - Indexes
       - Like trying to find a book in a library
       - Capture important descriptors and differences
       - Used by retrieval algorithms
       - The goal of indexing is to ensure that relevant cases are retrieved when appropriate
  27. How does it work?
     [Figure: the CBR cycle, modified from Leake. A new problem leads to: retrieve case from library → propose solution → adapt → justify → criticise → evaluate → store]
  28. The process
     - Retrieval
       - Matching and ranking procedures based on the important dimensions of a case (its indexes), using a similarity measure
     - Adaptation
       - Usually a retrieved case will not match exactly.
       - Substitution may replace objects, adjust numeric values, replace inappropriate values, or use other cases to suggest replacements.
       - Transformations add, delete or replace components.
     - Justification
       - What differentiates a new case from older ones?
       - e.g. university admissions may ask “In what way is this student similar to those who have done well, or not so well?”
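Retrieval by weighted similarity, plus storing the new experience, can be sketched briefly. This minimal Python example is loosely modelled on the meal-planning scenario; the descriptors, index weights, similarity formula, warning threshold and case data are all hypothetical, and the adaptation and justification steps are not implemented.

```python
from dataclasses import dataclass

@dataclass
class Case:
    problem: dict   # indexed descriptors of the situation
    solution: str
    outcome: str    # whether the solution worked (and why, if not)

# Hypothetical case library, loosely based on the meal-planning example
LIBRARY = [
    Case({"guests": 6, "vegetarians": 2, "dairy_allergy": False}, "mozzarella bake", "success"),
    Case({"guests": 4, "vegetarians": 0, "dairy_allergy": True}, "grilled tuna", "success"),
    Case({"guests": 5, "vegetarians": 2, "dairy_allergy": False}, "whole baked trout",
         "failure: a guest was upset by the fish head"),
]
# Hypothetical index weights: how much each descriptor matters when matching
WEIGHTS = {"guests": 1.0, "vegetarians": 2.0, "dairy_allergy": 3.0}

def similarity(a: dict, b: dict) -> float:
    """Weighted similarity in [0, 1]; 1 means every indexed descriptor matches."""
    score = 0.0
    for key, weight in WEIGHTS.items():
        if isinstance(a[key], bool):
            score += weight * (a[key] == b[key])  # yes/no descriptor: match or not
        else:
            # numeric descriptor: closer values score higher (differences of 10+ score 0)
            score += weight * (1 - min(abs(a[key] - b[key]) / 10, 1))
    return score / sum(WEIGHTS.values())

def retrieve(problem: dict) -> list:
    """Retrieval: rank stored cases by similarity to the new problem, best first."""
    return sorted(LIBRARY, key=lambda c: similarity(problem, c.problem), reverse=True)

def propose(problem: dict, warn_threshold: float = 0.4):
    """Suggest the most similar successful solution and warn about similar failures."""
    ranked = retrieve(problem)
    suggestion = next(c.solution for c in ranked if c.outcome == "success")
    warnings = [c.outcome for c in ranked
                if c.outcome != "success" and similarity(problem, c.problem) >= warn_threshold]
    return suggestion, warnings

def store(problem: dict, solution: str, outcome: str) -> None:
    """Learning: add the new experience to the library for future retrieval."""
    LIBRARY.append(Case(problem, solution, outcome))

new_problem = {"guests": 5, "vegetarians": 2, "dairy_allergy": True}
suggestion, warnings = propose(new_problem)
print(suggestion)  # grilled tuna
print(warnings)    # ['failure: a guest was upset by the fish head']
store(new_problem, suggestion, "success")  # hypothetical outcome of the new meal
```

The heavy weight on the allergy descriptor is what pushes the mozzarella case to the bottom of the ranking, mirroring how the reasoner in the meal example rules it out. Recording failed outcomes alongside successes lets the system warn against repeating a mistake, not just reuse what worked.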
  29. Learning
     - Collects more cases over time and indexes them
     - Better still if it records which solutions worked or did not, and why
     - Saves reinventing the wheel when a similar case was retrieved and adapted
     - Learning is achieved essentially by:
       - accumulating new cases
       - managing indexes
  30. Advantages of CBR
     - Learns from experience, so avoids repeating mistakes
     - Saves time if an old solution can be reused
     - Can propose solutions in domains that aren’t completely understood
     - Can warn against making the same mistakes
     - Can evaluate solutions when no algorithmic method is available
     - Can interpret open-ended concepts
     - Focuses only on the important parts of the problem
  31. Key Point Summary
     - Fuzzy logic
       - Degree of membership
       - Different from probability
     - Neural networks
       - Supervised or unsupervised
       - Learn through weight adjustments to inputs
     - Genetic algorithms
       - Biological processes of fitness, crossover and mutation
     - CBR
       - Cases, indexes, adaptation, justification