LiTH, 1998
Abstract

This essay deals with the study of machine learning, an important part of computer science. The emphasis is put on three major subareas: decision trees, artificial neural networks and evolutionary computation. For each approach, the theory behind the algorithm is explained, along with the experience I gained from examining different implementations of the algorithms.
Contents

Introduction
Machine Learning
Decision tree learning
    Theory
    The Algorithm
    Practice
Artificial Neural Networks
    Theory
    Biology
    Computer
    Practice
Evolutionary computation
    Theory
    Genetic algorithms
    Genetic programming
    Practice
Other approaches
Conclusion
References
    Books
    Websites
Introduction

As part of the course TDDB55 - "Medieinformatik, projekt 1" at the University of Linköping I chose to look into the field of machine learning. To be more precise, I chose the assignment "Evaluate machine learning algorithms for user modeling". The different algorithms I have evaluated are decision tree learning, artificial neural networks and evolutionary computation. I also mention other approaches, such as Bayesian networks and PAC learning. As my main sources of information I have used the books Artificial intelligence – a modern approach by Peter Norvig and Stuart Russell and Machine learning by Tom M. Mitchell, as well as various enlightening sites on the Internet.

Machine Learning

What is machine learning? That was the first question I faced when I started looking into the subject. It is a fairly new science, approximately as old as computer science itself. Ever since the realization of the very first computers, people have dreamed of teaching their machines to reason and act like humans and other forms of intelligent life. This is where machine learning and other closely related fields, such as Artificial Intelligence (AI), come in. Machine learning is the technique of implementing algorithms that learn on computers. What then is learning? Tom M. Mitchell gives this definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." The task may for instance be to recognize different faces of people (see Artificial Neural Networks) and the experience would then be the set of pictures of people used to train the system. Measuring the performance here means checking whether the computer can correctly determine who is in the picture.
In this case the performance would partly be evaluated by humans and fed back to the computer, but there are many examples of tasks where the performance can be measured by the computer itself, e.g. learning to play chess, where it is easier to define rules for measuring the success of the algorithm. A fundamental part of learning is searching. By searching through a hypothesis space for the hypothesis that best fits the training examples, the algorithms can simulate a modest form of intelligence.
Decision tree learning

Theory

Decision tree learning is one of the most popular learning methods and has been successfully applied to tasks as different as learning to diagnose medical cases and assessing the credit risk of loan applications. The method is best suited for problems that have discrete output values and instances with a fixed set of attributes whose values are preferably discrete and disjoint, at least in the basic form of the decision tree algorithm. With modifications to the basic algorithm, decision trees can be made to handle continuous, real-valued attributes as well as output real-valued results. Applications with these features are less common, though. Other advantages of decision trees are that they are robust to errors and that they can still function with missing attribute values. If the set of training examples is contaminated with incorrect instances, the algorithm is still able to make correct assumptions about the set, and if not all attributes are represented, it will still work as long as the missing value is not required in the tree search. A major drawback of inductive learning through decision trees is that the algorithms are not capable of recognizing patterns in the learning examples and therefore know nothing about cases that have not yet been examined through examples.

Figure 1 Occupation decision tree
One possible implementation of a decision tree for trying to establish a person's occupation.
The Algorithm

Basically, the algorithm is the assembly of a tree graph. The tree will then be used to make decisions, hence "decision tree". The leaf nodes of the tree represent the classifications. The rest of the nodes are tests on the attribute values of the tree. An important aspect of the construction of the tree is deciding the order of the tests (the internal structure of the nodes in the tree). The most popular philosophy on this subject dates back to the 14th century and William of Ockham, who preferred the simplest hypothesis consistent with the examples, a principle also known as Ockham's razor.

Ockham's razor: Prefer the simplest hypothesis that fits all the data.

In other words, this means that we are interested in asking the "most interesting" questions first or, to be more scientific, asking the questions that give us the greatest information gain. Claude Shannon, the father of information theory, defined a metric for information in the 1940s, called entropy. The entropy of each question is a value between 0 and 1, where 1 rates as the most information possible gained.

Entropy(S) of a question on an attribute = Σi −pi log2 pi, summed over all disjoint values of the attribute, where pi is the proportion of examples in classification i (e.g. positive or negative).

This way, the question with the highest entropy out of the remaining questions is chosen until all relevant questions have been asked (questions with entropy 0 are pointless to ask, since they do not add any information to the tree). Since the entropy of each question changes with new examples in the training domain, the tree would have to be reconstructed for every new example. This is obviously not very practical. A better approach is to wait until a notable amount of new examples (e.g. 10%) have been added to the original set and then reconstruct.
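As an illustration, the entropy formula above can be computed with a short sketch (the example labels are made up for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a set of classification labels: sum of -p_i * log2(p_i)."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in counts.values())

# A balanced set of positive/negative examples gives maximum entropy (1 bit);
# a uniform set gives 0 and is pointless to split on.
print(entropy(["pos", "neg", "pos", "neg"]))  # 1.0
print(entropy(["pos", "pos", "pos", "pos"]))  # 0.0
```

A tree-building algorithm such as ID3 evaluates this quantity for each candidate attribute when deciding which test to place at a node.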
Practice

When looking for implementations of algorithms using decision trees, I soon discovered that most available systems were more or less associated with data mining and expert systems. Some examples that I found were Alice from Isoft (France) and CART from Salford Systems (USA). I examined the CART system on a Windows 98 platform. I found that CART had a very nice graphical interface that showed the decision tree. The major limitation of CART was that it only produced binary trees, but there were also many interesting parameters that could be tuned in the tree modeling process. A free demo of the CART system can be downloaded from Salford Systems' website.
http://www.salford-systems.com/

Artificial Neural Networks

Theory

Artificial neural networks have proved to be an efficient approach to learning real-valued functions over both continuous and discrete-valued attributes. One of the biggest advantages of artificial neural networks is that they are robust to noise in the training data. This ability has contributed to successful implementations in tasks such as face and handwriting recognition, robot control, language translation, pronunciation software, vehicle control, etc. In a "pure" artificial neural network, all the nodes work in parallel. This requires special and expensive hardware. Most of today's implementations of artificial neural networks are done on single-processor computers, by simulation of parallelism. This achieves only a fraction of the speed of a "pure" neural network. The idea for artificial neural networks came from the science of biology.

Figure 2 Artificial Neural Network
Example of a neural network for establishing the identity of a human face in a picture.
Biology

The idea for Artificial Neural Networks originated from studies of the brain. Since the brain seems to have an unprecedented ability to learn a wide range of things, it has been an inspiring challenge to copy its characteristics. The thinking part of the brain is a vast network made up of approximately 10^11 nerve cells, called neurons. Each neuron is connected to, on average, ten thousand other cells. The connections are organized in layers. Each neuron consists of a cell body called the soma, several shorter fibers called dendrites, and a long output fiber called the axon. A junction called the synapse serves as a connection between cells. When a signal propagates from neuron to neuron, it is first handled by the synapse, which can either increase or decrease its electrochemical potential. Synapses have the ability to change their characteristics over time. This ability, researchers believe, is what we refer to as learning. The synapses then lead the signal into the cell via the dendrites. If the total potential of the cell reaches a certain threshold, an electrical pulse, also called an action potential, is sent down the axon and finally on to the synapses. This is then repeated for each neuron layer in the network. The last layer is the output layer.

Figure 3 Neuron – the Brain

Computer

The computer implementation of the neuron is the unit, and the connections it uses are called links. The links each have a numeric weight. Through updating of the weights, the links come to have the same function as the synapses in the brain. The input function sums all the incoming signals and their associated weights. The activation function (f in figure 4) then determines whether to send an activation signal (a in figure 4) onto the output links. There
are several ideas for different activation functions, but they all have in common that they depend on whether the sum of the input function reaches the threshold or not.

Figure 4 Unit – the Computer

Some units are connected to the outside environment and assigned as input or output units. The rest of the units are called hidden units and serve as network layers between the input and output layers. There are two major varieties of network structures, the feed-forward and the recurrent networks. In feed-forward networks the signal only travels in one direction and there are no loops. In a recurrent network there are no such restrictions. The recurrent network is much more advanced and can hold memory, but it is also more vulnerable to chaotic behavior and instability. The brain is a recurrent network. Some other examples of successful recurrent networks are the Hopfield and the Boltzmann networks. The simplest form of feed-forward network is the perceptron. Perceptrons do not have any hidden units. Still, they are able to represent functionality such as AND, OR and NOT in boolean algebra. Networks with one or more hidden layers are called multilayer networks. The most popular method for learning in multilayer networks is called backpropagation. The basic idea in backpropagation is to minimize the squared error between the network output and the target values of the training examples by dividing the "blame" among the contributing weights. Overfitting is an important issue in machine learning, especially so in neural network learning. Overfitting occurs when a network is trained too much on a small domain of training data, which it then performs very well on, but it cannot generalize sufficiently when new data is added.

Figure 5 Perceptron
Perceptrons are single-layered feed-forward networks. They were the first approach to artificial neural networks that computer scientists began to study, in the late 1950s.

Practice

I first looked at Tom M. Mitchell's implementation of face recognition using an artificial neural network. It is made for the Unix platform and can be downloaded over the Internet (see URL below). It requires some graphic-display program in order to view the images processed by the system. The images used in the system were in the pgm format and I used XV by John Bradley to view them. The system gave an interesting insight into how artificial neural networks can discover patterns, in pictures for example. The main drawback of this system was that it used tiny images, only 32x30 pixels in size, so it took some time to get used to.

http://www.cs.cmu.edu/afs/cs/zproject/theo-11/www/decision-trees.html

I also looked at a similar commercial system, ImageFinder 3.4 from Attrasoft Inc. It had a very user-friendly interface and was easy to get started with. ImageFinder is a Java application that can take gif or jpeg images as input and learn their characteristics in an artificial neural network. The number of images to learn and the number of times to "practice" on them can be decided by the user. When the network is done, the user can specify a directory in which to search for similar images. The output is the names of the closest resembling images and their scores based on how closely they resemble the training examples. Unfortunately, the free demo that I could download did not allow for any adjustment of parameters, which would have made it even more interesting to evaluate.

http://attrasoft.com/
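The unit described in the Computer section above can be sketched in a few lines. This is a minimal illustration with a threshold activation function and hand-picked (not learned) weights chosen to represent boolean AND:

```python
def perceptron(inputs, weights, threshold):
    """A single unit: the input function sums the weighted inputs, and the
    activation function fires (outputs 1) if the sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Hand-picked weights implementing boolean AND: the unit fires only
# when both inputs are 1 (0.5 + 0.5 >= 1.0).
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], [0.5, 0.5], 1.0))
```

In a real perceptron the weights would be adjusted by a learning rule from training examples rather than set by hand; multilayer networks replace the hard threshold with a differentiable activation function so that backpropagation can divide the "blame" among the weights.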
Figure 6 ImageFinder 3.4

Evolutionary computation

Theory

Genetic algorithms and genetic programming are two closely related forms of evolutionary programming. Some authors consider these terms to be synonyms, while others choose to speak of genetic algorithms when the hypothesis or "gene" is a simple bit string, and of genetic programming when the hypothesis is more advanced, usually symbolic expressions or programming code. Genetic algorithms have been successfully utilized, especially on optimization problems. Since many problems can be thought of as optimization problems, this is no great limitation to their usefulness.

Genetic algorithms

Background

Nature is the best known producer of robust and successful organisms. Over time, organisms that are not well suited to their environment die off, while others that are better suited live to reproduce. Parents form offspring, so that each new generation carries on earlier generations' experience. If the environment changes slowly, species can adapt to the changes. Occasionally, random mutations take place. Most of these result in the death of the mutated individual, but a few mutations result in new, successful species. These facts were first revealed by Charles Darwin in his publication On the Origin of Species by Means of Natural Selection.
Figure 7 Reproduction in genetic algorithms

Algorithm

In order to simulate evolution, the algorithm needs a metric to establish which selections are better than others with respect to solving the problem at hand. This metric is called the fitness function. The most promising individuals (those with the highest scores on the fitness function) then receive a higher reproduction likelihood. The next step is to decide where in the bit string to make the crossover. This is usually done randomly somewhere along the string. Then the parts from the original bit strings are swapped to form two new strings. This is the natural-selection part of the algorithm. But if this had been the only step in the reproduction, most algorithms would only have been able to find local optima. To solve this problem, the algorithm incorporates another basic component of regeneration in nature: mutations. This way an individual bit string can leave a population that is "stuck". The chance of a mutation is usually very low.

Evolution algorithm
1. Choose individuals for reproduction based on the fitness function.
2. Choose where to make the crossover.
3. Reproduce using crossover.
4. Mutate single bits with a small random chance.
5. Repeat from step 1.

Genetic programming

Genetic programming differs from genetic algorithms in that it strives to optimize code and not bit strings. Programs manipulated by genetic programming are usually represented by trees corresponding to the parse tree of the program. Just as in genetic algorithms, the individuals produce new generations through selection, crossover and mutation. The fitness
of an individual is usually determined by executing the program on training data. Crossover is executed by swapping randomly selected subtrees.

Figure 8 Crossover operation in genetic programming

Practice

I found several very interesting sites on evolutionary programming on the Web. Two of the best ones were Java applets: one the game Tron, and the other a site called the GA Playground.

Tron: http://dendrite.cs.brandeis.edu/tron/
The GA Playground: http://www.aridolan.com/ga/gaa/gaa.html

Tron is a computer game based on the 1982 Walt Disney movie with the same name. It uses a genetic algorithm in order to learn from previous games. According to the people behind the program, they "... have put a genetic learning algorithm online. A 'background' GA generates players by having the computer play itself. A 'foreground' GA leads the evolutionary process, evaluating players by their performance against real people." It is very hard to beat the computer at Tron at present. I only succeeded two out of approximately 50 times. The on-line game Tron is a good example of successful utilization of evolutionary computation.
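The five steps of the Evolution algorithm above can be sketched for bit strings. The fitness function here, counting 1-bits (the classic "OneMax" toy problem), is just an illustrative stand-in for a real problem-specific metric:

```python
import random

random.seed(1)

def fitness(bits):
    """Toy fitness function: the number of 1-bits (the 'OneMax' problem)."""
    return sum(bits)

def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.01):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # 1. Choose individuals for reproduction, weighted by fitness.
        weights = [fitness(ind) + 1 for ind in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            mom, dad = random.choices(pop, weights=weights, k=2)
            # 2-3. Pick a random crossover point and swap the tails.
            cut = random.randrange(1, length)
            child = mom[:cut] + dad[cut:]
            # 4. Mutate single bits with a small random chance.
            child = [b ^ 1 if random.random() < mutation_rate else b
                     for b in child]
            new_pop.append(child)
        pop = new_pop  # 5. Repeat from step 1.
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))  # typically close to 16, the all-ones string
```

A genetic programming system follows the same loop but crosses over randomly selected subtrees of program parse trees instead of bit-string tails.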
Figure 9 Tron - Computer winning rate
The evolution of the Tron program over time, according to the authors.

The other explored site, the GA Playground, was similar in that it provided on-line Java applets for the evaluation of algorithms. This site provided more freedom to choose different algorithms and parameters on different, user-selected problems. One example of an interesting problem was the Travelling Salesman Problem, which was implemented in three different cases (all cities on a circle, cities in Bavaria, and capitals of the US). These examples gave good insight into how the applet worked.
Figure 10 GA Playground's TSP solving algorithm

GA Playground has a very nice and adjustable user interface that allows for different setups. Some features require that the program is downloaded and run as an application.

Other approaches

There are several other interesting approaches to machine learning than the ones mentioned above. Probably Approximately Correct (PAC) learning is one good model for learning. The Bayesian learning model is another. It is based on Bayes' theorem for calculating the posterior probability P(h|D) from the prior probability P(h), together with P(D) and P(D|h).

Bayes' theorem: P(h|D) = P(D|h) P(h) / P(D)

A third promising model is reinforcement learning. It is closely related to dynamic programming and is frequently used to solve optimization problems. The Q-learning algorithm is an interesting example from this category. There are many more approaches and, since this is a fairly new science, even more are sure to come.
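As a small numeric illustration of Bayes' theorem (the hypothesis and all probabilities here are made up for the example):

```python
def posterior(prior_h, likelihood_d_given_h, prob_d):
    """Bayes' theorem: P(h|D) = P(D|h) * P(h) / P(D)."""
    return likelihood_d_given_h * prior_h / prob_d

# Made-up numbers: a hypothesis h with prior P(h) = 0.3,
# data D with likelihood P(D|h) = 0.8 under h,
# and overall data probability P(D) = 0.5.
print(posterior(0.3, 0.8, 0.5))  # P(h|D), about 0.48
```

In Bayesian learning, such posteriors are computed for each candidate hypothesis, and the most probable hypothesis given the data is preferred.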
Conclusion

Throughout my personal project I have studied a variety of different algorithms that have shown to be more or less useful for their different objectives. Some systems have been pure genetic algorithms or pure artificial neural networks, while others have integrated different approaches in an attempt to get the best of each algorithm. Different algorithms have different advantages and disadvantages, e.g. decision trees are better suited to discrete-valued environments. I have found that accurate knowledge about the characteristics of the problem and basic knowledge about the algorithms are essential to finding a good algorithm for the task. Some problems are better suited to machine learning algorithms than others. This may be because there is still a long way to go in the science of machine learning, or because some of the expectations of machine learning are too high. For instance, Russell and Norvig suggest in Artificial Intelligence – a modern approach that it might always be worth trying a simple implementation of an artificial neural network, or even a genetic algorithm, on a problem just to see if it will work. Our knowledge of how, for example, neural networks work is very limited, especially with recurrent networks. There are other important aspects where knowledge is important. For example, the occurrence of overfitting is a trap that anyone dealing with machine learning should be aware of. Even if the algorithm is perfect, the handling of the set of training data is still a delicate matter.
References

Books

Norvig, Peter & Russell, Stuart (1995). Artificial intelligence – a modern approach. USA.
http://www.cs.berkeley.edu/~russell/aima.html
Mitchell, Tom M. (1997). Machine learning. McGraw-Hill, USA.
http://www.cs.cmu.edu/~tom/mlbook.html

Websites

http://www.isoft.fr/
http://www.salford-systems.com/
http://dendrite.cs.brandeis.edu/tron/
http://www.aridolan.com/ga/gaa/gaa.html
