  1. 1. An Incremental Machine Learning Mechanism Applied to Robot Navigation
Nawwaf N Kharma, Majd Alwan, and Peter Y K Cheung
Department of Electrical Engineering, Imperial College of Science, Technology and Medicine, London SW7 2BT U.K. Fax: + 44 171 5814419
  2. 2. Abstract
In this paper we apply an incremental machine learning algorithm to the problem of robot navigation. The learning algorithm is applied to a simple robot simulation to automatically induce a list of declarative rules. The rules are pruned in order to remove those that are operationally useless. The final set is initially used to control the robot navigating an obstacle-free path planned in a polygonal environment, with satisfactory results. Crisp conditions used in the rules are then replaced by fuzzy conditions fashioned by a human expert. The new set of rules is shown to produce better results.

Keywords: incremental machine learning, tripartite rules, schema, robot navigation.

1. Introduction
Both Classifier Systems and Q-Learning techniques [1] have two major common deficiencies. These are:

A. The Two-Part Rule Form
Production Rules (PR) (both classifiers and Q-Learning rules are PR) have two parts only: a left side representing the conditions that have to be met before the right side, the action, is taken. Thus PR in this form are situation-action rules. The information in schemas (see 2.2) can be coded in a two-part PR syntax. However, this is not suitable, for the following reasons.
1. There is evidence from animal ethology [2,3] indicating that animals learn an action-result association, and that this association, as a unit, is then linked with the context.
2. The number of rules in PR systems is determined by the number of combinations of contexts, actions, and results that could make up a rule. This could result in a very large number of rules. In contrast, the schemas used in this paper are built incrementally and hence require less memory.

B. Implicit Representation of Result Values
1. Q-Learning as well as Classifier Systems assign a strength to each rule that implicitly expresses the operational value of the rule. This contrasts with schemas, which have explicit declarative result components.
2. The rule selection mechanism in PR systems chooses high-strength rules, or rules that are in the vicinity of high-strength ones. This means that learning only takes place along the fringe of the state-space that has already been connected to a goal. Animats, however, should be allowed to seek knowledge that may not have immediate use for the goal at hand.

The Learning Mechanism, on the other hand, is a system that learns incrementally (i.e. every rule is built in a number of steps) using explicit units of representation (schemas). The algorithm aims to enable robots to acquire and use sensorimotor knowledge autonomously, without a priori knowledge of the skills concerned. This algorithm is based on the Schema Mechanism developed by Drescher [4]. It was altered and amended significantly in three main ways to make it more suitable for sensorimotor robotic applications. Our learning mechanism:
* Aims at the automatic induction of a list of declarative rules that describe the interaction between an agent (e.g. robot) and its environment.
* Is simplified and made practically useful for real-time applications.
* Is amended to take into account the richness and the inherent uncertainties of the real world.

This paper has four main sections. The first presents the basic assumptions and terms that are needed to make the algorithm work. The second section describes the main algorithm itself, the third outlines the specific problem and experiments carried out, and the fourth section shows and discusses the results obtained.

2. Assumptions and Terms
2.1 Basic Assumptions
The following are the basic assumptions made, in order for the main algorithm to work in line with expectations:
* All learned information may be put in the form of a list of declarative rules.
* The nature of the environment is static. The set of laws that govern the environment do not change over time.
* There are no hidden states. The relevant aspects of the environment are all detectable through the robot's sensors.
* Crisp conditions are initially sufficient for learning.
* Disjunctions of conjunctions of conditions are enough to characterise any state.
* The Temporal Credit Allocation problem [5] may be overlooked.
* Actions are taken in serial. They are finite in duration, and do not destabilise the system.
* Relevant results can be pre-defined in terms of a combination of conditions.
* There may be any number of agents in the environment. Any one of them may be monitored by the learning algorithm.

2.2 Definition of Terms
• Schema and Schema Space
  3. 3. Fig. 1 A schema. (Diagram: schema_i, made of a main body of context, action and result, plus the extended structures.)

The main structure of the learning mechanism is the Schema (or rule) Space. The Schema Space is the collection of all schemas. At any time the Schema Space of the robot represents all its knowledge. The job of the learning algorithm is simply to create, modify, delete, and possibly link schemas.

A schema representation is made of two main structures: a Main Body (which comprises a context, action and result), and the extended structures (see Fig. 1). The main body is a tripartite rule representing a counterfactual assertion. The extended structures keep information that is mainly used for creating (or spinning-off) new schemas.

A schema has both declarative and procedural aspects. Declaratively, it is a unit of information about an interaction between the robot and the environment. Procedurally, a schema represents a possible action to be taken in situations when its context is fulfilled and its result is desired.

The components of a schema are:
• Main Body:
- Conditions: A condition may be viewed as a function representing the degree of membership (or D.O.M.) of a sensor's output in a set representing that condition. In a crisp D.O.M. case, a condition can either be true or false.
- Context, Result and Action: A Context (and similarly a Result) is a conjunction of one or more conditions (and their negations). A Result could be either predefined or created at run-time. Contexts of reliable schemas are automatically added, at run-time, to the set of results. An Action represents a command to an effector to take an action. If an Action is taken then its command is executed.
• Extended Structures
Each schema has extended structures that contain two main sets of correlation statistics. These statistics are necessary for the development of schemas. The first set contains the Positive Transition Correlation (PTC) and the Negative Transition Correlation (NTC), which are used to find relevant results of an action. A relevant result of an action is a result that has been shown empirically to follow the execution of that action significantly more often than it follows other actions from the robot's repertoire of actions. The PTC discovers positive results while the NTC discovers negative ones. The second set of statistics contains the Positive Success Correlation (PSC) and the Negative Success Correlation (NSC). PSC is used to find conditions that, when included in the context of a schema, will make its result follow more reliably than before adding these conditions. NSC has the same function as PSC except that it is used to find conditions that need to be excluded from the context of a schema to make its result follow more reliably. Reliability of a schema is measured by the ratio of: the number of times that its action is executed, in the right context, and leads to the fulfilment of its result, to the total number of times that its action is executed in the right context.
• Configuration Parameters and others
- Result spin-off: a new schema made out of a copy of a previous one, by adding a condition to the result side.
- Context spin-off: a new schema made out of a copy of a previous one, by adding a condition to the context side.
- θ1: the relevance threshold, for producing result spin-offs.
- N1: the total number of experiments that need to be taken before a result spin-off is allowed.
- θ2: the reliability threshold, used for producing context spin-offs.
- N2: the number of activations that a result spin-off schema needs to go through before it is allowed to produce a context spin-off.

3. The Main Algorithm
The learning mechanism is best described by explaining the main algorithm that it embodies. This main algorithm goes through the following main steps:
1. Randomly select an action and execute it.
2. Use the data collected before, during and after taking the action of the schema in its context, to update the two sets of correlation statistics.
3. Based on the statistics in step 2, the rule base may be updated as detailed in the algorithmic notation.
4. Repeat steps 1 to 3 above until the predetermined number of experiments is met.

The two phases of rule base update are best described in the following algorithmic notation:

    If (no. of experiments > N1
        AND PTC/NTC(Result_i) >= θ1
        AND Result_i not used before)
    then Result spin-off

When the update is completed, and once the PSC and NSC are known, context spin-off takes place according to:

    If (no. of experiments > N2
        AND PSC/NSC(Condition_i) >= θ2
        AND Condition_i not used before)
    then Context spin-off

4. Problem and Experiments
The learning algorithm is now applied to the problem of robot navigation. The goal of this application is to:
* Show that the algorithm is capable of deducing a list of rules that is capable (if properly pruned) of controlling the navigational behaviour of the robot navigating an obstacle-free path planned in a given environment.
* Investigate the results of fuzzifying the context/result conditions on the execution of the deduced rule base.
  4. 4. 4.1 The Robot Simulation and the Task to Learn
The robot has a cylindrical body and a differential drive with two independently driven motorised wheels that perform both the driving and the steering. Four castors support the mobile base on the floor (see Fig. 2).

Fig. 2 A schematic of the Mobile Robot's Base. (Diagram labels: four castors, C, and the left and right driving wheels, DW L and DW R.)

Steering results from driving the left and right wheels at different speeds. This arrangement enables the robot to turn around its centre.

The robot is equipped with an on-board electronic compass, and odometry for localisation.

The robot requires two commands, linear speed and change of direction. These are separated into individual rotational speed commands for the two driving motors, which are put in a velocity closed-loop control. The global position control loop is closed by the feedback coming from the localisation system.

The task we want to learn is navigating our robot on a path consisting of straight line segments. The learnt navigation rulebase should be able to control the robot to traverse the planned path smoothly.

A simulation of the kinematics and dynamics of the described mobile robot base was used for testing the learnt control rules. The robot simulation links to the FuzzyTECH 3.1 development environment [6], where the rules and the input/output membership functions (including crisp ones) can be graphically edited.

4.2 Experimental set-up for learning
The learning algorithm goes through two runs. The first discovers the block of rules that are relevant to the orientation control. The second discovers the block of rules that control the linear velocity of the robot.

The learning algorithm is configured as follows: θ1 := 2, N1 := the total number of experiments taken, θ2 := 1, N2 := 3. Negative spin-off mechanisms are disabled.

The sets of conditions and actions used are:
DirDif = {right_big, right_small, centre, left_small, left_big},
Dist = {very_near, near, medium, far},
SpIn = {zero, slow, medium, high},
DirOut = {left_far, left_slight, straight, right_slight, right_far},
SpOut = {zero, slow, medium, high}.

5. Results
5.1 Learning Algorithm Results
A series of experiments is fed to the learning algorithm. These experiments were chosen such that they cover, on a uniformly random basis, the context space of the actions concerned. The learning algorithm is run and a series of rules is produced. If the direction control action left_slight is taken as an example, we find that the following rules are produced:

IF right_big ^ left_slight THEN right_small
IF right_small ^ left_slight THEN centre
IF centre ^ left_slight THEN left_small
IF left_small ^ left_slight THEN left_big

They were found with different reliability values depending on the specific series of experiments taken.

The rules produced by the learning algorithm are then pruned using the criteria of:
1. Relevance to the goal (heading towards the goal),
2. High reliability.

The above rules become:

IF right_big ^ left_slight THEN right_small
IF right_small ^ left_slight THEN centre

With respect to rule block 1, the final list becomes (put in operational form):

IF DirDif: right_big THEN DirOut: left_far
IF right_big THEN left_slight
IF right_small THEN left_far
IF right_small THEN left_slight
IF centre THEN straight
IF left_small THEN right_slight
IF left_small THEN right_far
IF left_big THEN right_slight
IF left_big THEN right_far

For the second block of rules, those that are concerned with the control of the linear velocity of the robot when heading towards a goal, a number of constraints are placed on the learning algorithm:
1. Due to inertia, the robot is prevented from taking an experiment in which the speed changes suddenly from slow to high or high to zero. Speed can only change gradually. This corresponds to real robots with dynamics, as opposed to mere kinematic simulations.
2. We prune the first list of rules according to different criteria (from the case with the orientation block). These criteria are:
A. Highest reliability.
B. Maximum distance traversal at each step.
C. Zero speed at the goal.

This gives us the following list of rules:

IF Dist: X ^ SpIn: zero THEN SpOut: slow
IF medium ^ medium THEN medium
IF far ^ medium THEN high
IF far ^ slow THEN medium
IF far ^ high THEN high
IF medium ^ high THEN medium
IF medium ^ slow THEN medium
IF near ^ slow THEN slow
IF very_near ^ slow THEN zero
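To make the operational form of the two pruned rule blocks in section 5.1 concrete, they can be written as plain lookup tables with crisp conditions. This is a minimal sketch, not the FuzzyTECH rule base used in the paper: the table names and the functions `speed_out` and `orientation_out` are hypothetical, and `None` is an assumed encoding of the wildcard `X` in the first speed rule.

```python
from typing import List, Optional

# Orientation block: DirDif -> candidate DirOut values, as listed in 5.1.
ORIENTATION_RULES = {
    "right_big": ["left_far", "left_slight"],
    "right_small": ["left_far", "left_slight"],
    "centre": ["straight"],
    "left_small": ["right_slight", "right_far"],
    "left_big": ["right_slight", "right_far"],
}

# Speed block: (Dist, SpIn) -> SpOut. A Dist of None stands for the
# wildcard X in "IF Dist: X ^ SpIn: zero THEN SpOut: slow" (assumption).
SPEED_RULES = {
    (None, "zero"): "slow",
    ("medium", "medium"): "medium",
    ("far", "medium"): "high",
    ("far", "slow"): "medium",
    ("far", "high"): "high",
    ("medium", "high"): "medium",
    ("medium", "slow"): "medium",
    ("near", "slow"): "slow",
    ("very_near", "slow"): "zero",
}


def orientation_out(dir_dif: str) -> List[str]:
    """Return every DirOut value whose rule fires for this DirDif."""
    return ORIENTATION_RULES.get(dir_dif, [])


def speed_out(dist: str, sp_in: str) -> Optional[str]:
    """Return SpOut for a crisp (Dist, SpIn) pair, or None if uncovered."""
    if (dist, sp_in) in SPEED_RULES:
        return SPEED_RULES[(dist, sp_in)]
    # Fall back to the wildcard rule, which only constrains SpIn.
    return SPEED_RULES.get((None, sp_in))
```

For example, `speed_out("far", "zero")` returns `"slow"` through the wildcard rule, and `speed_out("very_near", "slow")` returns `"zero"`, which is the "zero speed at the goal" criterion. Combinations that do not appear in the list, such as `("near", "high")`, return `None` in this sketch.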
  5. 5. Since the two blocks of rules are learned separately, separation is enforced in action. This is done by adding another block of rules which makes sure that the speed rules are only active when the robot is heading towards the goal. This special block is:

IF DirOut: right_far THEN SpOut: zero
IF left_far THEN zero
IF right_slight THEN zero
IF left_slight THEN zero

This means that at execution the robot should first execute the first set, making sure that the robot is in the right direction, and then the second block starts executing.

5.2 Simulation Results

Fig. 3 Navigation using Crisp Conditions.

Fig. 3 shows the robot navigating a planned path using the learnt rules with crisp membership functions for the context conditions. It is clear that the robot's centre moved off the straight path segments, due to the lack of overlap between the straight membership function and the contiguous ones, and to its width. This is unsuitable in cluttered environments (e.g. a narrow corridor). Had the straight membership function been narrower, the robot would have swung right and left of the path in a zigzag, because exactly the same rules are activated whenever a change of direction is required, regardless of the amount of change required.

However, when appropriate fuzzy membership functions replace the crisp ones, the performance of the learnt navigation rules significantly improves, as Fig. 4 shows. This is because when the robot becomes closer to the direction of the goal, the final output of the orientation control rules is significantly reduced according to the degree of fulfilment.

Fig. 4 Navigation using Fuzzy Conditions.

6. Conclusions and Recommendations
The learning algorithm succeeded in finding the declarative rules that represent, in their totality, the interaction between the robot and the environment. Many of these rules were operationally useless, and had to be pruned (according to the criteria mentioned previously). Once pruned, the resulting rules (both in crisp and fuzzy forms) were effective in controlling the robot in navigation. We have shown that the performance of the learnt schemas improves as the context conditions are fuzzified. Hence, our future work would be making the learning schema mechanism a fuzzy one, which would be more general and capable of learning tasks in the continuous real world. Our learning mechanism, presented in this paper, readily allows this extension.

7. References
[1] Dorigo M. et al. (1994) "A comparison of Q-learning and classifier systems." In From animals to animats 3, edited by D. Cliff et al. MIT Press, Cambridge, MA.
[2] Rescorla R. (1990) "Evidence for an association between the discriminative stimulus and the response-outcome association in instrumental learning." Journal of Experimental Psychology: Animal Behavior Processes, 16, 326-334.
[3] Roitblat H. (1994) "Mechanism and process in animal behavior: models of animals, animals as models." In From animals to animats 3, edited by D. Cliff et al. MIT Press, Cambridge, MA.
[4] Drescher G. (1990) "Made-up minds: a constructivist approach to artificial intelligence." MIT Press, Cambridge, MA.
[5] Holland J H. (1992) "Adaptation in Natural and Artificial Systems." MIT Press, Cambridge, MA.
[6] FuzzyTECH 3.1 Software Manuals.
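The difference between crisp and fuzzy context conditions described in section 5.2 can be illustrated with a small numerical sketch. Everything numeric here is an assumption made for illustration: the paper does not give the membership function shapes, the angle ranges, the output values, or the inference method used in FuzzyTECH. The sketch uses only two of the learnt orientation rules (`centre` to `straight`, `right_small` to `left_slight`), triangular fuzzy sets, and a weighted average of singleton outputs.

```python
def crisp(lo, hi):
    """Crisp set: membership is 1 inside [lo, hi) and 0 elsewhere."""
    return lambda x: 1.0 if lo <= x < hi else 0.0


def triangle(a, b, c):
    """Triangular fuzzy set rising from a to a peak at b, falling to c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Assumed turn commands (degrees per control step) for two DirOut labels.
TURN = {"straight": 0.0, "left_slight": 10.0}

# Two of the learnt orientation rules: DirDif label -> DirOut label.
RULES = {"centre": "straight", "right_small": "left_slight"}

# Assumed membership functions over the direction difference in degrees.
CRISP_SETS = {
    "centre": crisp(-2.0, 2.0),
    "right_small": crisp(2.0, 25.0),
}
FUZZY_SETS = {
    "centre": triangle(-20.0, 0.0, 20.0),
    "right_small": triangle(0.0, 20.0, 40.0),
}


def control(dir_dif, sets):
    """Weighted average of rule outputs by each rule's degree of fulfilment."""
    num = den = 0.0
    for label, mu in sets.items():
        weight = mu(dir_dif)
        num += weight * TURN[RULES[label]]
        den += weight
    return num / den if den else 0.0
```

With these assumed sets, `control(15.0, CRISP_SETS)` and `control(5.0, CRISP_SETS)` both give the full `left_slight` command of 10.0, while `control(15.0, FUZZY_SETS)` gives 7.5 and `control(5.0, FUZZY_SETS)` gives 2.5. The fuzzy output shrinks as the heading error shrinks, which is the behaviour the text attributes to the degree of fulfilment, whereas the crisp output stays the same anywhere inside the interval.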