A MULTISTRATEGY APPROACH TO LEARNING CONTROL FROM HUMAN SKILLS

Claude Sammut

School of Computer Science and Engineering
University of New South Wales
SYDNEY AUSTRALIA 2052

claude@cse.unsw.edu.au
http://www.cse.unsw.edu.au/~claude




Abstract: Behavioural cloning seeks to build models of human skill by using observations of the performance of a task as input to a machine learning program. Experiments in this area have met with some degree of success, but the methods employed to date have their limitations, including brittleness and opaqueness of the control rules. Some of these limitations are due to the simple situation-action model of control generally used and some are due to restrictions imposed by the language used to represent the rules. Rather than using a situation-action model of control, it is often more useful to have a goal-oriented model in which one set of rules is used to establish goals and the control rules instigate actions to achieve those goals. Furthermore, the learning systems presently used are not capable of representing high-level concepts that would allow controllers to be simpler and more robust.

          Keywords: Artificial Intelligence, Flight Control, Knowledge Acquisition, Knowledge-based
                    Control, Machine Learning, Real-time AI




1. MOTIVATION

Behavioural cloning seeks to build models of human skill by using observations of the performance of a task as input to a machine learning program. Experiments in this area have met with some degree of success. Michie et al (1990) first demonstrated the method by recording the traces of human operators of a pole balancing simulation. The traces were input to a machine learning program which produced rules capable of controlling the pole and cart. A demonstration on a larger scale was achieved by Sammut et al (1992), who built a control system for an aircraft in a flight simulator by recording the actions of pilots during flight. The automatic controller was capable of take-off, climbing, turning and landing. Urbančič and Bratko (1994) used a similar approach to build controllers for the highly non-linear container crane problem. However, the methods employed to date have their limitations, including brittleness and opaqueness of the control rules (Urbančič et al, 1996). Some of these limitations are due to the simple situation-action model of control generally used and some are due to restrictions imposed by the language used to represent the rules.

To establish the framework for the following discussion, the flight control problem is first presented. A simple flight simulator, as can be easily found on a workstation, is instrumented so that it records the state of the simulation every time the “pilot” performs an action such as moving the joy stick, increasing or decreasing the throttle or flaps, etc. The action is treated as a label, or class value, for each record. Thus, the records may be used as examples of what action to perform in a given situation.
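
To make the form of these records concrete, they can be pictured as ground facts, one per control action, with the action recorded as the final, class-valued argument. The attribute names and values below are invented for illustration and are not the actual instrumentation used in the experiments.

    % Hypothetical flight log: each fact is one training example, giving the
    % simulator state at the moment of the action and the elevator setting
    % the "pilot" chose as the class value.
    % example(Time, Altitude, Airspeed, Pitch, Roll, ElevatorSetting).
    example(t1, 1200, 140, 2.0,  0.0, -0.10).
    example(t2, 1350, 138, 4.5,  0.0, -0.15).
    example(t3, 1500, 135, 5.0, 12.0,  0.00).
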
Decision tree induction programs such as Quinlan’s C4.5 (Quinlan, 1993) can process the examples to produce rules capable of controlling the aircraft. These are “situation-action” rules since they simply react to the current state of the aircraft by responding with the action that, according to the data, was most typical for that state.
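
Rewritten in clausal form, a few branches of such a decision tree might look like the sketch below; the thresholds and attribute names are hypothetical and stand in for whatever tests the induction program actually selects.

    % Illustrative situation-action rules: the elevator setting is chosen
    % purely from the current state, mimicking the branches of a small
    % decision tree (the cut commits to the first matching branch).
    elevator_setting(Pitch, _ClimbRate, -0.15) :- Pitch < 5.0, !.
    elevator_setting(_Pitch, ClimbRate,  0.10) :- ClimbRate > 15.0, !.
    elevator_setting(_Pitch, _ClimbRate,  0.00).
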
                                                             this case, the background knowledge includes
More than one decision tree is needed to control a           concepts describing geometric objects such as circles
complex system such as an aircraft. In Sammut et al’s        and a regression program that is able to estimate
experiments, each control action (elevators, ailerons,       parameters in linear relationships. Thus, Srinivasan
throttle, flaps) were controlled by seperate decision        and Camacho show that it is possible to recognise the
trees. Each was induced by distinguishing one action         data from a turn as following a roughly circular
as the dependent variable and repeating the process          trajectory and the regression program finds the
for each action. Furthermore, the flight was divided         coefficients of the linear equation which approximates
into stages, so that different sets of decision trees were   the relationship between the roll angle and the radius
constructed for take-off, turning, approaching the           of the turn.
runway, and landing.
                                                             In our work, we employ a new inductive logic
Decision trees are classed as propositional learning         programming system, called iProlog, which
systems because the language used to represent the           generalises the approach taken by Srinivasan in his
learned concepts is equivalent to propositional logic.       extension to Progol. iProlog is an interpreter for the
Most behavioural cloning work, to date, has been             programming language, Prolog, that has been
limited to the use of propositional learning tools or        extended by the addition of a variety of machine
neural nets. The problem with these methods is that          learning tools, including decision trees, regression
the language of the control rules is limited to              trees, back propagation and other algorithms whose
describing simple conditions on the raw attributes           output is equivalent to propositional logic. Like other
output by the plant being controlled. However, control       ILP systems, iProlog also provides tools for learning
rules may be better expressed using high-level               first-order representations that are equivalent to
attributes. For example, in piloting an aircraft, it is      Prolog programs.
more illuminating to talk about the plane following a
                                                             ILP systems encode background knowledge as Prolog
particular trajectory rather than to simply give its
                                                             programs. Thus, if a human experts understands that
coordinates, orientation and velocities at a particular
                                                             certain raw attributes measured from a plant can be
point in time.
The situation-action model of control is also simplistic. A human pilot would normally only resort to this kind of control for low-level tasks such as routine adjustments or during emergencies where there is insufficient time to think. A more robust controller would incorporate the goal-oriented nature of most control tasks. One model of goal-oriented behaviour is one in which a set of rules is used to establish goals and another set of control rules instigates actions to achieve those goals.

The following section describes current research in the use of richer languages for describing control strategies. Section 3 elaborates on the architecture of a goal-oriented control system and section 4 concludes with a discussion of future research directions.

2. MULTISTRATEGY LEARNING AND INDUCTIVE LOGIC PROGRAMMING

Srinivasan and Camacho (in press) describe a method of learning high-level features for control. Their task is to learn the relationship between the roll angle and turn radius when turning an aircraft in a flight simulator. They collected data from a number of turns performed by a human “pilot”. These data were input to the inductive logic programming system Progol (Muggleton, 1995). Inductive logic programming (ILP) systems are able to take advantage of background knowledge provided prior to learning. In this case, the background knowledge includes concepts describing geometric objects, such as circles, and a regression program that is able to estimate parameters in linear relationships. Thus, Srinivasan and Camacho show that it is possible to recognise the data from a turn as following a roughly circular trajectory, and the regression program finds the coefficients of the linear equation which approximates the relationship between the roll angle and the radius of the turn.

In our work, we employ a new inductive logic programming system, called iProlog, which generalises the approach taken by Srinivasan in his extension to Progol. iProlog is an interpreter for the programming language Prolog that has been extended by the addition of a variety of machine learning tools, including decision trees, regression trees, back propagation and other algorithms whose output is equivalent to propositional logic. Like other ILP systems, iProlog also provides tools for learning first-order representations that are equivalent to Prolog programs.

ILP systems encode background knowledge as Prolog programs. Thus, if a human expert understands that certain raw attributes measured from a plant can be combined into a more useful higher-level feature, this combination can be expressed in Prolog and the ILP system uses the program to “pre-process” the data so that the learner can incorporate the high-level features into its concept description. Since the ILP system also generates Prolog programs, the learned concepts can become background knowledge for future learning.
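
As a sketch of how such background knowledge might be written, suppose an expert knows that height and distance to the runway combine into an approach angle; the predicate below is a hypothetical example constructed for this discussion, not part of iProlog’s library.

    % Hypothetical background predicate: a high-level feature (approach
    % angle in degrees) computed from two raw attributes of the plant.
    glide_slope(Height, Distance, Angle) :-
        Distance > 0,
        Angle is atan(Height / Distance) * 180 / pi.

A learner given glide_slope/3 as background knowledge can then express a control concept in terms of the angle of approach rather than raw heights and distances.
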
Of all the learning paradigms presently in use, ILP provides the most effective means of making use of background knowledge. The first-order language also allows the learner to represent concepts that are much more complex than anything that can be learned by a propositional system. However, these advantages
come at a price. A richer language almost always implies a more expensive search for the target concept. ILP systems are also excellent for learning concepts that can be learned symbolically, but are not so capable when the concepts are numeric in nature.

To overcome these limitations, iProlog incorporates the other kinds of learning mentioned previously. Although these algorithms only generate propositional representations, they can also be expressed as clauses in a Prolog program. Thus, if it is decided that a neural net, or regression, is best suited for processing the numerical data, these can be provided as background knowledge for the ILP system.

Srinivasan augmented Progol’s background knowledge with a regression algorithm and geometrical descriptions of trajectories. The result was a concept such as:

    roll_angle(Radius, Angle) :-
        pos(P1, T1), pos(P2, T2), pos(P3, T3),
        before(T1, T2), before(T2, T3),
        circle(P1, P2, P3, _, _, Radius),
        linear(Angle, Radius, 0.043, -19.442).

where P1 is the position of the aircraft at time T1 and so on. The circle predicate recognises that P1, P2 and P3 fit a circle of radius Radius, and regression finds a linear approximation for the relationship between Radius and Angle, which is:

    Angle = 0.043 × Radius − 19.442

The ‘_’ arguments for circle are “don’t cares”, which indicate that, for this problem, the centre of the circle is irrelevant.
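
How the linear literal is evaluated when the clause is used is not spelled out above; one plausible reading, given here only as an illustration, is a background predicate that succeeds when the two quantities fit the stated linear relationship to within a tolerance.

    % Possible reading of linear/4: linear(Y, X, M, C) holds when Y is
    % approximately M * X + C (the tolerance of 0.5 is arbitrary).
    linear(Y, X, M, C) :-
        abs(Y - (M * X + C)) < 0.5.
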
Thus, ILP provides a framework for linking different types of data fitting algorithms to achieve results that, individually, none are capable of. iProlog is a generalisation of Srinivasan’s approach that facilitates the addition of complex background knowledge, including other kinds of learning algorithms (Sammut, in press).

The use of high-level features is particularly important in control systems. It is clear that pilots do not simply react, instantaneously, to a situation. They have concepts such as climbs, turns, glide-slopes, etc. None of these concepts can be adequately captured by learning systems that do not take advantage of background knowledge. High-level features may be pre-programmed, but flexible use of background knowledge allows the learning system to search for the most appropriate features.

High-level features permit structuring of a problem in a way that is conducive to a better solution. Another form of structuring is to move away from a simple situation-action model of control to a goal-oriented one. We have already noted that the flight control problem was decomposed into different stages and, within each stage, actions had their own controllers. A further decomposition is to separate the learning of goals from the learning of actions to achieve those goals.

3. GOAL-ORIENTED BEHAVIOURAL CLONING

Behavioural clones often lack robustness because they adopt a situation-action model of control. The problem is that if the state space is very large then a large number of control rules may be required to cover the space. This problem is compounded if the representation language is also limited, as described in the previous section. Bain and Sammut (in press) describe a goal-oriented approach to building control rules.

The data from a flight are input to a learning program whose task is to identify the effects of control actions on the state variables. Another program is used to learn to predict the desired values of the state variables during different stages of a flight. To control the aircraft, we employ simple goal regression, where we use the rules learned in the second stage to predict the goal values of the state variables and then use the rules from the first stage of learning to select the actions that will achieve those goal values.

An example of a set of “effects” rules is shown below:

    Elevators = -0.28    →    ElevationSpeed = 3
    Elevators = -0.19    →    ElevationSpeed = 1
    Elevators = 0.0      →    ElevationSpeed = 0
    Elevators = 0.9      →    ElevationSpeed = -1

This describes the effect of an elevator action on the elevation speed of the aircraft. “Goal” rules may be as follows:

    Distance > -4007     →    GoalElevation = 0
    Height > 1998        →    GoalElevation = 20
    Height > 1918        →    GoalElevation = 40
    Height > 67          →    GoalElevation = 100
    Distance <= -4153    →    GoalElevation = 40
    else                 →    GoalElevation = 20

At present, the goals represented in this system are still very simple. In this example, they indicate that at
some point in the flight, given by the altitude or distance from the runway, the elevation of the aircraft should be as shown.
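
Purely as an illustration, the effects rules and goal rules above can be written down as Prolog clauses; this encoding is introduced here to make the control step easy to state and is not the representation produced by the learning programs.

    % Effects rules: an elevator setting and the elevation speed it produces.
    effect(elevators(-0.28), elevation_speed(3)).
    effect(elevators(-0.19), elevation_speed(1)).
    effect(elevators(0.0),   elevation_speed(0)).
    effect(elevators(0.9),   elevation_speed(-1)).

    % Goal rules: the desired elevation for a given height and distance
    % from the runway (the first matching clause applies, as in the table).
    goal_elevation(_Height, Distance,   0) :- Distance > -4007, !.
    goal_elevation(Height, _Distance,  20) :- Height > 1998, !.
    goal_elevation(Height, _Distance,  40) :- Height > 1918, !.
    goal_elevation(Height, _Distance, 100) :- Height > 67, !.
    goal_elevation(_Height, Distance,  40) :- Distance =< -4153, !.
    goal_elevation(_Height, _Distance, 20).
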
The control algorithm uses “goal” rules to determine the correct settings for the goal variables. If there is a difference between the goal and the current value then an effects rule is selected to achieve the desired goal value. The selection is based on the direction and magnitude of the desired change.
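
A minimal sketch of one such control step, in terms of the hypothetical encoding above, is given below; it uses only the direction of the required change, whereas the actual controller also weighs its magnitude when choosing among effects rules.

    % One step of the goal-regression controller: look up the goal value
    % for the current situation, then pick the elevator setting whose known
    % effect moves the elevation in the required direction.
    control_step(Height, Distance, CurrentElevation, elevators(Setting)) :-
        goal_elevation(Height, Distance, Goal),
        Diff is Goal - CurrentElevation,
        desired_speed(Diff, Speed),
        effect(elevators(Setting), elevation_speed(Speed)).

    % Map the sign of the required change onto a target elevation speed.
    desired_speed(Diff,  3) :- Diff > 0, !.
    desired_speed(Diff, -1) :- Diff < 0, !.
    desired_speed(_,     0).

For example, under this toy encoding the query control_step(1500, -5000, 10, Action) binds Action to elevators(-0.28), that is, the setting whose recorded effect is to raise the elevation speed.
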
Presently, the choice of goal and effects variables is done by hand. A goal of future research is to have this selection done automatically. However, the present method results in more robust and more transparent control rules than those obtained following the situation-action model.

4. CONCLUSION

Research in behavioural cloning has demonstrated that it is possible to build control rules by applying machine learning methods to the behavioural traces of human operators. However, to become a practical tool for control or for instruction, behavioural cloning must address the issues of robustness and readability of the control rules.

In this paper, it has been suggested that these problems can be overcome by employing learning methods that provide greater structuring over the domain. Structure can be imposed by breaking the learning task down into learning goals and learning actions to achieve goals. Further structuring can be achieved by constructing high-level features that permit more compact representations.

Research in goal-directed learning and in ILP suggests that these aims are achievable. However, there are several problems that must be overcome. These are mainly due to an over-dependence on human assistance.

Presently, goal and effects variables must be provided beforehand. When these are known, then clearly, it is sensible to take advantage of such knowledge. However, when the distinction is not known, it is important that the learning system be capable of distinguishing these variables for itself.

ILP systems that provide the ability to use background knowledge generally require the user to impose some restriction on which background knowledge should be tried. In principle, the ILP system can search the space of possibilities for itself, but this is very expensive without user-provided constraints. Again, if such constraints are known, then they should be used, but if they are not, more intelligent search is required to make the application of background knowledge less onerous for the user.

REFERENCES

Bain, M., and Sammut, C. (in press). A framework for behavioural cloning. In S. Muggleton, K. Furukawa, & D. Michie (Eds.), Machine Intelligence 15. Oxford University Press.

Michie, D., Bain, M., and Hayes-Michie, J. E. (1990). Cognitive models from subcognitive skills. In M. Grimble, S. McGhee, & P. Mowforth (Eds.), Knowledge-based Systems in Industrial Control. Peter Peregrinus.

Muggleton, S. (1995). Inverse Entailment and Progol. New Generation Computing, 13, 245-286.

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.

Sammut, C., Hurst, S., Kedzier, D., and Michie, D. (1992). Learning to Fly. In D. Sleeman & P. Edwards (Eds.), Proceedings of the Ninth International Conference on Machine Learning, Aberdeen: Morgan Kaufmann.

Sammut, C. (in press). Using background knowledge to build multistrategy learners. Machine Learning Journal.

Srinivasan, A., and Camacho, R. (in press). Numerical Reasoning in ILP. In S. Muggleton, K. Furukawa, & D. Michie (Eds.), Machine Intelligence 15. Oxford University Press.

Urbančič, T., and Bratko, I. (1994). Reconstructing Human Skill with Machine Learning. In A. Cohn (Ed.), Proceedings of the 11th European Conference on Artificial Intelligence, John Wiley & Sons.

Urbančič, T., Bratko, I., and Sammut, C. (1996). Learning models of control skills: phenomena, results and problems. In 13th IFAC World Congress, San Francisco.
