Learning Strategies for Mid-Level Robot Control:
Some Preliminary Considerations and Experiments
Nils J. Nilsson
Robotics Laboratory
Department of Computer Science
Stanford University
Stanford, CA 94305
http://robotics.stanford.edu/users/nilsson/bio.html
nilsson@cs.stanford.edu
Draft of May 11, 2000
ABSTRACT
Versatile robots will need to be programmed, of course. But beyond explicit
programming by a programmer, they will need to be able to plan how to perform new
tasks and how to perform old tasks under new circumstances. They will also need to be
able to learn.
In this article, I concentrate on two types of learning, namely supervised learning and
reinforcement learning of robot control programs. I argue also that it would be useful for
all of these programs, those explicitly programmed, those planned, and those learned, to
be expressed in a common language. I propose what I think is a good candidate for such
a language, namely the formalism of teleo-reactive (T-R) programs. Most of the article
deals with the matter of learning T-R programs. I assert that such programs are PAC
learnable and then describe some techniques for learning them and the results of some
preliminary learning experiments. The work on learning T-R programs is in a very early
stage, but I think enough has been started to warrant further development and
experimentation. For that reason I make this article available on the web, but I caution
readers about the tentative nature of this work. I solicit comments and suggestions at:
nilsson@cs.stanford.edu.
I. Three-Level Robot Architectures
Architectures for the control of robots and other agents are often stratified into three
levels. Working up from the motors and sensors, the servo level is in direct sensory
control of effectors and uses various conventional and advanced control-theory
mechanisms---sometimes implemented directly in hardware circuitry. Next, what I call
the teleo-reactive level organizes the sequencing of servo-level actions so that they
robustly react to unforeseen and changing environmental conditions in a goal-directed
manner. Control at this level is usually implemented as computer programs that attempt
to satisfy sub-goals specified by the level above. The top level, the strategic level,
creates plans to satisfy user-specified goals. One of the earliest examples of this three-
level control architecture was that used in Shakey, the SRI robot (Nilsson, 1984). There
are several other examples as well (Connell, 1992).
There are various ways of implementing control at these levels---some of which support
adaptive and learning abilities. I am concerned here primarily with the middle, teleo-
reactive, level and with techniques by which programs at this level can learn. Among the
proposals for teleo-reactive control are conventional computer programs with interrupt
and sensor-polling mechanisms, so-called “behavior-based” control programs, neural
networks (usually implemented on computers), finite-state machines using explicit state
tables, and production-rule-like systems, such as the so-called “teleo-reactive” programs
(Nilsson, 1994).
Some sort of adaptivity or machine learning seems desirable, possibly even required, for
robust performance in dynamic, unpredictable environments. Two major kinds of
learning regimes have been utilized. One is supervised learning, in which each datum in
a specially gathered collection of sensory input data is paired with an action response
known to be appropriate for that particular datum. This set of input/response pairs is
called the training set. Learning is accomplished by adjusting the control mechanism so
that it produces (either exactly or approximately) the correct action for each input in the
training set.
The other type, reinforcement learning, involves giving occasional positive or negative
“rewards” to the agent while it is actually performing a task. The learning process
attempts to modify the control system in such a way that long-term rewards are
maximized (without necessarily knowing for any input what is the guaranteed best
action).
In one kind of supervised learning, the controller attempts to mimic the input/output
behavior of a “teacher” who is skilled in the performance of the task being learned. This
type is sometimes called behavioral cloning (Michie, et al., 1990; Sammut, et al., 1992;
Urbancic & Bratko, 1994). A familiar example is the automobile-steering system called
ALVINN (Pomerleau, 1993). There, a neural network connected to a television camera
is trained to mimic the behavior of a human steering an automobile along various kinds
of roads.
Perhaps the most compelling examples of reinforcement learning are the various versions
of TD-Gammon, a program that learns to play backgammon (Tesauro, 1995). After
playing several hundred thousand backgammon games in which rewards related to
whether or not the game is won or lost are given, TD-Gammon learns to play at or near
world-championship level. Another example of reinforcement learning applied to a
practical problem is a program for cell phone routing (Singh & Bertsekas, 1997).
II. The Programming, Teaching, Learning (PTL) Model
Although machine learning methods are important for adapting robot control programs to
their environments, they by themselves are probably not sufficient for synthesis of
effective programs from a blank slate. I believe that efforts by human programmers at
various stages of the process will continue to be important---initially to produce a
preliminary program and later to improve or correct programs already modified by some
amount of learning. (Some of my ideas along these lines have been stimulated by
discussions with Sebastian Thrun.)
The programming part of what I call the PTL model involves a human programmer
attempting to program the robot to perform its suite of tasks. The teaching part involves
another human, a teacher, who shows the robot what is required (perhaps by “driving” it
through various tasks). This showing produces a training set, which can then be used by
supervised learning methods to clone the behavior of the teacher. The learning part
shapes behavior during on-the-job reinforcement learning, guided by rewards given by a
human user, a human teacher, and/or by the environment itself. Although not dealt with
in this article, a complete system will also need, at the strategic level, a planning part to
create mid-level programs for achieving user-specified goals. [A system called TRAIL
was able to learn the preconditions and effects of low-level robot actions. It then used
these learned descriptions in a STRIPS-like automatic planning system to create mid-
level robot control programs (Benson, 1996).]
I envision that the four methods, programming, teaching, learning, and planning might be
interspersed in arbitrary orders. It will be important therefore for the language(s) in
which programs are constructed and modified to be languages in which programs are
easy for humans to write and understand and ones that are compatible with machine
learning and planning methods. I believe these requirements rule out, for example, C
code and neural networks, however useful they might be in other applications.
III. Perceptual Imperfections
Robot learning must cope with various perceptual imperfections. Before moving on to
discuss learning methods themselves, I first describe some perceptual difficulties.
Effective robot control at the teleo-reactive level requires perceptual processing of sensor
data in order to determine the state of the environment. Suppose, in so far as a given set
of specific robot tasks is concerned, the robot’s world can be in any one of a set of states
{Si}. Suppose the robot’s perceptual apparatus transforms a world state, S, through a
mapping, P, to an input vector, x. That is, so far as the robot is concerned, its knowledge
of its world is given entirely by a vector of features, x = (x1, x2, . . ., xn). (I sometimes
abbreviate and call x the agent input even though the actual input is first processed by P.)
Two kinds of imperfections in the perceptual mapping, P, concern us. Because of random
noise, P might be a one-to-many mapping, in which case a given world state might at
different times be transformed into different input vectors. Or, because of inadequate
sensory apparatus, P might be a many-to-one mapping, in which case several different
world states might be transformed into the same input vector. This latter imperfection is
called perceptual aliasing.
(One way to mitigate perceptual aliasing is to keep a record in memory of a string
of preceding input vectors; often, different world states are entered via different state
sequences, and these different sequences may give rise to different perceptual histories.)
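As a rough sketch of this history-keeping idea (the class name, interface, and window
length below are illustrative assumptions, not from the original):

from collections import deque

class HistoryAugmentedPercept:
    """Keep a short window of preceding input vectors so that aliased
    world states, entered via different state sequences, can still be
    told apart by their perceptual histories."""
    def __init__(self, window=3):
        self.history = deque(maxlen=window)

    def observe(self, x):
        """Record the latest input vector; return the augmented percept."""
        self.history.append(tuple(x))
        return tuple(self.history)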
We can distinguish six interesting cases in which noise and perceptual aliasing influence
the relationship between the action desired in a given world state and the actual action
taken by an agent in the state it perceives. I describe these cases with the help of some
diagrams.
Case 1 (no noise; no perceptual aliasing):
[Diagram: perception maps each world state to its own input---S1 to x1, S2 to x2, S3 to
x3, S4 to x4---with desired actions a, b, a, and c; the actual action for each input
matches the desired action for its state.]
Here, each world state is faithfully represented by a distinct input vector so that the actual
actions to be associated with inputs can match the desired actions. This is the ideal case.
Note that different world states can have the same desired actions. (Taken in different
world states, the same action may achieve different effects.)
Case 2 (minor noise; no perceptual aliasing):
[Diagram: dark arrows map S1 to its nominal input x1a and S2 to its nominal input x2a;
lighter arrows show noise occasionally mapping S1 to the nearby input x1b and S2 to
x2b. Both x1a and x1b receive action a; both x2a and x2b receive action b.]
Here, each state is nominally perceived as a distinct input (represented by the dark arrows
in the diagram), but noise sometimes causes the state to be perceived as an input only
slightly different from the nominal one. We assume in this case that the noise is not so
great as to cause the agent to mistake one world state for another. For such minor noise,
the actual agent action can be the same as the desired action.
Cases 3 and 4 (perceptual aliasing; no noise):
[Diagram: S1, S2, and S3 (desired actions a, b, and a) are all mapped by perception to
the same agent input x1, whose actual action is a.]
In this example, perceptual aliasing conflates three different world states to produce the
same agent input. In case 3, S1 and S2 have different desired actions, but since the agent
cannot make this distinction it will sometimes execute an inappropriate action. In case 4,
although S1 and S3 are conflated, the same action is called for, which is the action the
agent correctly executes.
Cases 5 and 6 (major noise occasionally simulates perceptual aliasing):
[Diagram: dark arrows map each state to its nominal inputs---S1 to x1a and x1b, S2 to
x2a and x2b, S3 to x3a and x3b---with desired actions a, b, and a, and actual actions a
(for x1a, x1b), b (for x2a, x2b), and a (for x3a, x3b); lighter arrows show major noise
occasionally mapping S1 to an input nominally belonging to S2 or S3.]
Here, although each state is nominally differentiated by the agent’s perceptual system
(the dark arrows), major noise sometimes causes one world state to be mis-recognized as
another. Just as in the case of perceptual aliasing, there are two different outcomes: in one
(case 5), mis-recognition of S1 as S2 evokes an inappropriate action, and in the other
(case 6), mis-recognition of S1 as S3 leads to the correct action. Unlike case 3, however,
if mis-recognition is infrequent, case 5 will occur only occasionally, which might be
tolerable.
In a dynamic world in which the agent takes a sequence of sensor readings, several
adjacent ones can be averaged to reduce the effects of noise. Some of the case 5 mis-
recognitions might then be eliminated but at the expense of reduced perceptual acuity.
We will see examples of the difficulties these various imperfections cause in some
learning experiments to be described later.
IV. Teleo-Reactive (T-R) Programs
A. The T-R Formalism
A teleo-reactive (T-R) program is an agent control program that robustly directs the agent
toward a goal in a manner that continuously takes into account changing perceptions of
the environment. T-R programs were introduced in two papers by Nilsson (Nilsson 1992,
Nilsson 1994). In its simplest form, a T-R program consists of an ordered list of
production rules:
K1 → a1
...
Ki → ai
...
Km → am
The Ki are conditions on perceptual inputs (and possibly also on a stored model of the
world), and the ai are actions on the world (or that change the model). In typical usage,
the condition K1 is a goal condition, which is what the program is designed to achieve,
and the action a1 is the null action.
A T-R program is interpreted in a manner roughly similar to the way in which ordered
production systems are interpreted: the list of rules is scanned from the top for the first
rule whose condition part is satisfied, and the corresponding action is then executed. A
T-R program is usually designed so that for each rule Ki → ai, Ki is the regression,
through action ai, of some particular condition higher in the list. That is, Ki is the weakest
condition such that the execution of action ai under ordinary circumstances will achieve
some particular condition, say Kj, higher in the list (that is, with j < i ). T-R programs
designed in this way are said to have the regression property.
We assume that the set of conditions Ki covers most of the situations that might arise in
the course of achieving the goal K1. (Note that we do not require that the program be a
universal plan, i.e. one covering all possible situations.) If an action fails, due to an
execution error, noise, or the interference of some outside agent, the program will
nevertheless typically continue working toward the goal in an efficient way. This
robustness of execution is one of the advantages of T-R programs.
T-R programs differ substantively from conventional production systems, however, in
that actions in T-R programs can be durative rather than discrete. A durative action is one
that can continue indefinitely. For example, a mobile robot might be capable of executing
the durative action move, which propels the robot ahead (say at constant speed). Such an
action contrasts with a discrete one, such as move forward one meter. In a T-R program,
a durative action continues only so long as its corresponding condition remains the
highest true condition in the list. When the highest true condition changes, the current
executing action immediately changes correspondingly. Thus, unlike ordinary
production systems, the conditions must be continuously evaluated; the action associated
with the currently highest true condition is always the one being executed. An action
terminates when its associated condition ceases to be the highest true condition.
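To make this execution regime concrete, here is a minimal interpreter sketch in Python
(the condition and action interfaces are hypothetical): the rule list is re-scanned on
every control tick, so a durative action persists exactly as long as its rule remains
the highest true one.

def tr_tick(rules, percept):
    """One control tick: run the action of the first (highest) rule whose
    condition is satisfied by the current percept.

    rules: ordered list of (condition, action) pairs; condition is a
    predicate over percepts, and action advances a durative action by
    one tick (or performs a discrete action)."""
    for condition, action in rules:
        if condition(percept):
            action(percept)
            return
    raise RuntimeError("no condition satisfied: percept not covered")

def run_tr(rules, sense, goal_achieved):
    """Continuously re-scan the rule list until the goal condition K1 holds."""
    while True:
        percept = sense()
        if goal_achieved(percept):   # K1 true: the null action; stop
            return
        tr_tick(rules, percept)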
The regression condition for T-R programs must therefore be rephrased for durative
actions: For each rule Ki → ai, Ki is the weakest condition such that continuous execution
of the action ai (under ordinary circumstances) eventually achieves some particular
condition, say Kj, with j < i . (The fact that Ki is the weakest such condition implies that,
under ordinary circumstances, it remains true until Kj is achieved.)
In a general T-R program, the conditions Ki may have free variables that are bound when
the T-R program is called to achieve a particular ground instance of K1. These bindings
are then applied to all the free variables in the other conditions and actions in the
program. Actions in a T-R program may be primitive, they may be sets of actions
executed simultaneously, or they may themselves be T-R programs. Thus, recursive T-R
programs are possible. (See Nilsson, 1992 for examples.)
When an action in a T-R program is itself a T-R program, it is important to emphasize
that the usual computer science control structure does not apply. The conditions of all of
the nested T-R programs in the hierarchy are always continuously being evaluated! The
action associated with the highest true condition in the highest program in the stack of
“called” programs is the one that is evoked. Thus, any program can always regain
control from any of those that it causes to be called---essentially interrupting any durative
action in progress. This responsiveness to the current perceived state of the environment
is another one of the advantages of T-R programs.
Sometimes it is useful to represent a T-R program as a tree, called a T-R tree, as shown
below:
[T-R tree diagram: the goal condition K1 labels the root node; nodes K2, K3, and so on
are linked to their parents by arcs labeled with actions.]
Suppose two rules in a T-R program are Ki → ai and Kj → aj with j < i and with Ki the
regression of Kj through action ai. Then we have nodes in the T-R tree corresponding to
Ki and Kj and an arc labeled by ai directed from Ki to Kj. That is, when Ki is the
shallowest true node in the tree, execution of its corresponding action, ai, should achieve
Kj. The root node is labeled with the goal condition and is called the goal node. When
two or more nodes have the same parent, there are correspondingly two or more ways in
which to achieve the parent's condition.
Continuous execution of a T-R tree would be achieved by a continuous computation of
the shallowest true node and execution of its corresponding action. (Ties among equally
shallow true nodes can be broken by some arbitrary but fixed tie-breaking rule.) We call
the shallowest true node in a T-R tree the active node.
The “backward-from-the-goal” approach to writing T-R programs makes them relatively
easy to write and understand, as experience has shown.
B. T-R Programs and Decision Lists
Decision lists are a class of Boolean functions described by Rivest (Rivest, 1987). In
particular, the class k-DL(n) consists of those functions that can be written in the form:
K1 → v1
...
Ki → vi
...
Km → vm
where:
1) each Ki (for i =1, . . ., m-1) is a Boolean term over n variables consisting of at most k
literals, and Km = T (having value True). (A term is a conjunction of literals, and a literal
is a Boolean variable or its complement, having value True or False.)
and
2) each vi is either True or False.
The value of a k-DL(n) function represented in this fashion is that vi corresponding to the
first Ki in the list having value True. Note that if none of the Ki up to and including Km-1
has value True, the function itself will have value vm.
T-R programs over n variables whose conditions are Boolean terms having at most k
literals are thus a generalization of the class k-DL(n), a generalization in which the vi may
have q > 2 different values. Let us use the notation k-TR(n,q) to represent this class of T-
R programs. Note that k-TR(n,2) = k-DL(n).
V. Learnability of T-R Programs
Since it appears that T-R programs are not difficult for humans to write and understand, I
now come to the topic of machine-learning of T-R programs. First I want to make some
remarks stemming from the fact that T-R programs whose conditions operate on binary
inputs are multi-output generalizations of decision lists. Rivest has shown that the class k-
DL(n) of decision lists is polynomially PAC learnable (Rivest, 1987). To do so, it is
sufficient to prove that:
1) the size of the class k-DL(n) is O(2^(n^t)), where n is the dimensionality of the input
and t is some constant,
and,
2) one can identify in polynomial time a member of the class k-DL(n) that is consistent
with the training set.
The first requirement was shown to be satisfied using a simple, worst-case counting
argument, and the second was shown by construction using a greedy algorithm.
It is straightforward to show by analogous arguments that both requirements are also met
by the class k-TR(n, q). Therefore, this class is also polynomially PAC learnable.
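For concreteness, the greedy construction used in requirement 2 carries over to the
multi-valued case roughly as follows (a Python sketch under my own data conventions:
training examples are (x, action) pairs, with x a binary tuple of length n).

from itertools import combinations, product

def greedy_ktr(examples, n, k):
    """Greedily build a k-TR(n, q) rule list consistent with the training
    set: repeatedly find a term of at most k literals whose remaining
    covered examples all demand the same action, emit that rule, and
    discard the covered examples."""
    def terms():
        for size in range(k + 1):
            for idxs in combinations(range(n), size):
                for signs in product((0, 1), repeat=size):
                    yield tuple(zip(idxs, signs))

    def satisfies(x, term):
        return all(x[i] == s for i, s in term)

    remaining, program = list(examples), []
    while remaining:
        for t in terms():
            covered = [a for x, a in remaining if satisfies(x, t)]
            if covered and len(set(covered)) == 1:
                program.append((t, covered[0]))
                remaining = [(x, a) for x, a in remaining
                             if not satisfies(x, t)]
                break
        else:
            return None   # no consistent list of k-literal terms exists
    return program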
Even though much experimental evidence suggests that PAC learnability of a class of
functions is not necessarily predictive of whether or not that class can be usefully and
practically learned, the fact that this subclass of T-R programs is polynomially PAC
learnable is a point in their favor.
VI. The Squish Algorithm for Supervisory Learning of T-R Programs
George John (John, 1994) proposed an algorithm he called Squish for learning a T-R
program to mimic the performance of a teacher (behavioral cloning). Some limited
experimental testing of this algorithm has been performed---some using simulated robots
and some using a physical robot. These experiments will be described shortly.
Squish works as follows. An agent is “steered” by a teacher in the performance of some
task. By steering, I mean that the teacher, observing the agent and the agent’s
environment, controls the agent’s actions until it achieves the goal (or one of the goals)
defined by the task. Squish collects the perceptual/action history of this experience. To
do so, the string of perceptual input vectors is sampled (at some rate appropriate to the
task), and the action selected by the teacher at each sample point is noted. Several such
histories are collected.
The result of this stage will be a collection of strings such as the following:
x11 a11 x12 a12 x13 a13 . . . x1n a1n xG1
...
xi1 ai1 xi2 ai2 xi3 ai3 . . . xim aim xGi
...
Each xij is a vector of inputs (obtained by perceptual processing by the agent), and each
akl is the action selected by the teacher for the input vector preceding that action in the
string. The vectors xGi are inputs that satisfy the goal condition for the task.
Note that each such string can be thought of as a T-R program of the form:
{xGi} → Nil
{xim} → aim
...
{xi1} → ai1
where the singleton sets {xim } represent conditions satisfied only if the input vector is, in
fact, a member of the set.
Since T-R programs can take the form of trees, we can combine all of the learning
sequences into a T-R tree as shown below:
[T-R tree diagram: the root node is labeled {xG1 . . . xGi . . .}; each training history
forms a branch below it, e.g. {xi1} linked by ai1 to {xi2}, and so on up to {xim}, which
is linked by aim to the root.]
Of course the program represented by such a tree could evoke actions only for those
exact inputs that occurred during the teaching process. That limitation (as well as the
potentially large size of the tree) motivates the remaining stages of the Squish algorithm.
First, we collapse (squish) chains of identical actions. (For these, obviously, the same
action endured through multiple samplings of the input.) We illustrate this process by the
diagram below:
[Diagram: a chain in which the arc from Sj to Si and the arc from Si to S both carry the
same action ai collapses to a single node Si ∪ Sj whose arc to S carries ai; the arcs
below (aj) and above (ak) are unchanged.]
Next, beginning with the top node and proceeding recursively, we look for any
immediate successors of a node that evoke the same action. These siblings are combined
into a single node labeled by the union of the sets labeling the siblings. We illustrate this
process by the diagram below:
[Diagram: sibling nodes Si and Sj, each linked to their common parent S by an arc
labeled ai, are merged into a single node Si ∪ Sj linked to S by one arc labeled ai.]
When no more collapsing of these sorts can be done, we are left with a tree whose
nodes are labeled by sets of input vectors and whose arcs are labeled by actions.
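The two collapsing passes can be sketched as follows (the tree encoding is an
illustrative assumption: each node holds its set of input vectors, the action labeling
the arc to its parent, and its children).

class Node:
    """Node of a squish tree; action is None at the goal (root) node."""
    def __init__(self, vectors, action=None):
        self.vectors = set(vectors)
        self.action = action
        self.children = []

def collapse_chains(node):
    """Merge each child with any of its own children reached by the same
    action (the action endured through several input samples)."""
    for child in node.children:
        while any(g.action == child.action for g in child.children):
            g = next(g for g in child.children if g.action == child.action)
            child.vectors |= g.vectors
            child.children.remove(g)
            child.children.extend(g.children)
        collapse_chains(child)

def merge_siblings(node):
    """Combine sibling nodes evoking the same action into one node
    labeled by the union of their vector sets."""
    keepers = {}
    for child in node.children:
        if child.action in keepers:
            keepers[child.action].vectors |= child.vectors
            keepers[child.action].children.extend(child.children)
        else:
            keepers[child.action] = child
    node.children = list(keepers.values())
    for child in node.children:
        merge_siblings(child)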
Still, the conditions at the nodes are satisfied only by the members of the sets labeling
those nodes; there is no generalization to similar inputs. To deal with this defect, we use
machine learning methods to replace each set by a more general condition that is satisfied
by all (or most) members of the set. (Perhaps it would be appropriate to relax “all” to
“most” in the presence of noise.)
There are at least three ways in which this generalization might be accomplished. In the
first, bounding surfaces---perhaps hyperplanes parallel to the coordinate axes---define a
connected region of multi-dimensional space slightly bigger (or, in the presence of noise,
perhaps slightly smaller) than the convex hull of the members of the set. If such
hyperplanes are used, the condition of being in the region can be given by a conjunction
of expressions defining intervals on the input components. Such a condition would
presumably be easy to understand by a human programmer inspecting the result of the
learning process. A two-dimensional example might be illuminating. Suppose the inputs
in a certain set are:
(3,5), (3,6), (5,7), and (4,4)
Each input lies within the box illustrated below:
[Figure: the four points plotted in the (x1, x2) plane, enclosed by a rectangle.]
The conditions associated with this set of four inputs would be:
2 ≤ x1 ≤ 6, and
3 ≤ x2 ≤ 8
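A sketch of the interval-box computation (the margin of 1 reproduces the example above;
the margin is otherwise a free choice, and could be negative to shrink the box in the
presence of noise):

def interval_box(vectors, margin=1):
    """Generalize a node's input vectors to per-component intervals
    slightly wider than their observed range; the learned condition is
    the conjunction of the tests lo_j <= x_j <= hi_j."""
    dims = range(len(vectors[0]))
    return [(min(v[j] for v in vectors) - margin,
             max(v[j] for v in vectors) + margin) for j in dims]

def in_box(x, bounds):
    return all(lo <= xj <= hi for xj, (lo, hi) in zip(x, bounds))

# interval_box([(3, 5), (3, 6), (5, 7), (4, 4)]) == [(2, 6), (3, 8)],
# i.e. the conditions 2 <= x1 <= 6 and 3 <= x2 <= 8 above.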
In this manner a T-R program consisting of interval-based conditions with their
associated actions is the final output of the teacher-guided learning process.
In another method of generalizing the condition at a node, the inputs labeling a node of
the tree are identified as positive instances, and the inputs at all those nodes not ancestral
to that node are labeled as negative instances. Then, one of a variety of machine learning
methods can be used to build a classifier that discriminates between positive and negative
instances for each node in the T-R tree. If the conditions are to be easily understood by
human programmers, one might learn a decision tree whose nodes are intervals on the
various input parameters. The condition implemented by a decision tree can readily be
put in the form of a conjunction of interval tests. Alternatively, one could use a neural-
net-based classifier. (John’s original suggestion was to use a maximum-likelihood
classifier.)
Another method for generalization uses a “nearest-neighbor” calculation. First, in each
node any repeated vectors are eliminated. A new input vector triggers that node having a
vector that is closest to the new input vector (in a squared-difference sense)---giving
preference to nodes higher in the tree in case of a tie.
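A sketch of this nearest-neighbor trigger (the node layout, with a depth field and
de-duplicated vectors, is an assumption of mine):

def nearest_node(nodes, x):
    """Return the node holding the stored vector closest to x in the
    squared-difference sense, preferring shallower nodes on ties."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    best_key, best_node = None, None
    for node in nodes:                 # node.vectors: de-duplicated inputs
        d = min(sqdist(x, v) for v in node.vectors)
        key = (d, node.depth)          # depth breaks ties toward the root
        if best_key is None or key < best_key:
            best_key, best_node = key, node
    return best_node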
VII. Experiments with Squish
A. Experiments with Simulated Robots
1. The task and experimental set-up
John used Squish (with a maximum-likelihood classifier) to have a robot learn how to
grab an object in a simulated two-dimensional world called Botworld (Benson, 1993). In
this simulated world, there was no perceptual aliasing.
The Botworld environment has had several instantiations. In John's experiments,
Botworld appeared as in the screen shot below:
[Botworld screen shot]
The round objects are simulated robots, called “bots,” which can move forward, turn, and
grab and hold a “bar” with their “arms” as shown in the figure. Using the buttons on the
graphical interface, a teacher could drive the bot during training sessions.
For the learning experiments to be described, John endowed the bot with the following
perceptual predicates:
Grabbing: Has value True if and only if the bot is holding the bar
At-bar: Has value True if and only if the bot is in the right position to grab a bar (it must
be at just the right distance from the bar)
Facing-bar: Has value True if and only if the bot is facing the bar
On-midline: Has value True if and only if the bot is on the imaginary line that is the
perpendicular bisector of the bar
Facing-midline: Has value True if and only if the bot is facing a certain "preparatory
area" segment of the midline
Because a bot's heading and position were represented by real numbers, all of these
predicates (except Grabbing) involved tolerance intervals.
The bot had two durative actions, namely turn and move, and one “ballistic” action,
namely grab-bar. A T-R tree for bar grabbing using these actions and these perceptual
predicates is shown below:
[T-R tree, shown here as its equivalent ordered rule list:]
Grabbing → Nil
At-bar ^ Facing-bar → grab-bar
Facing-bar → move
On-midline → turn
Facing-midline → move
T → turn
2. Learning experiments
The bot was “driven” to grab a bar a few times in order to generate a training set. The
input vectors were composed of the values of the five perceptual predicates as the bot
was driven. Squish was used to generate a T-R tree, and a maximum-likelihood classifier
was established at each node. The vectors at a node were the positive instances for that
node, and all of the vectors at nodes lower in the tree were the negative instances for that
node.
According to John (unpublished private communication): “. . . it did work most of the
time, meaning that afterwards it could drive itself (to grab the bar), but this was only if I
drove the bot using my knowledge of which features it (the lisp code) could observe, and
only if I was pretty careful to drive it well. It would break if the driver wasn't very good.
This is a common problem in programming by demonstration---how to get the driver or
demonstrator to understand the features that the learning algorithm can observe, so that
the instruction can be productive.”
B. Experiments with a Nomad Robot
1. The task and experimental set-up
In the next set of experiments, Thomas Willeke (Willeke, 1998) wrote a T-R program for
a real robot to enable it to perform the simple task of “corner-finding.” The task
involved moving forward perpendicular to one of the walls of a rectangular enclosure,
turning 90 degrees to the right whenever the robot’s motion was impeded by a wall or an
obstacle. The robot continued these actions until it sensed that it was in one of the
corners of its enclosure.
The robot used was a Nomad 150 from Nomadic Technologies, Inc. (See
http://www.robots.com/n150.htm for full technical details.) The Nomad 150 is a wheeled
cylindrical base whose only external sensors are sixteen sonar transceivers evenly
positioned around its circumference. Thus, the input vector, x, is a 16-dimensional vector
whose components are sonar-measured distances to objects and walls. These vectors
have different typical forms that can be used to distinguish situations such as: I am in
(relatively) free space, there is a wall or obstacle in front of me, there is a wall on my left
(or right) side, and I am in a corner. The experimental set-up and desired behaviors are
shown below:
[Figure: rectangular enclosure containing two obstacles, with starting positions 1 and 2
marked and the desired paths to a corner shown.]
When starting from position 1, for example, the robot moves forward until it comes close
to a wall. It then turns to the right and proceeds to a corner. When starting in position 2,
it moves until it comes close to the object, turns right, proceeds to the other object, turns
right, proceeds to the wall, turns right and proceeds to a corner.
It may be argued that, since the desired action (turn right) when blocked by an obstacle is
the same as the desired action when blocked by a wall, these two states (although
different in the world) are the same so far as the robot is concerned. For some classes of
control programs it does no harm to conflate these states (as the Nomad robot does). But
since, in general, we would like T-R programs to have the regression property, these
states ought to be treated differently since the actions taken in them have different effects.
Corner-finding T-R programs are not difficult to write. Sonar noise can create some
problems, however. In Willeke’s words: “There comes a point in the turn when, due to
sonars bouncing at steep angles off the wall, the perceptual input is exactly similar to that
of the robot following along a wall. (See case 5 mentioned above.) So the T-R program
changes state and the robot tends to leap forward (thinking that it has already completed
the turn), slamming itself into the wall.” Willeke’s solution to this problem: “. . . we
simply hand wrote in code so that the turns became ballistic actions. So, after
determining that a turn was in order, the robot simply turned 90 degrees.”
As mentioned previously, another way of dealing with sonar (or any) noise is to do some
averaging. As Willeke suggests: “One simple solution to the turning bug would be to
simply require that three sonar readings in a row be different before switching states.
This, of course, makes the robot less reactive to sudden changes in the environment.
And, of course, if the robot still ends up in the wrong state it will be even slower to
recover. We did implement this idea and it clearly helped. The robot made turns that
were much closer to the 90 degrees we wanted. It no longer slammed into the wall part
way through the turn. But it still had an exit condition problem. Because the end of a turn
and normal forward motion look very much the same, the robot didn't usually make it
exactly 90 degrees around before restarting forward motion. This caused it to drift into
the wall later, or drift off away from the wall if it over-turned.”
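A sketch of this debouncing idea (the threshold of three consecutive readings follows
Willeke's description; the classifier interface is hypothetical):

class DebouncedState:
    """Switch the perceived state only after `need` consecutive readings
    agree on the new state, trading reactivity for noise immunity."""
    def __init__(self, classify, need=3):
        self.classify = classify    # maps a sonar vector to a discrete state
        self.need = need
        self.state = None
        self.candidate = None
        self.count = 0

    def update(self, sonar_vector):
        s = self.classify(sonar_vector)
        if s == self.state:                 # reading agrees with current state
            self.candidate, self.count = None, 0
        elif s == self.candidate:           # another vote for the new state
            self.count += 1
            if self.count >= self.need:
                self.state, self.candidate, self.count = s, None, 0
        else:                               # first vote for a new state
            self.candidate, self.count = s, 1
        return self.state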
2. Learning experiments
The training data was collected by driving the robot around with a joystick and recording
the sensor values and joystick commands. Different runs were performed with different
starting positions and configurations of obstacles but always ending in a corner. Willeke
used the Squish algorithm to collapse the data, and in one set of experiments he trained a
simple threshold logic unit (TLU) at each node of the T-R tree to generalize the input
vectors at that node.
In training the TLU, Willeke used a modification of the procedure in which the inputs
labeling a node of the tree are identified as positive instances and the inputs at all those
nodes not ancestral to that node are labeled as negative instances. The requirement for
the modification results from an instance of the case 4 situation mentioned above. The
robot sometimes makes more than one right turn in a single data run, and the situation
just before each turn produces identical (or very similar) input vectors. In these runs,
positive input vectors associated with one node in the tree will be identical (or very
similar) to negative input vectors at a lower node in the tree. The diagram below
illustrates this problem and our modification:
[Diagram: goal node SG with two children: the node marked *, labeled {x1, x1, . .},
reached by an arc labeled a, and node S2, reached by an arc labeled c. Below S2 are S1
(arc labeled b) and S3 (arc labeled a), and below them a node labeled {x1, x2, . .}
whose arc is also labeled a.]
In learning a condition to use at the node marked *, we take the vectors in {x1, x1, . . .} as
the positive instances and those in S1 and S2 as the only negative instances. In particular,
we exclude the vectors in {x1, x2, . . .} and those in S3 from the set of negative instances
because the action, a, at those nodes is identical to the action at the node marked *.
In general then, the modification involves throwing out all input vectors at the non-
ancestral nodes that are associated with the same action as that at the node for which we
are learning a condition. (Willeke points out some possible difficulties that this
modification entails if it is used in conjunction with procedures that use memorized state
information in addition to sensory inputs, but these need not concern us here.)
In several experiments of this sort for learning a corner-finding T-R program, Willeke
states that the (modified) algorithm worked “surprisingly well.”
In another set of experiments, Willeke used a nearest-neighbor generalization scheme.
Since this method does not involve separating inputs into positive and negative classes,
no special consideration need be given to case 4 repeated action situations. These
experiments were as successful as those in which TLUs were trained at each node. The
robot was able to learn corner-finding behaviors from complex training runs that repeated
perceptual states. The method does require remembering all of the inputs accumulated
during training and extensive computation at run time, but recent work of A. W. Moore
(Moore, 2000) appears to enable nearest-neighbor methods to scale well to large
problems.
The “interval-box” method for generalizing the nodal conditions was not tried, but, just
as in the nearest-neighbor method, it would not have required separating inputs into
positive and negative classes.
The success of these preliminary experiments suggests that it would be reasonable to try
these methods on more complex tasks performed by more complex robots.
VIII. Proposals for Reinforcement Learning of T-R Programs
A. Neural Network Q-Learning
In reinforcement learning, “reward” amounts are given by a teacher (or by the
environment itself) when a robot takes actions and thereby enters certain states. The
rewards can be positive or negative. The rewards are used to change the “action policy”
of the robot. One seeks a training method that results in an action policy that maximizes
some function of the future expected reward. (Sutton & Barto (1998) is a text on
reinforcement learning.)
In a version of reinforcement learning called Q-learning, first proposed by Watkins
(Watkins, 1989), the action policy is based on a function over perceived states and
actions, Q(x,a), where x is the perceived state (say an input vector), and a is a robot
action. For any state x, the robot takes that action a which maximizes Q(x,a) over all
possible actions.
In reinforcement learning, one seeks to learn an optimal policy by making adjustments to
a trial policy in response to rewards for actions taken. Learning a policy can be
accomplished by learning an estimate, Q̂, of the optimal Q function.
Neural network methods have been proposed for implementing a policy and for learning
a Q function (Lin, 1992, Tesauro 1995). Consider the neural network shown below:
[Figure: a network with input x and weights W computing the estimates Q̂(x, a1), . . .,
Q̂(x, ai), . . ., Q̂(x, am); the selected action is a = argmax_i Q̂(x, ai).]
The network computes estimates of the Q function for each possible action and evokes
that action, a, corresponding to the largest of these estimates. Reinforcement learning of
Q values uses a temporal difference method, such as TD(0) (Sutton, 1988), for changing
the estimates. Suppose action a is taken (by a partially trained network) in response to
input vector x, and that the corresponding Q-value estimate is Q̂(x, a). (That is, Q̂(x, a)
= max_i Q̂(x, ai).) Suppose that the durative execution of this action ultimately leads to
an input vector y which evokes some other action, b. TD learning assumes that the
quantity r + γQ̂(y, b) is a more accurate estimate of Q(x, a) than is Q̂(x, a) and updates
Q̂(x, a) to make it more closely equal r + γQ̂(y, b).
0 < γ < 1 is the “temporal discount factor,” and r is the immediate reward for executing a
in that circumstance. Just that single Q̂(x, ai) which is the largest of the estimates is
updated, and all others are left unchanged. The updating formula for that Q̂(x, ai) is:

Q̂(x, ai) ← Q̂(x, ai) + β([r + γQ̂(y, b)] − Q̂(x, ai))

(That is, we move “β of the way” from Q̂(x, ai) to r + γQ̂(y, b), where 0 < β < 1.)
Several researchers have used the standard backpropagation algorithm (Rumelhart et al.,
1986) to effect these changes in the Q̂ function implemented by neural networks.
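For concreteness, here is a tabular sketch of this update (the network version differs
only in how the estimate is represented and adjusted; names and default parameter values
are mine):

def td0_q_update(Q, x, a, r, y, actions, beta=0.1, gamma=0.9):
    """One TD(0) adjustment of the estimate for (x, a), where durative
    action a began at input x, ended when input y evoked a new action,
    and earned immediate reward r.

    Q: dict mapping (input, action) pairs to Q-value estimates."""
    b = max(actions, key=lambda act: Q.get((y, act), 0.0))  # action evoked at y
    target = r + gamma * Q.get((y, b), 0.0)
    old = Q.get((x, a), 0.0)
    Q[(x, a)] = old + beta * (target - old)   # move "beta of the way"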
B. Representing a T-R Program by a Neural Network
In order to use neural network Q-learning methods for learning T-R programs, we first
have to be able to represent a T-R program by a neural network. There are several ways
in which this might be done. We illustrate one method by the network below:
[Figure: the three-layer network described below.]
This network implements a T-R program whose m conditions (in order) are K1, K2, . . .,
Ki, . . ., Km. Assuming that the conditions are conjunctions of components of the input
vector, x, these can all be implemented by the first layer of TLUs, as shown.
Corresponding to each condition unit in the first layer is an AND unit in the second layer.
These can also be implemented by TLUs. Each AND unit is wired up with appropriate
inhibitory connections from the condition units so that the AND unit responds if and only
if its associated condition unit (in the first layer) is the “highest” condition unit
responding. That is, one and only one AND unit responds, and the one that does respond
corresponds to the highest true condition among the Ki.
Now, we have only to associate the appropriate action with the highest true condition.
This is done by a layer of OR associator units implemented, again, by TLUs. We have
one such unit for each of the (let us say) k actions, a1, a2, . . ., ai, . . ., ak. An AND unit is
wired up to an associator unit if and only if that associator unit corresponds to an action
that is called for by the AND unit. Thus, an AND unit can be wired up to only one
associator unit, but each associator unit might be wired up to more than one AND unit.
Thus, we have shown that a specific three-layer, feed-forward neural network can
implement any T-R program. I will call such a neural network a T-R net.
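The wiring just described can be sketched numerically as follows (step-function TLUs;
the weight layout is an illustrative assumption):

import numpy as np

def step(z):
    return (z > 0).astype(float)

def tr_net(x, W_cond, b_cond, M_assoc):
    """Forward pass of a TLU-based T-R net.

    W_cond, b_cond: first-layer weights and thresholds; condition unit i
        fires iff rule i's conjunction holds for input x.
    M_assoc: (k x m) 0/1 matrix wiring each AND unit to the single
        associator of its rule's action.
    Returns a one-hot vector over the k actions (all zeros if no
    condition fires)."""
    c = step(W_cond @ x + b_cond)              # which conditions are true
    m = c.shape[0]
    higher = np.tril(np.ones((m, m)), -1) @ c  # no. of true conditions above i
    and_units = step(c - higher - 0.5)         # only the highest true rule fires
    return M_assoc @ and_units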
C. Training a T-R Net by Q-learning
In order to use back-propagation to train a T-R net, we replace the TLUs by differentiable
functions as usual. We replace the condition units and the AND units by sigmoids, and
we replace the associator units by a simple summing device. We show the resulting
network below:
[Figure: input x feeds m condition sigmoids through trainable weights; these feed m
“AND” sigmoids through fixed weights; the AND sigmoids feed k summation-unit
associators through trainable weights, whose outputs are the Q(x, ai).]
The outputs of the summation unit associators are taken to be the Q values, Q(x,ai). In
the case in which TLUs are used instead of sigmoids for the condition and AND units,
one and only one of these Q values would have value 1; the others would have value 0.
Selecting the action corresponding to the highest Q value would give us the appropriate
action. We can regard the new network as implementing a “softened” version of this
decision process---one that is amenable to backpropagation training.
In training the network, only the weights in the first layer and the third layer are modified
during the process of updating Q values. The weights in the second layer are left fixed.
(These are the weights that implement the rule that the highest true condition evokes an
action.) The training process thus creates the appropriate conditions and associates them
with appropriate actions. Again, training can be accomplished by a standard
backpropagation rule, constrained to leave the second-layer weights fixed. The
conditions used by the resulting T-R program will now be linearly separable functions of
the input components rather than simple conjunctions.
To date, no experiments have been conducted to test this technique. We note that even if
training such a network succeeds, we are not guaranteed that the corresponding T-R
program will satisfy the regression property. Perhaps some modification of the training
process could be found that would achieve that restriction.
D. Learning a T-R Program by Reinforcement Learning of Prototype Points
I conducted some preliminary experiments at the Santa Fe Institute in 1991 on learning
T-R programs using a method that combined cluster seeking with reinforcement learning.
The experiments were with a simulated robot in “botworld,” and the learning program
was written in Lisp. (I still have the Lisp program, dated February 22, 1991.) The
experiments were modestly successful and are described here with the hope that the
method can serve as a starting point for further work.
The method learns regions and associated actions in the space of input vectors, x. Before
discussing how the regions and actions are learned, I describe how they can be
interpreted as T-R programs.
1. Region graphs
We define a set of regions in the space of input vectors by a set of prototype points P =
{P1, P2, … , Pi, … ,Pm}. The prototype points can be imagined as being at the centers of
clusters of input vectors. The prototype points induce a set of m regions, R; each region,
Ri, contains all those input vectors that are closer to Pi than to any other Pj, j ≠ i.
We suppose that the Pi can be chosen in such a way that:
a) if an action can be executed at one point in a region, it can be executed at any point in
that region, and
b) for every pair Ri and Rj, if continued execution of an action, a, at one point in Ri results
in the input vector next moving to Rj , then for all points in Ri, continued execution of a
results in the input vector next moving to Rj.
Thus, certain pairs of regions, Ri and Rj, are linked by actions, and we can define a
directed graph whose nodes correspond to the regions (or, equivalently, to the prototype
points defining the regions) and whose arcs correspond to actions. We shall call such a
graph a region graph. Typically, only regions that are adjacent in the input space will
have an action arc connecting them, although adjacency is neither necessary nor
sufficient for linkage in the graph. Note also that the same action may label several
different arcs in the graph, and several arcs may emanate from and converge on a single
region node.
We use the region graph to describe the nominal preconditions and effects of the actions
of the robot. An action, ai, can be executed by the robot if and only if it labels an arc
whose tail emanates from a region containing the current input vector. The result of
continued execution of such an action is that the input vector traverses the region at the
tail of the arc until it enters the region at the head of the arc. Thence, the control system
of the robot must select an action corresponding to one of the outgoing arcs from that
new region.
In the figure below, we show a two-dimensional example to illustrate these ideas.
[Figure: five prototype points and their regions; arrows labeled with actions link
regions, and heavy arrows form a spanning tree rooted at R3.]
There are five prototype points and corresponding regions. We illustrate the actions by
arrows directed from one region to another. For example, action a1 can be used either to
go from R4 to R3 or from R2 to R1. Suppose the robot's task is to have the input vector in
R3. The heavy action arrows constitute a spanning tree of the region graph for this
problem; regardless of the region of the initial input vector, executing an action
corresponding to an outgoing heavy arrow will result ultimately in the input vector
landing in R3.
In terms of the region graph then, a robot's control strategy to achieve a given goal can be
implemented by selecting that prototype point that is closest to the input vector and
executing the action that labels the corresponding outgoing arc of the spanning tree for
that goal. The spanning tree can be thought of as a T-R program: the conditions are
membership in the regions, and the actions are the arcs of the spanning tree.
An alternative way of implementing the same strategy is to assign each region a value
corresponding to its distance (counting action arcs) from the goal region. Then in any
region, that action is selected that will next take the input vector to the accessible region
having the smallest value. If the values of the regions are negated (so that larger values
are preferred instead of smaller ones), then this strategy implements a version of a
“policy function” for achieving maximum reward, suggesting that Q-learning techniques
might be capable of learning the prototype points.
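A sketch of this value-based selection (the graph encoding, a dict mapping
(region, action) pairs to successor regions, is my own):

def select_action(region, values, graph):
    """Among the actions executable in `region`, choose the one whose arc
    leads to the accessible successor region of smallest value (distance
    to the goal region, counted in action arcs)."""
    outgoing = {a: dest for (src, a), dest in graph.items() if src == region}
    return min(outgoing, key=lambda a: values[outgoing[a]])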
2. Reinforcement learning of prototype points and actions
Our learning algorithm starts with a set of prototype points, P = {P1, P2, … , Pi, … ,Pm},
that are initially set to random values. Each prototype point is arbitrarily assigned one of
the actions in the set of possible actions. To account for the possibility that the same
action might be executed in different circumstances, multiple prototype points might be
assigned the same action. As the robot executes actions during learning trials to achieve
a particular goal, these points (and their corresponding regions) individually migrate
through the input space (according to rules we shall describe) until they stabilize at
positions defining an acceptable spanning tree of the region graph. Except when actions
are selected by a teacher, soon to be discussed, that action is executed that is associated
with the prototype point that is closest to the input vector.
Each prototype point is also assigned a scalar mass and a scalar rank, each initially set to
zero, that are changed during learning and are used by the procedure that migrates the
prototype points. The mass is supposed to represent how many times the action
associated with its prototype point has been “successful,” and the rank is supposed to
represent something similar to the “Q-value” of its corresponding region-action pair in
terms of its proximity to the goal. In accord with the temporal-difference learning
literature, the larger the rank or value of a region the closer it is to the goal. In the
version of the algorithm described here, we assume we know a particular region of the
input space that corresponds to achieving the goal. This region is assigned a large and
unchangeable rank, say 100.
A learning run terminates whenever the robot enters the goal region. Learning can then
continue by re-positioning the robot and beginning another run---using the prototypes,
ranks, and masses learned in previous runs. Whenever the robot is not in the goal region
and the closest prototype point has zero rank, as it will initially, the robot attempts to
execute one of its actions, chosen randomly, for a random amount of time. Otherwise,
the action corresponding to the prototype point that is closest to the input vector is
executed. Whenever the rank of the prototype point of an executing action is above
zero, the rank decays to zero at some constant rate during the execution of that action.
Learning takes place as follows: Suppose for some input, x, not in the goal region, the
closest prototype point and its corresponding action is denoted by the pair (P,a).
Suppose that continued execution of a results in a continuous stream of inputs for which
P is still the closest prototype point, but that ultimately for some x' either the closest
prototype point becomes P' or x' is in the goal region.
At that time, if P' is of higher rank than P (or if x' is in the goal region):
1. The rank of P is increased to an amount between its old rank (at that time) and the
rank of P' (or of the goal region if x' is in the goal region), say to the average of the
before and after ranks.
2. The mass of P is increased by 1.
3. P is changed to:
P ← (mP + xavg)/(m + 1)
where m is the (old) mass of P, and xavg is the average value of the input vector during the
time a was being executed until the input vector became closer to P' than to P (or
entered the goal region).
If P' is of lower rank than P:
1. The rank of P is decreased to an amount between its old rank (at that time) and the
rank of P', say to the average of the two.
2. The mass of P is left unchanged.
3. P is changed to:
P ← (mP − xavg)/(m + 1)
In all cases, whenever the rank of a prototype point falls to zero (or, alternatively, below
some low threshold value), the mass of that region is reset to zero.
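The rules above can be sketched as follows (numpy; the dictionary layout and the
zero-rank threshold are my reading of the text):

import numpy as np

GOAL_RANK = 100.0   # the large, unchangeable rank of the goal region

def update_prototype(proto, x_avg, next_rank):
    """Apply one migration step to the prototype whose action just ran.

    proto: dict with position vector 'P' (np.ndarray), scalar 'rank',
        and scalar 'mass'.
    x_avg: average input vector while the action was executing.
    next_rank: rank of the region entered next (GOAL_RANK if it was the
        goal region)."""
    m, old_rank = proto['mass'], proto['rank']
    proto['rank'] = (old_rank + next_rank) / 2.0    # average old and new ranks
    if next_rank > old_rank:      # entered a higher-ranked region (or the goal)
        proto['mass'] = m + 1
        proto['P'] = (m * proto['P'] + x_avg) / (m + 1)
    else:                         # entered a lower-ranked region
        proto['P'] = (m * proto['P'] - x_avg) / (m + 1)
    if proto['rank'] <= 0.0:      # rank decayed away: forget the mass
        proto['mass'] = 0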
The algorithm is a temporal-difference procedure (Sutton, 1988) because the rank of a
prototype point is changed to make it closer to the rank of a temporally next region. It is
a delayed reinforcement learning procedure because reward comes only when the input
vector enters the goal region. We could make the algorithm more closely resemble
classical reinforcement learning procedures by selecting actions according to a
probability distribution that favors that action associated with the closest prototype point.
The algorithm is also a cluster-seeking procedure because the prototype points will tend
to end up in the centers of clusters of frequently occurring input vectors for which the
same action was successful in moving the input vector toward the goal. It is this latter
property which gives the resulting action selection policy a capability to generalize over
similar inputs (for which the same action ought to be selected). Experiments with a
similar cluster-seeking, reinforcement procedure were conducted by Mahadevan and
Connell (Mahadevan & Connell, 1992).
3. Preliminary experiments
Some preliminary experiments with this algorithm were conducted in 1991 in the
botworld domain. The goal was to have a bot grab a bar. All reinforcement learning
procedures work best when the task to be learned is learned “backwards” from large
rewards. For our migrating prototype-point algorithm, backwards learning involves
placing the robot near the goal or near prototype points already having been assigned
high rank (such that random actions rather quickly get it close to the goal or to those
prototype points). The high values of the prototype points previously learned are thus
propagated backward to points associated with actions that move the robot toward these
points learned earlier. With these facts in mind, I used a “teacher” to force backwards
learning of a solution and then checked to see whether or not continued learning
destroyed the solution. Solutions seemed to be stable; once achieved, further learning did
not destroy them. It remains to be seen under what circumstances the algorithm can be
used in a less guided fashion to obtain a solution in the first place.
There are some parameters, such as the rate of decay of a prototype point's rank, the
increments by which the ranks and masses are changed, and the amount by which a
prototype point is moved, whose adjustments would undoubtedly affect the performance
of the algorithm.
The botworld domain in which the learning experiments were performed and the
components of the input vector, x, are illustrated and defined in the figure below. The
reader will note that I chose the components of the input vector, x = (x1, x2, x3, x4, x5, x6,
x7, x8, x9), to be those that I knew were particularly relevant to performing the task. A
more severe learning task would be one in which the input vector consisted of more
primitive components.
[Figure: the botworld bar-grabbing domain, annotated with the boxes and strip referred
to below.]
x1 = 1 iff bot is grabbing bar
x2 = 1 iff bot’s center is within this box (same as At-bar defined earlier)
x3 = 1 iff bot’s heading is normal to bar, within some tolerance (same as Facing-bar
defined earlier)
x4 = 1 iff bot’s center is within this strip (same as On-midline defined earlier)
x5 = 1 iff bot’s center is within this box
x6 = ¬x4 ^ ccwg
x7 = ¬x4 ^ cwg
x8 = x4 ^ ccwb
x9 = x4 ^ cwb
where:
ccwg = 1 iff bot should turn ccw to face this box
cwg = 1 iff bot should turn cw to face this box
ccwb = 1 iff bot should turn ccw to face in same direction as the normal to the bar
cwb = 1 iff bot should turn cw to face in same direction as the normal to the bar
The goal region consists of all of those inputs for which x1 = 1. The usual domain
“physics” applies; the robot can grab the bar only if x2 = 1 and x3 = 1.
I used seven prototype points. These and their associated actions were:
B, move
C, turn ccw
D, turn cw
BB, move
CC, turn ccw
DD, turn cw
E, grab
The initial value of each point was the vector (1/2, 1/2, 1/2, 1/2, 1/2, 1/2, 1/2, 1/2, 1/2).
The initial ranks and masses were all 0.
A teacher can insert himself into the learning process by selecting a prototype point and
its action (instead of letting the robot select it by the usual nearest-distance calculation).
After a teacher-selected action, the learning process already described is then allowed to
take place as usual. I used this teaching adjustment to the learning process in the
following way:
1. The robot was first positioned so that x2 and x3 were both equal to 1, and I (the
teacher) selected point E. The robot performed the associated grab action (which, of
course, succeeded), and the position, mass, and rank of E were changed accordingly. I
did this initial “priming” several times.
2. Then, the robot was positioned so that x3 and x5 were both equal to 1, and I selected
B. The robot performed the associated move action until the robot got closest to E,
and the position, mass, and rank of B were changed accordingly. Again, this step was
performed several times.
3. The robot was positioned so that x5 was equal to 1, and I selected either C or D, as
appropriate, etc.
4. And so on.
After priming in this way, I ceased teaching and allowed the robot to continue on its own
for several more learning trials. The main result of this experiment was not that the robot
could learn to grab the bar on its own without teaching, but that once it had been primed,
continued learning was “stable” in the sense that the skill was not unlearned.
After priming and after several subsequent learning trials without the teacher, the
prototype points, their masses and ranks had the following values:
x1 x2 x3 x4 x5 x6 x7 x8 x9 mass rank
B 0 0.006 0.85 0.84 0.29 0 0.17 0.08 0.03 33 92
C 0 0 0.01 0.94 0.33 0 0.05 0.88 0.06 18 63
D 0 0 0.02 0.83 0.28 0 0.16 0 0.8 18 66
BB 0 0 0 0 0.125 0 0 0 0 2 47
CC 0 0 0.02 0 0 1 0 0 0 3 23
DD 0 0 0.02 0 0 0 1 0 0 3 34
E 0 0.97 0.82 0.71 0 0 0.29 0.03 0 35 99
By way of comparison, we might note that the following values of the prototype points
would yield ideal bar-grabbing behavior:
x1 x2 x3 x4 x5 x6 x7 x8 x9
B 0 0 1.0 1.0 0.5 0 0 0 0
C 0 0 0 1.0 0.5 0 0 1.0 0
D 0 0 0 1.0 0.5 0 0 0 1.0
BB 0 0 0 0 0 0 0 0 0
CC 0 0 0 0 0 1 0 0 0
DD 0 0 0 0 0 0 1 0 0
E 0 1.0 1.0 1.0 0 0 0 0 0
These initial experiments suggest that the migrating prototype algorithm has potential for
the reinforcement learning of T-R programs. Further development and experimentation
thus seems justified.
IX. Conclusions
Versatile robots will need to be programmed, of course. But beyond explicit
programming by a programmer, they will need to be able to plan how to perform new
tasks and how to perform old tasks under new circumstances. They will also need to be
able to learn. In addition to learning about their environments by making maps, they will
need to learn under what conditions various actions have what effects. [See, for example,
(Benson, 1996).]
In this article, I concentrated on two other types of learning, namely supervised learning
and reinforcement learning of robot control programs. I argued also that it would be
useful for all of these programs, those explicitly programmed, those planned, and those
learned, to be expressed in a common language. I proposed what I think is a good
candidate for such a language, namely the formalism of teleo-reactive (T-R) programs.
Most of the article dealt with the matter of learning T-R programs. I asserted that such
programs are PAC learnable and then described some techniques for learning them and
the results of some preliminary learning experiments. The work on learning T-R
programs is in a very early stage, but I think enough has been started to warrant further
development and experimentation. For that reason I make this article available on the
web, but I caution readers about the tentative nature of this work. I solicit comments and
suggestions at: nilsson@cs.stanford.edu.
REFERENCES
Benson 1993
Benson, S., “Botworld Homepage,” Robotics Laboratory, Department of
Computer Science, Stanford University, 1993.
http://robotics.stanford.edu/~sbenson/botworld.html
Benson 1996
Benson, S., Learning Action Models for Reactive Autonomous Agents, PhD
Thesis, Department of Computer Science, Stanford University, 1996.
Connell 1992
Connell, J., “SSS: A Hybrid Architecture Applied to Robot Navigation,” in Proc.
1992 IEEE International Conf. on Robotics and Automation, pp. 2719-2724, 1992.
John 1994
John, G., “SQUISH: A Preprocessing Method for Supervised Learning of T-R
Trees From Solution Paths,” Draft Memo, Stanford Computer Science Dept., November
22, 1994.
Lin 1992
Lin, L. J., “Self-Improving Reactive Agents Based On Reinforcement Learning,
Planning and Teaching,” Machine Learning 8:293-321, 1992.
Mahadevan & Connell 1992
Mahadevan, S., and Connell, J., “Automatic Programming of Behavior-Based
Robots Using Reinforcement Learning,” Proceedings ML92, pp. 290-299, 1992.
Michie, et al. 1990
Michie, D., Bain, M., and Hayes-Michie, J. E., “Cognitive Models from
Subcognitive Skills,” in M. Grimble, S. McGhee, and P. Mowforth (eds.), Knowledge-
base Systems in Industrial Control. Peter Peregrinus, 1990.
Moore 2000
Moore, A. W., “The Anchors Hierarchy: Using the Triangle Inequality to Survive
High Dimensional Data,” Proc. Sixteenth Conference on Uncertainty in Artificial
Intelligence, Menlo Park, CA: AAAI Press, 2000.
Nilsson 1984
Nilsson, N. J., Shakey the Robot, Technical Note 325, SRI International, Menlo
Park, CA, 1984.
Nilsson 1992
Nilsson, N. J., Toward Agent Programs with Circuit Semantics, Technical Report
STAN-CS-92-1412, Stanford University Computer Science Department, 1992.
Nilsson 1994
Nilsson, N. J., “Teleo-Reactive Programs for Agent Control,” Journal of
Artificial Intelligence Research, 1, pp. 139-158, January 1994.
Pomerleau 1993
Pomerleau, D., Neural Network Perception for Mobile Robot Guidance, Boston:
Kluwer Academic Publishers, 1993.
Rivest 1987
Rivest, R., “Learning Decision Lists,” Machine Learning, 2:229-246, 1987.
Rumelhart et al. 1986
Rumelhart, D., Hinton, G., and Williams, R., “Learning Internal Representations
by Error Propagation,” in Rumelhart, D., and McClelland, J., (eds.), Parallel Distributed
Processing, Vol 1, pp. 318-362, Cambridge, MA: The MIT Press, 1986.
Sammut, et al. 1992
Sammut, C., Hurst, S., Kedzier, D., and Michie, D., “Learning to Fly,” in D.
Sleeman and P. Edwards (eds.), Proceedings of the Ninth International Conference on
Machine Learning, Aberdeen: Morgan Kaufmann, 1992.
Singh & Bertsekas 1997
Singh, S., and Bertsekas, D., “Reinforcement Learning for Dynamic Channel
Allocation in Cellular Telephone Systems,” Proceedings of NIPS, 1997.
Sutton 1988
Sutton, R., “Learning to Predict by the Methods of Temporal Differences,”
Machine Learning, 3:9-44, 1988.
Sutton & Barto 1998
Sutton, R., and Barto, A., Reinforcement Learning: An Introduction, Cambridge,
MA: MIT Press, 1998.
Tesauro 1995
Tesauro, G., “Temporal-Difference Learning and TD-Gammon,” Comm. ACM,
38(3):58-68, March, 1995.
Urbancic & Bratko 1994
Urbancic, T., and Bratko, I., “Reconstructing Human Skill with Machine
Learning,” in A. Cohn (ed.), Proceedings of the 11th European Conference on Artificial
Intelligence, John Wiley & Sons, 1994.
Watkins 1989
Watkins, C. J. C. H., Learning from Delayed Rewards, PhD thesis, Cambridge
University, Cambridge, England, 1989.
Willeke 1998
Willeke, T., “Learning Robot Behaviors with TR Trees,” unpublished memo,
Robotics Laboratory, Department of Computer Science, Stanford University, May 19,
1998.