The document describes Q-learning, an algorithm for reinforcement learning. Q-learning allows an agent to learn through trial-and-error interactions with its environment without relying on training data. The algorithm works by the agent storing the quality (Q) of taking actions in a Q-table, which is updated each time an action is taken. The agent's goal is to learn which actions yield the maximum reward by finding the optimal policy that maximizes long-term rewards. The document provides an example of using Q-learning to train a robot to navigate a maze by updating the Q-table after each action based on the reward received.
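As a rough sketch of that core idea (illustrative only; the array sizes and the alphaQ and gammaQ constants are assumptions, not taken from the document):

const int NUM_STATES  = 4;
const int NUM_ACTIONS = 4;
float Q[NUM_STATES][NUM_ACTIONS];   // the Q-table: quality of each action in each state
const float alphaQ = 0.5;           // learning rate (assumed value)
const float gammaQ = 0.9;           // discount factor (assumed value)

float bestQ(int s){                 // best value achievable from state s
  float best = Q[s][0];
  for(int a = 1; a < NUM_ACTIONS; a++){
    if(Q[s][a] > best) best = Q[s][a];
  }
  return best;
}

void updateQ(int s, int a, float r, int sNext){
  // Nudge Q(s,a) toward the observed reward plus the discounted best future value.
  Q[s][a] = Q[s][a] + alphaQ * (r + gammaQ * bestQ(sNext) - Q[s][a]);
}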
Encoding Robotic Sensor States for Q-Learning using the Self-Organizing Map – butest
The document discusses using a self-organizing map (SOM) to discretize continuous sensor states for reinforcement learning with a Lego Mindstorms NXT robot. Experiments apply Q-learning with different state representations - with and without the SOM. The SOM helped induce smoother representations of sonar values and bump sensors, leading to better robot behavior in some cases compared to simple quantization of sensor values.
Simple machines - gears, levers, pulleys, wheel and axle – David Owino
This document discusses simple machines and their characteristics. It defines simple machines as devices that make work easier by transmitting force through a mechanical advantage. Examples of simple machines discussed include levers, pulleys, gears, inclined planes, screws, and the wheel and axle. Key terms like effort, load, mechanical advantage, velocity ratio, and efficiency are defined and the relationships between them are explained mathematically. Several examples and quiz questions are provided to illustrate concepts.
This document provides information about simple machines. It discusses different types of simple machines like the wheel and axle, screw, lever, pulley, and their uses. Simple machines make work easier by changing the amount or direction of force. They allow us to lift heavy loads using less effort. Common examples mentioned are a screwdriver and wrench, which act as a wheel and axle to make unscrewing easier. The document also covers compound machines, lifting machines, and defines terms related to simple machines like mechanical advantage and efficiency.
1. The document discusses floating-base manipulators and their application to underwater and aerial autonomy.
2. It presents an approach using null-space based behavioral control to coordinate multiple tasks for redundant robotic systems like manipulators attached to underwater or aerial vehicles.
3. Examples are provided of applying this approach to coordinate the end-effector pose, vehicle orientation, arm manipulability, and other tasks for experimental systems involving underwater and aerial robots.
This document discusses dynamics and motion using normal-tangential and cylindrical coordinate systems. It includes:
1) Explanations of normal-tangential coordinates and how to set up and solve dynamics problems using these coordinates. Equations of motion are expressed in normal and tangential directions.
2) An example problem solving for reaction forces on a boy on a rotating amusement park ride using normal-tangential coordinates.
3) An introduction to cylindrical coordinates and how dynamics problems can be analyzed using these coordinates, with equilibrium equations expressed in r, θ, and z directions.
4) Details on determining tangential and normal forces when an object moves along a curved path defined by r=f(θ).
Current research activities in marine robotics at the Italian interuniversity... – Gianluca Antonelli
The document summarizes current research activities in marine robotics at the Italian interuniversity center ISME. Key projects discussed include SUNRISE/BRUCE which investigates using AUVs for underwater communication, WIMUST which enables distributed acoustic sensor arrays, and MARIS, DexROV and ROBUST which focus on underwater manipulation using robotic arms. ISME conducts applied research and field testing, collaborating with universities and industries. Projects aim to develop autonomous capabilities for tasks like object manipulation while avoiding obstacles.
This document provides an overview of planar kinetics of rigid bodies, including:
- Mass moment of inertia, equations of motion for translation and rotation, and applications involving flywheels, cranks, and general plane motion.
- Procedures for determining mass moment of inertia using integration, the parallel axis theorem, and methods for composite bodies.
- Examples are provided to demonstrate calculating mass moment of inertia and solving kinetics problems using the equations of motion for translation.
Neural Networks are another type of Artificial Intelligence used in computing. They are used in computer games, expert systems, and in many other places.
And, in a limited way, you can use them on an Arduino too, e.g. to steer an Arduino robot! In my presentation I explain more about this topic.
These slides were presented during my talk "Arduino, roboti a neurální sítě" at Czech Arduino Day 2015 at BarCamp Plzeň (more info at https://plzenskybarcamp.cz/2015/arduino-day) #ArduinoD15 #Arduino #barCampCZ
Bon Jovi is an American hard rock band formed in 1983 by Jon Bon Jovi. They were originally considered a glam metal band characteristic of the 1980s, although they consider themselves a rock and roll band. In the 1990s they changed their sound to move away from glam metal and had great success with the album Keep the Faith in 1992. They have sold more than 130 million albums and have performed at major events and venues around the world.
Multi-Agent Systems (MAS) are one type of Artificial Intelligence used in computing. In this presentation I explain how they can be used to control an Arduino robot.
These slides were presented during my talk "Multi-Agentní Systémy - vybudujme si populaci na stole či v kapse!" at DevFest 2014 (more info at http://devfest.cz/program/)
Video from my lecture (in Czech) is available here: https://youtu.be/JIGxJtDX2fA?list=PLcyrRW-49oISXNKAbmTu2hd19QPzvvNVE
Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
1. Reinforcement learning involves an agent learning through trial-and-error interactions with an environment. The agent learns a policy for how to act by maximizing rewards.
2. The document outlines key elements of reinforcement learning including states, actions, rewards, value functions, and explores different methods for solving reinforcement learning problems including dynamic programming, Monte Carlo methods, and temporal difference learning.
3. Temporal difference learning combines the advantages of Monte Carlo methods and dynamic programming by allowing for incremental learning through bootstrapping predictions like dynamic programming while also learning directly from experience like Monte Carlo methods.
Reinforcement Learning and Artificial Neural Nets – Pierre de Lacaze
The document provides an overview of reinforcement learning and artificial neural networks. It discusses key concepts in reinforcement learning including Markov decision processes, the Q-learning algorithm, temporal difference learning, and challenges in reinforcement learning like exploration vs exploitation. It also covers basics of artificial neural networks like linear and sigmoid units, backpropagation for training multi-layer networks, and applications of neural networks to problems like image recognition.
This document describes MDPs, Monte Carlo, temporal-difference, SARSA, and Q-learning methods. It was used for a Reinforcement Learning study group lecture at the Korea Artificial Intelligence Laboratory.
This document discusses using imitation learning and DAgger for autonomous driving. It summarizes that:
1) Imitation learning uses expert demonstrations to learn a policy, which can improve sample efficiency over reinforcement learning. DAgger iteratively aggregates data from its own and expert policies to improve.
2) Experiments applying DAgger and reinforcement learning to pendulum swing-up and Atari Pong showed DAgger needed fewer episodes to converge than reinforcement learning.
3) Applying the methods to a car racing simulator showed DAgger worked well but the agent could not surpass the expert's performance, since the expert fails in some situations. Transfer learning also allowed improving driving skills across tracks.
An Introduction to Reinforcement Learning - The Doors to AGI – Anirban Santara
Reinforcement Learning (RL) is a genre of Machine Learning in which an agent learns to choose optimal actions in different states in order to reach its specified goal, solely by interacting with the environment through trial and error. Unlike supervised learning, the agent does not get examples of "correct" actions in given states as ground truth. Instead, it has to use feedback from the environment (which can be sparse and delayed) to improve its policy over time. The formulation of the RL problem closely resembles the way in which human beings learn to act in different situations. Hence it is often considered the gateway to achieving the goal of Artificial General Intelligence.
The motivation of this talk is to introduce the audience to key theoretical concepts like formulation of the RL problem using Markov Decision Process (MDP) and solution of MDP using dynamic programming and policy gradient based algorithms. State-of-the-art deep reinforcement learning algorithms will also be covered. A case study of the application of reinforcement learning in robotics will also be presented.
Reinforcement learning is a machine learning technique where an agent learns how to behave in an environment by receiving rewards or punishments for its actions. The goal of the agent is to learn an optimal policy that maximizes long-term rewards. Reinforcement learning can be applied to problems like game playing, robot control, scheduling, and economic modeling. The reinforcement learning process involves an agent interacting with an environment to learn through trial-and-error using state, action, reward, and policy. Common algorithms include Q-learning which uses a Q-table to learn the optimal action-selection policy.
Reinforcement learning algorithms like Q-learning, SARSA, DQN, and A3C help agents learn optimal behaviors through trial-and-error interactions with an environment. Q-learning uses a model-free approach to estimate state-action values without a transition model. SARSA is similar to Q-learning but is on-policy, learning the value function from the current policy. DQN approximates Q-values using a neural network to handle large state spaces. A3C uses multiple asynchronous agents interacting with individual environments to learn diversified policies through an actor-critic framework.
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017 – MLconf
Ben Lau is a quantitative researcher at a macro hedge fund in Hong Kong, where he applies mathematical models and signal processing techniques to study the financial markets. Prior to joining the financial industry, he specialized in using his mathematical modelling skills to discover the mysteries of the universe while working at the Stanford Linear Accelerator Center, a national accelerator laboratory, where he studied the asymmetry between matter and antimatter by analysing tens of billions of collision events created by particle accelerators. Ben was awarded his Ph.D. in Particle Physics from Princeton University and his undergraduate degree (with First Class Honours) at the Chinese University of Hong Kong.
Abstract Summary:
Deep Reinforcement Learning: Developing a robotic car with the ability to form long-term driving strategies is key to enabling fully autonomous driving in the future. Reinforcement learning has been considered a strong AI paradigm which can be used to teach machines through interaction with the environment and by learning from their mistakes. In this talk, we will discuss how to apply deep reinforcement learning techniques to train a self-driving car in an open-source racing car simulator called TORCS. I will share how this is implemented and discuss various challenges in this project.
This document provides an overview of reinforcement learning. It defines reinforcement learning as learning through trial-and-error to maximize rewards over time. The document discusses key reinforcement learning concepts like the agent-environment interaction, Markov decision processes, policies, value functions, and the Q-learning algorithm. It also provides examples of applying reinforcement learning to problems like career choices and the Atari Breakout video game.
Reinforcement learning is a computational approach for learning through interaction without an explicit teacher. An agent takes actions in various states and receives rewards, allowing it to learn relationships between situations and optimal actions. The goal is to learn a policy that maximizes long-term rewards by balancing exploitation of current knowledge with exploration of new actions. Methods like Q-learning use value function approximation and experience replay in deep neural networks to scale to complex problems with large state spaces like video games. Temporal difference learning combines the advantages of Monte Carlo and dynamic programming by bootstrapping values from current estimates rather than waiting for full episodes.
Dueling network architectures for deep reinforcement learning – Taehoon Kim
1. The document proposes a dueling network architecture for deep reinforcement learning that separately estimates state value and state-dependent action advantages without extra supervision.
2. It introduces a dueling deep Q-network that uses a single network with two streams - one that produces a state value and the other that produces state-dependent action advantages, which are then combined to estimate the state-action value function.
3. Experiments on Atari games show that the dueling network outperforms traditional deep Q-networks, achieving better performance in both random starts and starts from human demonstrations.
Steepest gradient method application on the Griewank function – Imane Haf
The document discusses applying the gradient method to minimize the Griewank test function. It outlines the gradient method algorithm, shows simulation results for different starting points that converge in a few iterations, and describes improvements made so the algorithm finds the global minimum regardless of starting point by continuing the search after reaching local minima. The conclusions state that the gradient method performs well locally, but extensions were needed to locate the true global minimum of the Griewank function.
TensorFlow and Deep Learning Tips and Tricks – Ben Ball
Presented at https://www.meetup.com/TensorFlow-and-Deep-Learning-Singapore/events/241183195/ . Tips and Tricks for using Tensorflow with Deep Reinforcement Learning.
See our blog for more information at http://prediction-machines.com/blog/
Introduction: Asynchronous Methods for Deep Reinforcement Learning – Takashi Nagata
The document introduces asynchronous reinforcement learning methods. It discusses standard reinforcement learning concepts like Markov decision processes, value functions, and Q-learning. It then presents the asynchronous advantage actor-critic (A3C) algorithm, which uses multiple asynchronous agents with shared parameters to improve stability. Experiments show A3C outperforms DQN on Atari games and car racing tasks, training faster without specialized hardware. A3C also scales well to multiple CPU cores and is robust to learning rate and initialization.
The document discusses deep robotics and its relationship to computer vision and deep learning. Deep robotics uses neural networks to process observations like images and directly output controls, allowing end-to-end training like in computer vision. This is analogous to traditional robotics which separates state estimation, planning and control into individual components. Deep reinforcement learning techniques like DQN can be applied to train neural network policies for robots. However, additional techniques like guided policy search that combine trajectory optimization and a global policy network may be needed for real-world robotic applications.
Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen) – Hogeon Seo
The link of the original article: https://ai.intel.com/demystifying-deep-reinforcement-learning/
This review summarizes:
• How do I learn reinforcement learning?
• Reinforcement Learning is Hot!
• What is RL?
• General approach to modeling the RL problem
• Maximize the total future reward
• A function Q(s,a) = the maximum discounted future reward (DFR)
• How to get the Q-function?
• Deep Q Network
• Experience Replay
• Exploration-Exploitation
This document discusses using deep reinforcement learning to develop a human-level control agent capable of playing a wide range of games. It describes a deep Q-network that uses a neural network to approximate the optimal action-value function and select actions. The key innovations are using experience replay to break correlations between training samples, and having a separate target network to compute stable Q-learning targets. The model was able to achieve human-level control on many Atari 2600 games from raw pixels.
This document provides an overview of reinforcement learning. It discusses the reinforcement learning framework including actors like agents, environments, states, actions, rewards, and policies. It also summarizes several common reinforcement learning methods including value-based methods, policy-based methods, and model-based methods. Value-based methods estimate value functions using algorithms like Q-learning and deep Q-networks. Policy-based methods directly learn policies using policy gradient algorithms like REINFORCE. Model-based methods learn models of the environment and then plan based on these models.
Title: "Understanding PyTorch: PyTorch in Image Processing". Github: https://github.com/azarnyx/PyData_Meetup. The Dataset: https://goo.gl/CWmLWD.
The talk was given at a PyData Meetup in Munich on 06.03.2019 at the Data Reply office. The speaker was Dmitrii Azarnykh, a data scientist at Data Reply.
AWS Finland Meetup June 2019 - DeepRacer story – Jouni Luoma
Jouni Luoma won the DeepRacer league competition at the Stockholm AWS Summit with a time of 8.7 seconds. DeepRacer is an AWS service that uses reinforcement learning to train a neural network model to control a 1/18th scale race car in a 3D racing simulator. Competitors build and train models in the simulator and can also race the physical cars at AWS Summits. While Jouni's initial models had lap times around 45 seconds in simulation, transitioning to the real track presented challenges due to differences in visual cues. With additional training of new models both in simulation and on the real track, Jouni was able to achieve the winning 8.7 second lap time.
IRJET - Survey on Simulation of Self-Driving Cars using Supervised and Reinforc... – IRJET Journal
The document describes and compares supervised and reinforcement learning models for self-driving cars. It discusses:
1) A supervised learning model that predicts steering angle based on camera images using a convolutional neural network trained on human-driven data. Accuracy improved from 20% to 85% with more training data.
2) A reinforcement learning model that trains an agent car to navigate an environment and avoid obstacles using rewards and Q-learning. The car learns to optimize its path in real-time as new barriers are added.
3) The key differences are that the supervised model only considers the car's motion, while the reinforcement model also accounts for external obstacles by giving rewards/penalties. The reinforcement model requires no explicit training data.
2. REINFORCEMENT VS SUPERVISED
REINFORCEMENT
• No training data
• Exploration
• Reward is provided for each action
• Positive or negative
SUPERVISED
• Sufficient training data
• Beware of overfitting
• Need a teacher or database
5. Q-LEARNING
• Continuously iterates over the state space until convergence is reached
• Q-table stores the results of feedback; initialized to all zeros or random values
• Reward table stores positive and negative feedback
• Update rule: Q(s,a) = Q(s,a) + α(r(s,a) + λ·max_a' Q(s',a') − Q(s,a))
• s = state, a = action, r = reward, α = learning rate, λ = discount factor
• Two options (an epsilon-greedy variant is sketched after this slide):
  • Randomly choose an action – USE TO TRAIN (encourages exploration)
  • Choose action with highest Q value – USE AFTER TRAINING
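The slide's two options correspond to pure exploration during training and pure exploitation afterwards. A common middle ground is epsilon-greedy selection, sketched here (a sketch only; epsilonPercent, NUM_ACTIONS, and the Q array shape are assumed names, not from the slides):

int selectActionEpsilonGreedy(int s){
  // With probability epsilonPercent/100, explore: pick a random action.
  if(random(100) < epsilonPercent){
    return random(NUM_ACTIONS);   // Arduino random(max) returns a value in 0..max-1
  }
  // Otherwise exploit: pick the action with the highest Q value.
  int best = 0;
  for(int a = 1; a < NUM_ACTIONS; a++){
    if(Q[s][a] > Q[s][best]) best = a;
  }
  return best;
}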
6. FINAL PROJECT GOALS
• Train the robot to drive in a circle – complete
• Train the robot to navigate any maze AND drive into a "box" – complete
7. PROJECT CONSTRAINTS
• No GPS, so I initialize the state as NORTH, no matter what direction the car is actually pointing
• Reward is initialized as shown below (see the direction-constant sketch after this list):
  • 0 if the car turns EAST or WEST from NORTH or SOUTH
  • -10 if the car turns SOUTH from EAST or WEST – I want the car to move forward (North)
  • 10 if the car turns NORTH from EAST or WEST
• I don't allow the car to reverse
• Need more sensors to ensure the car turned correctly.
• I built a loop into the robot to wait for my feedback from a remote control to tell it to turn a bit further.
• The robot learns from my feedback and adjusts the turn time
• Needed an accelerometer to detect changes in rotation about the x-axis OR GPS so that it knows the current direction
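A minimal sketch of the direction encoding these rewards imply; the constant names are my assumption, while the index order (N, S, E, W) comes from the comments in the reward table on slide 17:

const int NORTH = 0;   // the state is always initialized to NORTH (no GPS)
const int SOUTH = 1;
const int EAST  = 2;
const int WEST  = 3;
// R[from][to] holds the reward for turning from one heading to another,
// e.g. R[EAST][NORTH] = 10 rewards turning back toward NORTH (forward progress).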
8. Q-LEARNING AGENT - SUMMARY
• Arduino loops forever
• Train the car with the Q-Learning algorithm, randomly choosing the direction (5X) – this creates the learning policy
• After training completes, execute the "learned policy."
• It SHOULD follow the optimal path every time, but it may not (Why?)
• Chooses the direction based on the max Q value
• I have two main functions (a sketch of the second appears after this slide):
  • QLearningAgent – training function which selects actions randomly
  • QLearningAgentSelectFromQ – selects the direction with the highest Q value
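The slides do not show the body of QLearningAgentSelectFromQ; here is a plausible sketch of the greedy execution function, reusing goalState() and takeAction() from the training code (the loop body and the bestAction variable are my assumptions):

int QLearningAgentSelectFromQ(int currentState){
  while(!goalState()){
    // Greedy policy: pick the direction with the highest learned Q value.
    int bestAction = 0;
    for(int a = 1; a < 4; a++){
      if(Q[currentState][a] > Q[currentState][bestAction]) bestAction = a;
    }
    currentState = takeAction(currentState, bestAction);
    delay(1000);   // give the motors time to finish the move
  }
  return currentState;
}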
9. VIDEOS
• Q-Learning – After training
• Training 1
• Training 1/2
• Training 2
• Training 3
• Training 4
• Training 5
• Driving in circle
10. CONCLUSION
• Q-learning is a lightweight reinforcement learning algorithm that can be used to train robots without any training data.
• Need a WiFi shield in order to inspect the Q-table and see how the agent makes decisions while navigating the maze.
• Need to add a GPS for the agent to know its current direction.
11. APPENDIX A
• Q-Learning Algorithm
• Q-Learning Agent Code
• Picture of Maze
• Pictures of robotic car
12. Q-LEARNING ALGORITHM
For all s ∈ S, a ∈ A:
    Q(s,a) = 0
Repeat:
    Select state s randomly or by max()
    Repeat:
        Select an action a and carry it out
        Obtain reward r and new state s'
        Q(s,a) = Q(s,a) + α(r(s,a) + λ·max_a' Q(s',a') − Q(s,a))
        s = s'
    Until s is an ending state or time limit reached
Until Q converges
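For concreteness, a worked update with illustrative numbers (not from the slides): with α = 0.5 and λ = 0.9, starting from Q(s,a) = 0, a move earning r = 10 into a state where max_a' Q(s',a') = 0 gives Q(s,a) = 0 + 0.5·(10 + 0.9·0 − 0) = 5; a second identical visit gives 5 + 0.5·(10 − 5) = 7.5, so Q(s,a) converges toward 10.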
13. Q-LEARNING AGENT
int QLearningAgent(int currentState){
  while(!goalState()){
    int action = selectAction(currentState);           // pick a direction (random during training)
    int newState = takeAction(currentState, action);   // execute the move on the robot
    int reward = R[currentState][action];              // look up the reward for that move
    int maxQ = getMaxQ(newState);                      // best Q value reachable from the new state
    // Standard Q-learning update; alphaQLearning is an assumed learning-rate constant.
    Q[currentState][action] = Q[currentState][action]
        + alphaQLearning * (reward + (gammaQLearning * maxQ) - Q[currentState][action]);
    currentState = newState;
    delay(1000);    // give the motors time to finish the move
  }
  delay(10000);     // pause at the goal before the next run
  return currentState;
}
14. [Picture of maze – see Appendix A]
15. [Pictures of robotic car – see Appendix A]
16. HARDWARE/SOFTWARE
• Arduino Uno Rev 3
• Parallax Ping (28015) – distance sensor
• Adafruit Motor Shield Rev. 2.3 (I needed to solder the pins, which was very difficult)
• Vilros Micro Servo 99 (SG90) – spins the sensor in different directions
• Multi-colored LED light – used to signal when training is complete
• 8 AA batteries to power the motor shield
• 1 9V battery to power the Arduino
• Infrared receiver – to receive signals from the remote control
• Remote control – any remote control will do
• 4-wheel robotic smart car chassis
• Arduino IDE v1.0.6 (older version)
• Windows 8 OS
17. REWARD TABLE
void initR(){
  R[0][0] = 0;  //N > N
  R[0][1] = -1; //N > S
  R[0][2] = 0;  //N > E
  R[0][3] = 0;  //N > W
  R[1][0] = -1; //S > N
  R[1][1] = -1; //S > S
  R[1][2] = 0;  //S > E
  R[1][3] = 0;  //S > W
  R[2][0] = 10; //E > N
  R[2][1] = -10;//E > S
  R[2][2] = -1; //E > E
  R[2][3] = -1; //E > W
  R[3][0] = 10; //W > N
  R[3][1] = -10;//W > S
  R[3][2] = -1; //W > E
  R[3][3] = -1; //W > W
}
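Presumably initR() runs once before training starts; in a typical Arduino sketch that would look like the following (an assumption about surrounding code the slides do not show; initQ() is a hypothetical helper):

void setup(){
  initR();   // fill the reward table before the first training episode
  initQ();   // hypothetical helper: zero the Q-table (slide 5: initialized to all zeros)
}

void loop(){
  // Slide 8: "Arduino loops forever" - train with QLearningAgent, then
  // execute the learned policy with QLearningAgentSelectFromQ.
}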
18. APPENDIX B - INTRODUCTION TO ARTIFICIAL INTELLIGENCE, WOLFGANG ERTEL
• This is from a previous AI project, but it is a good example of how the Q-learning algorithm was used to train a robot to crawl.
19. ROBOTICS EXAMPLE – CAN IT LEARN TO CRAWL FORWARD WITH Q-LEARNING?
Introduction to Artificial Intelligence, Wolfgang Ertel
[Diagram of the crawling robot's arm: joint position Gx moves Left/Right, joint position Gy moves Up/Down]
20. Q-LEARNING TABLE
Reward table (Gx: Left/Right, Gy: Up/Down): the move Down,right -> Down,left earns reward 1; all other moves earn 0.
Q-learning table, initial state (rows = current state, columns = next state; "-" marks impossible transitions):

              Up,Left   Up,Right   Down,right   Down,left
Up,Left          -         0            -           0
Up,Right         0         -            0           -
Down,right       -         0            -           0
Down,left        0         -            0           -
21. Q-LEARNING IMPLEMENTED

              Up,Left   Up,Right   Down,right   Down,left
Up,Left          -      124,942        -         124,974
Up,Right      249,178      -        249,940         -
Down,right       -      124,621        -         250,932
Down,left     250,642      -        250,238         -

Down,right -> Down,left => Reward = 1, everything else is 0
1,000,000 iterations

              Up,Left   Up,Right   Down,right   Down,left
Up,Left          -      250,338        -         127,794
Up,Right      250,110      -        250,516         -
Down,right       -      125,144        -         249,789
Down,left     249,810      -        249,564         -

Down,right -> Down,left => Reward = 1, Up,Left -> Up,Right => 1, everything else 0
1,000,000 iterations
22. APPENDIX C – PREVIOUS AI PROJECT: LEFT TURN, RIGHT TURN, STOP TRAINING
• This was also from a previous AI project. I was able to train my robotic car to stop, turn left, and turn right.
• I included it since it has videos.
23. STOP TRAINING
This was from a previous project. This training is not used in the current project.
void updateQStop(){
  int r = RStop();                // reward: -1 if the stop distance exceeds the goal distance, 1 otherwise
  int d = distanceToStopGoal();   // remaining distance to the stopping goal
  distanceToStop = distanceToStop + (r * gammaStop * d);   // adjust the learned stopping distance
}
• Reward is -1 when the stop distance is greater than the goal distance
• Reward is 1 otherwise
• http://youtu.be/n8eJrbDiP3A
24. LEFT TURN TRAINING
• This was from a previous project. This training is not used in the current project.
• int r = RLeft(startDistance, endMinusStartDistance);
• turnLeftTime = turnLeftTime + (r * gammaLeftTurn * abs(endMinusStartDistance));
• http://youtu.be/pISBHxOMTSU

Reward   Description
-1       When the car is further away from the wall
20       If the car is too close to the wall
0        When the car moves nearly parallel to the wall
25. RIGHT TURN TRAINING
• This was from a previous project. This training is not used in the current project.
• double r = RRight(distance);
• long d = distanceToRightTurnGoal(distance);
• turnRightTime = turnRightTime + (r * gammaRightTurn * d);
• http://youtu.be/2ubkShxqizo

Reward   Description
-1       When the car is further away from the wall
20       If the car is too close to the wall
0        When the car moves nearly parallel to the wall
Editor's Notes
π = policy
Can this robot learn to crawl forward?
Movements to the right are rewarded with positive values.
Movements to the left are punished with negative values.