The document introduces reinforcement learning approaches for autonomous driving. It discusses modeling the driving problem as a Markov decision process and using deep reinforcement learning methods like DQN. The key challenges are partial observability of the environment and designing an effective reward function. DQN trains a neural network to approximate the action-value function by experiencing simulated driving scenarios. Experience replay, a target network, and reward scaling help stabilize and improve training. The hyperparameters from seminal DQN research are also presented.
Autonomous Driving and Reinforcement Learning - an Introduction
1. Autonomous Driving and Reinforcement Learning – an Introduction
Michael Bosello
michael.bosello@studio.unibo.it
Università di Bologna – Department of Computer Science and Engineering, Cesena, Italy
Smart Vehicular Systems A.Y. 2019/2020
2. Outline
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
3. The Driving Problem
Next in Line...
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
4. The Driving Problem
Self-Driving I
Goals
Reach the destination
Avoid dangerous states
Provide comfort to passengers (e.g. avoid oscillations, sudden moves)
Settings
Multi-agent problem
One has to consider the others
Interaction is both
Cooperative, all the agents desire safety
Competitive, each agent has its own goal
Environment
Non-deterministic
Partially observable
Dynamic
5. The Driving Problem
Self-Driving II
Three sub-problems
Recognition: identify the environment's components
Prediction: predict the evolution of the surroundings
Decision making: take decisions and act to pursue the goal
Two approaches
Single task handling
Use human ingenuity to inject knowledge about the domain
End-to-end
Let the algorithm optimize towards the final goal without constraints
6. The Driving Problem
Self-Driving III
Figure: Modules of autonomous driving systems (source: [Talpaert et al., 2019], modified)
7. The Driving Problem
Sensing & Control
Sensing
Which sensors?
Single/multiple
Sensor expense
Operating conditions
Control
Continuous → steering, acceleration, braking
Discrete → forward, right, left, brake
8. The Driving Problem Single task approach
Recognition
Convolutional Neural Networks (CNNs)
YOLO [Redmon et al., 2015]
Computer Vision techniques for object detection
Feature based
Local descriptor based
Mixed
9. The Driving Problem Single task approach
Prediction
Model-based
Physics-based
Maneuver-based
Interaction-aware
Data-driven
Recurrent NNs (e.g. Long Short-Term Memory, LSTM)
10. The Driving Problem Single task approach
Decision Making
Planning/Tree Search/Optimization
Agent-oriented programming (like the BDI model)
Deep Learning (DL)
Reinforcement Learning (RL)
11. The Driving Problem End-to-end approach
End-to-end
Supervised Learning
Based on imitation
The current approach of major car manufacturers
Robotic drivers are expected to be perfect
Drawbacks
Training data: huge amounts of labeled data or human effort
Covering all possible driving scenarios is very hard
Reinforcement Learning (no explicit supervision)
Learns behaviors by trial-and-error
like a young human driving student
Does not require explicit supervision from humans.
Mainly used in simulated environments
to avoid consequences in real life
12. The Driving Problem Pitfalls of Simulations
Simulations
Keep in mind
An agent trained in a virtual environment will generally not perform well in the real setting (the sim-to-real gap)
For cameras: different visual appearance, especially for textures
You need some kind of transfer learning or conversion of images/data
Always enable sensor/actuator noise
Some 'ground truth' sensors present in some simulators aren't available in real life
For example, in TORCS (a racing simulator popular in the ML field) you get the exact
distances from other cars and from the track borders. Too easy...
13. Reinforcement Learning Basic Concepts
Next in Line...
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
14. Reinforcement Learning Basic Concepts Problem Definition
Agent-environment interaction
In RL, an agent learns how to fulfill a task by interacting with its environment
The interaction between the agent and the environment can be reduced to:
State: all the information about the environment that is useful to predict the future
Action: what the agent does
Reward: a real number that indicates how well the agent is doing
Figure: at each time step t, the agent performs an action $A_t$ and receives the pair $(S_{t+1}, R_{t+1})$
(source [Sutton and Barto, 2018])
15. Reinforcement Learning Basic Concepts Problem Definition
Terminology
Episode
A complete run from a starting state to a final state
Episodic task: a task that has an end
Continuing task: a task that goes on without limit
We can split it, e.g. by setting a maximum number of steps
16. Reinforcement Learning Basic Concepts Problem Definition
Markov Decision Process (MDP)
An RL problem can be formalized as an MDP, the 4-tuple:
$S$: the set of states
$A_s$: the set of actions available in state s
$P(s' \mid s, a)$: the state-transition probabilities (the probability of reaching s' from s after taking a)
$R(s, s')$: the reward distribution associated with state transitions
Figure: Example of a simple MDP (source Wikipedia)
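To make the 4-tuple concrete, here is a minimal sketch (the states, actions, and numbers are invented for illustration): the transition probabilities $P(s' \mid s, a)$ as nested dictionaries, plus a function that samples one transition.

```python
import random

# Toy MDP (hypothetical): two states, two actions.
# P[s][a] is a list of (next_state, probability); R[(s, s')] is the reward.
P = {
    "cruise": {"keep":  [("cruise", 0.9), ("crash", 0.1)],
               "brake": [("cruise", 1.0)]},
    "crash":  {"keep":  [("crash", 1.0)],
               "brake": [("crash", 1.0)]},
}
R = {("cruise", "cruise"): 1.0, ("cruise", "crash"): -10.0, ("crash", "crash"): 0.0}

def step(s, a):
    """Sample s' from P(s'|s, a) and return (s', reward)."""
    next_states, probs = zip(*P[s][a])
    s2 = random.choices(next_states, weights=probs)[0]
    return s2, R[(s, s2)]
```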
17. Reinforcement Learning Basic Concepts Problem Definition
Environment features
MDPs can deal with non-deterministic environments
(because they have probability distributions)
But we need to cope with observability
Markov property → full observability
Classical RL algorithms rely on the Markov property
A state satisfies the property if it includes all the aspects of the (past)
agent-environment interaction that make a difference for the future
From states to observations → partial observability
Sometimes, an agent has only a partial view of the world state
The agent gets information (observations) about some aspects of the environment
By an abuse of language, observations/histories are also called 'states' and written $S_t$
How to deal with partial observability?
Approximation methods (e.g. NNs) do not rely on the Markov property
Sometimes we can reconstruct a Markov state using a history
Stacking the last n states
Using an RNN (e.g. LSTM)
18. Reinforcement Learning Basic Concepts Learning a Policy
Learning a Policy I
We want the agent to learn a strategy, i.e. a policy
Policy π
A map from states to actions, used by the agent to choose the next action
In particular, we are searching for the optimal policy
→ the policy that maximizes the cumulative reward
Cumulative reward
The discounted sum of rewards over time:
$R = \sum_{t=0}^{n} \gamma^t \, r_{t+1}, \qquad 0 \le \gamma \le 1$
For larger values of $\gamma$, the agent focuses more on long-term rewards
For smaller values of $\gamma$, the agent focuses more on immediate rewards
What does this mean?
The actions of the agent affect its possibilities later in time
It could be better to choose an action that leads to a better state than an action
that gives a greater immediate reward but leads to a worse state
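As a quick numeric check of the return formula above, a minimal sketch (the reward sequence is invented): the same rewards look very different to a far-sighted and a myopic agent.

```python
def discounted_return(rewards, gamma=0.99):
    """R = sum_t gamma^t * r_{t+1} over a finite reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [0.4, 0.4, 0.4, -1.0]           # three forward steps, then a crash
print(discounted_return(rewards, 0.99))   # ~0.22: the crash weighs heavily
print(discounted_return(rewards, 0.10))   # ~0.44: the crash is barely noticed
```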
19. Reinforcement Learning Basic Concepts Learning a Policy
Learning a Policy II
Problem
We don't know the MDP model (state-transition probabilities and reward distribution)
Solution
Monte Carlo approach: let's make estimates based on experience
A way to produce a policy is to estimate the value (or action-value) function
Given the function, the agent only needs to perform the action with the greatest value
Expected return $\mathbb{E}[G_t]$
The expectation of the cumulative reward, i.e. our current estimate
Value function $V_\pi(s) = \mathbb{E}[G_t \mid S_t = s]$
(state) → the expected return when starting in s and following $\pi$ thereafter
Action-value function $Q_\pi(s, a) = \mathbb{E}[G_t \mid S_t = s, A_t = a]$
(state, action) → the expected return when performing a in s and following $\pi$ thereafter
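A minimal sketch of the Monte Carlo idea (illustrative code; here `trajectory` pairs each state $S_t$ with the reward $R_{t+1}$ received after it): estimate $V_\pi(s)$ by averaging the returns observed after visiting s.

```python
from collections import defaultdict

observed_returns = defaultdict(list)   # state -> list of returns G seen from it

def record_episode(trajectory, gamma=0.99):
    """trajectory: list of (S_t, R_{t+1}) pairs from one finished episode.
    Walking backwards, G accumulates the discounted reward that followed
    each state (every-visit Monte Carlo)."""
    G = 0.0
    for s, r in reversed(trajectory):
        G = r + gamma * G
        observed_returns[s].append(G)

def V(s):
    """Estimate of E[G_t | S_t = s]: the average of the observed returns."""
    return sum(observed_returns[s]) / len(observed_returns[s])
```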
20. Reinforcement Learning Basic Concepts Learning a Policy
Learning a Policy III
SARSA and Q-learning
They are algorithms that approximate the action-value function
At every iteration, the estimate is refined thanks to the new experience.
Every time the evaluation becomes more precise, the policy gets closer to the
optimal one.
Generalized Policy Iteration (GPI)
Almost all RL methods can be described as GPI
They all have identifiable policies and value functions
Learning consists of two interacting processes
Policy evaluation: compute the value function
Policy improvement: make the policy greedy
Figure: [Sutton and Barto, 2018]
21. Reinforcement Learning Basic Concepts Learning a Policy
Learning a policy IV
The exploration/exploitation dilemma
The agent seeks to learn the action values of optimal behavior, but it needs to behave
non-optimally in order to explore all actions (to find the optimal ones)
ε-greedy policy: the agent behaves greedily, but with a (small) probability ε it
selects a random action (a minimal sketch follows).
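The ε-greedy choice over a tabular Q, sketched in Python (the names are illustrative):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Q maps (state, action) -> estimated value. With probability epsilon
    explore uniformly at random; otherwise exploit the best-looking action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```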
On-policy learning
The agent learns the action values of the policy that produces its own behavior ($q_\pi$)
It learns about a near-optimal policy (ε-greedy)
Target policy and behavior policy coincide
Off-policy learning
The agent learns the action values of the optimal policy ($q_*$)
There are two policies: target and behavior
Every n steps the target policy is copied into the behavior one
If n = 1 there is only one action-value function
But it is still not the same as on-policy (the target and behavior policies do not coincide)
22. Reinforcement Learning Basic Concepts Learning a Policy
How to estimate the value function?
A table that associates each state (or state-action pair) with a value
Randomly initialized
Temporal Difference (TD) update
$V(S_t) \leftarrow V(S_t) + \alpha \, [R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$
TD error
$\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$
$R_{t+1} + \gamma V(S_{t+1})$ is a better estimate of $V(S_t)$
The actual reward obtained plus the expected rewards from the next state
By subtracting the old estimate we get the error made at time t
Sarsa (on-policy)
$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \, [R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)]$
Q-learning (off-policy)
$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \, [R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)]$
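The two updates differ only in the bootstrap term, as this minimal tabular sketch shows (illustrative code, with Q defaulting to zero):

```python
from collections import defaultdict

Q = defaultdict(float)   # (state, action) -> value, implicitly initialized to 0

def q_learning_update(s, a, r, s2, actions, alpha=0.1, gamma=0.99):
    """Off-policy: bootstrap on the greedy action in s2."""
    target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    """On-policy: bootstrap on the action a2 actually chosen in s2."""
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```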
23. Example of a Driving Agent: DQN
Next in Line...
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
24. Example of a Driving Agent: DQN Neural Network and Training Cycle
Deep Q-Network
DQN
When the number of states increases, it becomes impossible to use a table to
store the Q-function (alias for action-value function)
A neural network can approximate the Q-function
We also get better generalization and the ability to deal with non-Markovian environments
How to represent the action-value function in a NN? Two options (option 1 is sketched below):
1 Input: the state (as a vector) → Output: the values of each action
2 Input: the state and an action → Output: the value of that action
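A sketch of option 1 in PyTorch, with layer sizes loosely following the network of [Mnih et al., 2013] (4 stacked 84x84 grayscale frames in, one Q-value per action out); treat it as illustrative, not as the exact published model:

```python
import torch.nn as nn

def make_q_network(n_actions, in_frames=4):
    """State (a stack of frames) in -> a vector of Q-values, one per action."""
    return nn.Sequential(
        nn.Conv2d(in_frames, 16, kernel_size=8, stride=4),  # 84x84 -> 20x20
        nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=4, stride=2),         # 20x20 -> 9x9
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(32 * 9 * 9, 256),
        nn.ReLU(),
        nn.Linear(256, n_actions),                          # one value per action
    )
```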
Design the driver agent
Environment: states and actions, reward function
Neural network
Training cycle
25. Example of a Driving Agent: DQN Neural Network and Training Cycle
Neural Network
Pre-processing
It's important how we present the data to the network
Let's take images as an example
Resize to small images (e.g. 84x84)
If colors are not discriminative, use grayscale
How many frames to stack?
Remember: the larger the input vector, the larger the search space
Convolutional Neural Networks
A good idea for structured/spatially local data
e.g. images, lidar data ...
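A minimal pre-processing sketch along these lines (OpenCV-based; the class and function names are mine):

```python
from collections import deque
import cv2
import numpy as np

def preprocess(frame):
    """RGB frame -> 84x84 grayscale, scaled to [0, 1]."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    small = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
    return small.astype(np.float32) / 255.0

class FrameStack:
    """Keep the last n frames so the network can infer motion."""
    def __init__(self, n=4):
        self.frames = deque(maxlen=n)

    def push(self, frame):
        self.frames.append(preprocess(frame))
        while len(self.frames) < self.frames.maxlen:   # pad at episode start
            self.frames.append(self.frames[-1])
        return np.stack(list(self.frames))             # shape (n, 84, 84)
```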
26. Example of a Driving Agent: DQN Neural Network and Training Cycle
Training Cycle
Transition samples are used to compute the target update
Training uses a gradient form of Q-learning
(see [Mnih et al., 2013] and [Sutton and Barto, 2018] for details)
27. Example of a Driving Agent: DQN Issues and Solutions
DQN Issues and Solutions I
Each of these features significantly improves performance
Experience Replay
We put samples in a buffer and we take random batches from it for training
When full, oldest samples are discarded
Why? Experience replay is needed to break temporal correlations
In supervised learning, inputs should be independent and identically distributed (i.i.d.)
i.e. the generative process has no memory of past generated samples
Other advantages:
More efficient use of experience (samples are used several times)
Avoids catastrophic forgetting (by presenting old samples again)
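A minimal replay-buffer sketch (illustrative): a bounded deque plus uniform sampling gives exactly the 'random batches, oldest discarded' behavior described above.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # when full, oldest samples drop out

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        """Uniform random batch: breaks the temporal correlation of a trajectory."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```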
Double DQN
We keep two networks: the online NN, which is trained, and a target NN used to
compute the TD targets (strictly, this two-network trick comes from the original DQN;
Double DQN [van Hasselt et al., 2015] additionally decouples action selection from evaluation)
Every N steps we copy the online NN into the target NN
Why? To reduce target instability
Supervised learning performs well only if, for the same input, the label does not
change over time
In RL, the target changes constantly (as we refine the estimates)
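Continuing the PyTorch sketch above (still illustrative; the copy interval is a hyperparameter), the periodic synchronization is just a weight copy:

```python
import copy

online_net = make_q_network(n_actions=4)   # trained at every step (see earlier sketch)
target_net = copy.deepcopy(online_net)     # frozen copy used only for TD targets

def maybe_sync(step, every_n=10_000):
    """Every N gradient steps, refresh the frozen target network."""
    if step % every_n == 0:
        target_net.load_state_dict(online_net.state_dict())
```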
28. Example of a Driving Agent: DQN Issues and Solutions
DQN Issues and Solutions II
Reward scaling
Scaling (in practice, clipping) rewards to the range [−1, 1] dramatically improves
efficiency [Mnih et al., 2013]
Frame skipping
Consecutive frames are very similar → we can skip frames without losing much
information and speed up training by a factor of N
The selected action is repeated for N frames
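A minimal wrapper sketch combining the two tricks above (action repeat plus reward clipping); the Gym-style step() signature is an assumption, everything else is illustrative:

```python
class SkipAndClip:
    """Repeat each chosen action for `skip` frames; clip the reward to [-1, 1]."""
    def __init__(self, env, skip=4):
        self.env, self.skip = env, skip

    def step(self, action):
        total = 0.0
        for _ in range(self.skip):             # repeat the selected action
            obs, reward, done, info = self.env.step(action)
            total += reward
            if done:
                break
        return obs, max(-1.0, min(1.0, total)), done, info
```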
Initial sampling
It could be useful to save K samples in the replay buffer before training starts
So many parameters...
[Mnih et al., 2015] is a good place to start
The table of hyperparameters used in their work is in the next slide
More recent works and experiments could provide better suggestions
29. Example of a Driving Agent: DQN Issues and Solutions
Hyperparameters used in [Mnih et al., 2015]
30. Example of a Driving Agent: DQN Issues and Solutions
Last things
Reward function
Designing a reward function is not an easy task
It could severely affect learning
In the case of a driving agent (as in RL for robotics), the rewards are internal and
do not come from the environment
You should also give rewards for intermediate steps, to direct the agent (i.e. not
only +1 on success and −1 on failure)
Some naive hints (sketched in code below)
Reach target +1
Crash −1
Car goes forward +0.4 (if too large, it will drown out the crash penalty)
Car turns +0.1 (if too large, the car will keep circling)
−0.2 instead of the positive reward if the agent turns in one direction and right after
turns in the opposite one (i.e. right-left or left-right), to reduce car oscillations
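Encoded directly, the hints above might look like this (the event flags are hypothetical simulator outputs, not an established API):

```python
def reward(event):
    """Naive shaping following the hints above; `event` is a hypothetical
    dict of boolean flags produced by the simulator at each step."""
    if event["reached_target"]:
        return 1.0
    if event["crashed"]:
        return -1.0
    if event["direction_flip"]:   # right-then-left or left-then-right
        return -0.2               # discourage oscillations
    if event["turning"]:
        return 0.1                # small, so circling is not rewarding
    if event["moved_forward"]:
        return 0.4                # well below 1, so the crash penalty still bites
    return 0.0
```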
Exploration/exploitation
Start with a large ε
Gradually reduce it with a decay factor (e.g. 0.999):
$\epsilon_{t+1} = \epsilon_t \cdot \text{decay}$
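As a one-liner per episode (the floor value is my assumption, to keep some exploration):

```python
def decayed(epsilon, decay=0.999, eps_min=0.05):
    """eps_{t+1} = eps_t * decay, floored at eps_min."""
    return max(eps_min, epsilon * decay)
```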
32. Appendix: Reading Suggestions
Reading Suggestions: Access to Resources
Books and papers are available through the university proxy
Search engine: https://sba-unibo-it.ezproxy.unibo.it/AlmaStart
If you want to access an article via its link,
convert the dots in the domain into hyphens and append .ezproxy.unibo.it to the URL, e.g.:
https://www.nature.com/articles/nature14236
→
https://www-nature-com.ezproxy.unibo.it/articles/nature14236
33. Appendix: Reading Suggestions
Reading Suggestions: Driving Problem
Extended review of autonomous driving approaches [Leon and Gavrilescu, 2019]
Overview of RL for autonomous driving and its challenges [Talpaert et al., 2019]
Tutorial of RL with the CARLA simulator:
https://pythonprogramming.net/introduction-self-driving-autonomous-cars-carla-python
End-to-end approach examples
CNN for steering on a real car [Bojarski et al., 2016]
DQN car control [Yu et al., 2016]
Single task approach example
NN for recognition + RL for control [Li et al., 2018]
Converting virtual images to real ones [Pan et al., 2017]
34. Appendix: Reading Suggestions
Reading Suggestions: RL & DL
In depth (books)
[Sutton and Barto, 2018] is the bible of RL – the second edition (2018) also
covers recent breakthroughs
[Goodfellow et al., 2016] for Deep Learning
Compact introductions
Brief explanation of RL basics and DQN:
https://rubenfiszel.github.io/posts/rl4j/2016-08-24-Reinforcement-Learning-and-DQN
CNN: https://cs231n.github.io/convolutional-networks/
Even more compact: https://pathmind.com/wiki/convolutional-network
DQN
[Mnih et al., 2015] Revised version (better)
[Mnih et al., 2013] Original version
[van Hasselt et al., 2015] Double Q-learning
35. References
References I
Bojarski, M., Testa, D. D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel,
L. D., Monfort, M., Muller, U., Zhang, J., Zhang, X., Zhao, J., and Zieba, K. (2016).
End to end learning for self-driving cars.
Goodfellow, I., Bengio, Y., and Courville, A. (2016).
Deep Learning.
MIT Press.
http://www.deeplearningbook.org.
Leon, F. and Gavrilescu, M. (2019).
A review of tracking, prediction and decision making methods for autonomous
driving.
Li, D., Zhao, D., Zhang, Q., and Chen, Y. (2018).
Reinforcement learning and deep learning based lateral control for autonomous
driving.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and
Riedmiller, M. (2013).
Playing Atari with deep reinforcement learning.
36. References
References II
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G.,
Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C.,
Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and
Hassabis, D. (2015).
Human-level control through deep reinforcement learning.
Nature, 518(7540):529–533.
Pan, X., You, Y., Wang, Z., and Lu, C. (2017).
Virtual to real reinforcement learning for autonomous driving.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2015).
You only look once: Unified, real-time object detection.
Sallab, A. E., Abdou, M., Perot, E., and Yogamani, S. (2017).
Deep reinforcement learning framework for autonomous driving.
Electronic Imaging, 2017(19):70–76.
Sutton, R. S. and Barto, A. G. (2018).
Reinforcement learning: An introduction.
The MIT Press.
37. References
References III
Talpaert, V., Sobh, I., Kiran, B. R., Mannion, P., Yogamani, S., El-Sallab, A., and
Perez, P. (2019).
Exploring applications of deep reinforcement learning for real-world autonomous
driving systems.
van Hasselt, H., Guez, A., and Silver, D. (2015).
Deep reinforcement learning with double Q-learning.
Yu, A., Palefsky-Smith, R., and Bedi, R. (2016).
Deep reinforcement learning for simulated autonomous vehicle control.
Course Project Reports: Winter, pages 1–7.
38. Autonomous Driving and Reinforcement Learning – an Introduction
Michael Bosello
michael.bosello@studio.unibo.it
Università di Bologna – Department of Computer Science and Engineering, Cesena, Italy
Smart Vehicular Systems A.Y. 2019/2020