This document provides an introduction to reinforcement learning. It discusses how reinforcement learning aims to learn behaviors through trial-and-error interaction with an environment to maximize rewards. The document outlines the basic components of a reinforcement learning problem including states, actions, rewards, and policies. It provides examples of reinforcement learning problems like pole balancing and the mountain car problem to illustrate these concepts. The next class will cover how to learn policies to solve reinforcement learning problems.
1. Introduction to Machine Learning
Lecture 21: Reinforcement Learning
Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lectures 5-18
Supervised learning
Data classification: labeled data; build a model that covers all the space
Unsupervised learning
Clustering: unlabeled data; group similar objects
Association rule analysis: unlabeled data; get the most frequent/important associations
Genetic Fuzzy Systems
3. Today’s Agenda
Introduction
Reinforcement Learning
Some examples before going further
4. Introduction
What does reinforcement learning aim at?
Learning from interaction with the environment
Goal-directed learning
[Diagram: the Agent perceives a State from the Environment, performs an Action, and pursues a GOAL]
Learning what to do and its effect
Trial-and-error search and delayed reward
5. Introduction
Learn reactive behaviors
Behaviors as a mapping between perceptions and actions
The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future.
Dilemma: neither exploitation nor exploration can be pursued exclusively without failing at the task.
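The exploration-exploitation dilemma above is commonly handled with an ε-greedy rule: with probability ε the agent tries a random action (explore), otherwise it takes the best-known action (exploit). The slides do not name this rule; the sketch below is an illustrative assumption, not the method the lecture prescribes.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action index: explore with probability epsilon,
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))          # explore: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

Setting ε = 0 gives pure exploitation and ε = 1 pure exploration; the dilemma on this slide is exactly why neither extreme works on its own.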
6. How Can We Learn It?
1. Look-up tables (Perception → Action: State 1 → Action 1, State 2 → Action 2, State 3 → Action 3, …)
2. Neural networks
3. Rules
4. Finite automata
8. Reinforcement Learning
Reward function: r: S → R, or r: S × A → R
[Diagram: the Agent receives State st and Reward rt from the Environment and emits Action at]
Agent and environment interact at discrete time steps t = 0, 1, 2, …
The agent
observes the state at step t: st ∈ S
produces an action at step t: at ∈ A(st)
gets the resulting reward: rt+1 ∈ R
goes to the next state st+1
9. Reinforcement Learning
[Diagram: the Agent receives State st and Reward rt from the Environment and emits Action at]
Trace of a trial:
st, at → rt+1, st+1, at+1 → rt+2, st+2, at+2 → rt+3, st+3, at+3 → …
Agent goal: maximize the total amount of reward it receives
That means maximizing not only the immediate reward, but the cumulative reward in the long run
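The observe-act-reward loop in the trace above can be sketched as a few lines of Python. The environment interface (`env.reset()`, `env.step(a)` returning the next state, reward, and a done flag) is an assumption mirroring a common convention, not something defined in these slides.

```python
def run_trial(env, policy, max_steps=100):
    """Run one trial: observe s_t, act a_t = policy(s_t), collect r_{t+1}."""
    s = env.reset()                 # initial state s_0
    rewards = []
    for t in range(max_steps):
        a = policy(s)               # the agent produces action a_t
        s, r, done = env.step(a)    # environment returns s_{t+1}, r_{t+1}
        rewards.append(r)
        if done:
            break
    return rewards                  # the agent's goal: maximize their sum
```

Any policy that maps states to actions can be plugged in; the list of rewards is exactly the trace r_{t+1}, r_{t+2}, … from the slide.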
10. Example of RL
Example: Recycling robot
State: charge level of the battery
Actions: look for cans, wait for a can, go recharge
Reward: positive for finding cans, negative for running out of battery
11. More Precisely…
Restricting ourselves to Markov Decision Processes (MDPs):
Finite set of situations (states)
Finite set of actions
Transition probabilities
Reward probabilities
This means that:
The agent needs to have complete information about the world
State st+1 only depends on state st and action at
12. Recycling Robot Example
[Transition diagram with states high and low:
high, search → high with prob. α or low with prob. 1 − α, reward R_search
high, wait → high with prob. 1, reward R_wait
low, search → low with prob. β, reward R_search, or high with prob. 1 − β, reward −3
low, wait → low with prob. 1, reward R_wait
low, recharge → high with prob. 1, reward 0]
13. Recycling Robot Example
S = {high, low}
A(high) = {wait, search}
A(low) = {wait, search, recharge}
R_search: expected number of cans while searching
R_wait: expected number of cans while waiting
R_search > R_wait
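The recycling robot's transition model T(s, a, s′) and rewards can be written down as a small table. In this sketch the numeric values of α, β, R_search, and R_wait are illustrative assumptions (the slides leave them symbolic), chosen so that R_search > R_wait as required.

```python
ALPHA, BETA = 0.8, 0.6           # assumed values for the slide's alpha, beta
R_SEARCH, R_WAIT = 2.0, 1.0      # assumed expected cans, with R_SEARCH > R_WAIT

# T[(s, a)] = list of (next_state, probability, reward) triples
T = {
    ("high", "search"):   [("high", ALPHA, R_SEARCH), ("low", 1 - ALPHA, R_SEARCH)],
    ("high", "wait"):     [("high", 1.0, R_WAIT)],
    ("low",  "search"):   [("low", BETA, R_SEARCH), ("high", 1 - BETA, -3.0)],
    ("low",  "wait"):     [("low", 1.0, R_WAIT)],
    ("low",  "recharge"): [("high", 1.0, 0.0)],
}

def actions(state):
    """A(s): the set of actions available in a given state."""
    return sorted({a for (s, a) in T if s == state})
```

Note that the outgoing probabilities for each (state, action) pair sum to 1, as the transition diagram requires.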
14. Breaking the Markov Property
Possible problems that do not satisfy the MDP assumptions:
When actions and states are not finite
Solution: discretize the set of actions and states
When transition probabilities do not depend only on the current state
Possible solution: represent states as structures built up over time from sequences of sensations
This is a POMDP (Partially Observable MDP)
Use POMDP algorithms to solve these problems
16. Elements of RL
Policy: what to do
Reward: what’s good
Value: what’s good because it predicts reward
Model: what follows what
17. Components of an RL Agent
Policy (behavior): mapping from states to actions, π: S → A
Reward: local reward at time t: rt
Model: probability of transition from state s to s’ by executing action a, T(s, a, s’)
The transition probabilities depend only on these parameters
The model is not known by the agent
18. Components of an RL Agent
Value functions
Vπ(s): long-term reward estimate from state s following policy π
Qπ(s, a): long-term reward estimate from state s executing action a and then following policy π
A simple example: a maze
Note that the agent does not know its own position; it can only perceive what is in the surrounding states.
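In small problems a value function is often just a table; once Qπ(s, a) estimates exist, a policy can be read off by acting greedily. The states and numbers below are made up for illustration (they reuse the recycling robot's state names, but the values are not derived from the slides).

```python
# Q[(s, a)]: estimated long-term reward from state s taking action a
# (illustrative values, not computed from the lecture's example)
Q = {
    ("high", "search"): 5.2, ("high", "wait"): 4.1,
    ("low", "search"): 2.0, ("low", "wait"): 3.0, ("low", "recharge"): 4.5,
}

def greedy_policy(state):
    """pi(s): pick the action with the highest estimated Q-value in state s."""
    candidates = [(a, q) for (s, a), q in Q.items() if s == state]
    return max(candidates, key=lambda aq: aq[1])[0]
```

This is the link between value and policy that the slide hints at: the value function is "what's good", and a policy can exploit it directly.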
20. Pursuing the goal: Maximize long-term reward
21. Goals and Rewards
OK, but I need to maximize my long-term reward. How do I get the long-term reward?
The long-term reward is defined in terms of the goal of the agent
The agent receives a local reward at each time step
How?
Intuitive idea: sum all the rewards obtained so far
Problem: the sum can grow without bound in non-ending tasks
22. Goals and Rewards
How can we deal with non-ending tasks?
Weighted addition of local rewards:
Rt = rt+1 + γ rt+2 + γ² rt+3 + … = Σ (k = 0 to ∞) γ^k rt+k+1
The γ parameter (0 < γ < 1) is the discounting factor
Note the bias toward immediate rewards; if you want to avoid it, set γ close to 1
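The weighted addition of local rewards described above can be computed directly from a list of rewards r_{t+1}, r_{t+2}, …; a minimal sketch:

```python
def discounted_return(rewards, gamma=0.9):
    """R_t = sum over k of gamma**k * r_{t+k+1}: the discounted return."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Small gamma biases the return toward immediate rewards;
# gamma close to 1 weighs later rewards almost as much as immediate ones.
```

With a constant reward of 1 and γ = 0.5, for instance, the return is 1 + 0.5 + 0.25 + …, which stays bounded even for a non-ending task, which is exactly why the discount factor is introduced.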
23. Some examples
24. Pole Balancing
Balance the pole:
The cart can move forward and backward
Avoid failure: the pole falling beyond a certain critical angle, or the cart hitting the end of the track
Reward:
−1 upon failure, 0 otherwise; the discounted return is then −γ^k for failure occurring k steps in the future
25. Mountain Car Problem
Objective: get to the top of the hill as quickly as possible
State definition: car position and speed
Actions: forward, reverse, none
Reward: −1 for each step the car is not at the top of the hill (i.e., minus the number of steps taken before reaching the top)
26. Next Class
How to learn the policies