• PS: This file is for reference only. Do not
depend solely on it for the content. It is meant to
supplement your textbook. Going through the
suggested readings and the textbook is
recommended for detailed knowledge of the
content.
1. Introduction
Definition
• In 1959, Arthur Samuel, a pioneer in the field
of machine learning (ML), defined it as the
“field of study that gives computers the
ability to learn without being explicitly
programmed”.
Definition
“A computer program is said to learn from experience
with respect to some class of tasks and performance
measure, if the performance at the tasks, as measured by
the performance measure, improves with experience”
Features of a well-defined learning problem:
• The learning task
• The measure of performance
• The task experience
• Types of learning tasks
What is the Learning Problem?
• Learning = Improving with experience at some
task
• Improve over task T,
• with respect to performance measure P,
• based on experience E.
What is the Learning Problem?
• E.g., Learn to play checkers
T : Play checkers
P : % of games won in world tournament
E: opportunity to play against self
Learning to Play Checkers
• E.g., Learn to play checkers
T : Play checkers
P : % of games won in world tournament
• What experience should it learn from?
• What exactly should be learned?
• How shall it be represented?
• What specific algorithm should be used to learn it?
Designing a Learning System
• Consider designing a program to learn to play
checkers, with the goal of entering it in the world
checkers tournament
• Performance measure: the percentage of games it
wins in this tournament.
• Requires the following design steps:
– Choosing Training Experience
– Choosing the Target Function
– Choosing the Representation of the Target Function
– Choosing the Function Approximation Algorithm
Choosing the Training Experience
1. What training experience should the system have?
– A design choice with great impact on the outcome.
2. What amount of interaction should there be
between the system and the supervisor?
3. Which training examples?
Choosing the Training Experience
1. What training experience should the system have?
– A design choice with great impact on the outcome.
• Will the training experience provide direct or indirect
feedback?
– Direct feedback: the system learns from examples of individual checkers
board states and the correct move for each, i.e. a set of board states,
each paired with a correct move.
– Indirect feedback: the system learns from recorded games, where the
correctness of individual moves must be inferred from the final result of the game.
• Credit assignment problem: the value of early board states must be inferred from
the final outcome
• Direct feedback is typically easier to learn from
Choosing the Training Experience
2. What amount of interaction should there be between the
system and the supervisor?
– Choice #1: No freedom. Supervisor provides all training
examples.
– Choice #2: Semi-free. Supervisor provides training
examples, system constructs its own examples too, and
asks questions to the supervisor in cases of doubt.
– Choice #3: Total freedom. The system learns to play
completely unsupervised.
• How “daring” should the system be in exploring new boards?
Choosing the Training Experience
3. Which training examples?
– There is a huge number of possible games.
– No time to try all possible games.
– System should learn with examples that it will
encounter in the future.
– For example, if the goal is to beat humans, it
should be able to do well in situations that
humans encounter when they play (this is hard to
achieve in practice).
Choosing the Training Experience
– If training the checkers program consists only of
experiences played against itself, it may never encounter
crucial board states that are likely to be played by the
human checkers champion
– Most theory of machine learning rests on the assumption
that the distribution of training examples is identical to the
distribution of test examples
Partial Design of Checkers Learning
Program
• A checkers learning problem:
– Task T: playing checkers
– Performance measure P: percent of games won in the
world tournament
– Training experience E: games played against itself
• Remaining choices
– The exact type of knowledge to be learned
– A representation for this target knowledge
– A learning mechanism
Choosing the Target Function
What should be learned exactly?
• The computer program already knows the legal moves; it needs
to learn how to choose the best move from among them
• The computer should learn a ‘hidden’ function.
– target function: ChooseMove : B → M
– B: the set of legal board states; M: the set of legal moves
• ChooseMove is difficult to learn given only indirect training experience
Choosing the Target Function
• An alternative target function
– An evaluation function that assigns a numerical score to any given
board state
– V : B → ℝ (where ℝ is the set of real numbers)
• V(b) for an arbitrary board state b in B
– if b is a final board state that is won, then V(b) = 100
– if b is a final board state that is lost, then V(b) = -100
– if b is a final board state that is drawn, then V(b) = 0
– if b is not a final state, then V(b) = V(b'), where b' is the
best final board state that can be achieved starting from b
and playing optimally until the end of the game
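
The recursive definition of V can be illustrated on a game far smaller than checkers. The sketch below is only an illustration under stated assumptions: it swaps in a hypothetical toy game (players alternately remove 1 or 2 stones from a pile, and whoever takes the last stone wins), and the names `ideal_value`, `stones` and `our_turn` are invented here, not taken from the slides. It computes V by searching to the end of the game, which is feasible for a tiny game but not for checkers.

```python
from functools import lru_cache

WIN, LOSS = 100, -100  # values of final states, as in the definition of V


@lru_cache(maxsize=None)
def ideal_value(stones: int, our_turn: bool) -> int:
    """Ideal value V of a state in a toy take-1-or-2-stones game (an assumption).

    The player who removes the last stone wins. This toy game has no draws,
    so the V(b) = 0 case never occurs here.
    """
    if stones == 0:
        # Final state: whoever just moved took the last stone and won.
        return LOSS if our_turn else WIN
    # Non-final state: look ahead to the end of the game, assuming both
    # sides play optimally (we maximise, the opponent minimises).
    outcomes = [ideal_value(stones - take, not our_turn)
                for take in (1, 2) if take <= stones]
    return max(outcomes) if our_turn else min(outcomes)


print(ideal_value(4, True))  # 100: taking 1 stone leaves the opponent with 3
print(ideal_value(3, True))  # -100: every move lets the opponent win
```

Even with caching, this kind of exhaustive look-ahead only works because the toy game has very few states.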
Choosing the Target Function
• The definition of V(b) is recursive for a non-final board state b
– Not usable in practice, because it is not efficiently computable
except in the first three trivial cases
– It is therefore a nonoperational definition
• The goal of learning is to discover an operational
description of V
• Learning the target function is often called function
approximation
– The learned approximation is referred to as V̂
Choosing a Representation for the Target
Function
• The choice of representation involves trade-offs
– Pick a very expressive representation to allow close approximation to
the ideal target function V
– The more expressive the representation, the more training data is
required to choose among alternative hypotheses
• Use a linear combination of the following board features:
– x1: the number of black pieces on the board
– x2: the number of red pieces on the board
– x3: the number of black kings on the board
– x4: the number of red kings on the board
– x5: the number of black pieces threatened by red (i.e. which can be
captured on red's next turn)
– x6: the number of red pieces threatened by black
$\hat{V}(b) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6$
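
The linear representation of V̂ can be written as a short function. This is a minimal sketch, assuming the six features x1 to x6 have already been extracted from the board into a list; the function name `v_hat` and the example weight values are illustrative assumptions, not values from the slides.

```python
from typing import Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation: w0 + w1*x1 + ... + w6*x6.

    `weights` holds w0..w6 (7 values); `features` holds x1..x6 (6 values).
    """
    if len(weights) != len(features) + 1:
        raise ValueError("expected one more weight than features (w0 has no feature)")
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(rest, features))


# Illustrative weights only (an assumption): reward black material,
# penalise red material.
weights = [0.0, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]
board = [3, 0, 1, 0, 0, 0]  # x1 = 3 black pieces, x3 = 1 black king, rest 0
print(v_hat(weights, board))  # 5.0
```

Learning then amounts to choosing the seven numbers in `weights`; the features themselves stay fixed.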
Partial Design of Checkers Learning
Program
• A checkers learning problem:
– Task T: playing checkers
– Performance measure P: percent of games won in the
world tournament
– Training experience E: games played against itself
– Target function: V : Board → ℝ
– Target function representation:
$\hat{V}(b) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6$
Choosing a Function Approximation
Algorithm
• To learn V̂ we require a set of training
examples, each describing a board state b and the
training value Vtrain(b) for b
– Each training example is an ordered pair $\langle b, V_{train}(b) \rangle$
– Example: $\langle \langle x_1 = 3, x_2 = 0, x_3 = 1, x_4 = 0, x_5 = 0, x_6 = 0 \rangle, 100 \rangle$
– Here x1 to x6 are the six board features listed earlier; since x2 = 0,
red has no pieces left, so black has won and the training value is 100
Choosing a Function Approximation
Algorithm
• Need a procedure that first derives such training
examples from the indirect training experience, then
adjusts the weights wi to best fit these training
examples.
Estimating Training Values
• Need to assign specific scores to intermediate
board states
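• Estimate the training value of an intermediate board state b
using the learner's current approximation V̂ of the
next board state following b:
$V_{train}(b) \leftarrow \hat{V}(\mathrm{Successor}(b))$
– Successor(b) is the next board state following b at which it is
again the program's turn to move
– Simple and successful approach
– More accurate for states closer to end states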
Adjusting the Weights
• Choose the weights wi to best fit the set of training examples
• Minimize the squared error E between the training values and
the values predicted by the hypothesis V̂
• Require an algorithm that
– will incrementally refine weights as new training examples become
available
– will be robust to errors in these estimated training values
• Least Mean Squares (LMS) is one such algorithm
$E \equiv \sum_{\langle b, V_{train}(b) \rangle \in \text{training examples}} \left( V_{train}(b) - \hat{V}(b) \right)^2$
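
The squared error E translates directly into code. This is a minimal sketch, assuming training examples are stored as (features, training value) pairs; the name `squared_error` and the toy predictor `material` are illustrative assumptions.

```python
from typing import Callable, Iterable, Sequence, Tuple


def squared_error(
    examples: Iterable[Tuple[Sequence[float], float]],
    v_hat: Callable[[Sequence[float]], float],
) -> float:
    """Sum of squared differences between training values and predictions."""
    return sum((v_train - v_hat(b)) ** 2 for b, v_train in examples)


# Toy predictor (an assumption): number of black pieces minus red pieces.
material = lambda x: float(x[0] - x[1])
examples = [([3, 0, 1, 0, 0, 0], 100.0), ([3, 1, 0, 0, 0, 0], 3.0)]
print(squared_error(examples, material))  # 97**2 + 1**2 = 9410.0
```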
LMS Weight Update Rule
• For each training example $\langle b, V_{train}(b) \rangle$
– Use the current weights to calculate $\hat{V}(b)$
– For each weight $w_i$, update it as
$w_i \leftarrow w_i + \eta \left( V_{train}(b) - \hat{V}(b) \right) x_i$
– where
• $\eta$ is a small constant (e.g. 0.1)
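Summary of Design Choices
• Training experience: games played against itself
• Target function: V : Board → ℝ
• Representation of the target function: a linear function of six board features
• Function approximation algorithm: LMS weight updates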
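
The last of these design choices, the LMS weight update rule, can be sketched in Python as follows. This is a hedged illustration rather than a definitive implementation: the function names are invented here, and updating w0 as if it had a constant feature x0 = 1 is a common convention that is assumed, not stated on the slides.

```python
from typing import List, Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(
    weights: Sequence[float],
    features: Sequence[float],
    v_train: float,
    eta: float = 0.1,
) -> List[float]:
    """Return new weights after one LMS step on a single training example."""
    error = v_train - v_hat(weights, features)  # uses the current weights
    # Assumption: w0 is updated as if it had a constant feature x0 = 1.
    xs = [1.0, *features]
    return [w + eta * error * x for w, x in zip(weights, xs)]


weights = [0.0] * 7
features, v_train = [3, 0, 1, 0, 0, 0], 100.0
before = abs(v_train - v_hat(weights, features))
weights = lms_update(weights, features, v_train)
after = abs(v_train - v_hat(weights, features))
print(before, after)  # the error on this example shrinks after one step
```

Weights whose feature value is 0 in the example are left unchanged, so only features that actually occur on the board contribute to the correction.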
Suggested Readings
• “Machine Learning” by Tom Mitchell, McGraw
Hill Publisher, Chapter 1