Programming by
Demonstration using
Version Space Algebra
Tessa Lau, Steven A. Wolfman, Pedro Domingos,
Daniel S. Weld
Outline
1. Motivation and Introduction
a. Programming by demonstration
b. Version space algebra
2. Version Space Algebra
a. Version Space
b. Operators
3. Learning
Problem
Before:
Apples
Flour
Butter
Milk
Strawberries
After:
1. Apples
2. Flour
3. Butter
4. Milk
5. Strawberries
Pseudocode:
Insert the row number followed by the string ". "
Move the cursor to the beginning of the next line
Goal: learn a program to automate this numbering task, and more,
often from as little as a single training example.
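The pseudocode above, applied to every line of the list, can be sketched in Python. This is only an illustration of the target program the system is meant to learn, not SMARTedit's own representation; the function name number_lines and the separator parameter are assumptions made for the example.

```python
def number_lines(text: str, separator: str = ". ") -> str:
    """Prefix each line with its 1-based row number and a separator."""
    lines = text.split("\n")
    # One loop iteration = "insert the row number, then move to the next line".
    return "\n".join(
        f"{row}{separator}{line}" for row, line in enumerate(lines, start=1)
    )


shopping_list = "Apples\nFlour\nButter\nMilk\nStrawberries"
numbered = number_lines(shopping_list)
```

A PBD system has to arrive at an equivalent loop from a demonstration of one or two iterations, without the user writing it.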
Problem
User faced with tens of thousands of items to number
- Can our user do better than performing the numbering by hand or writing a
script in an abstract programming language to accomplish the task?
Solution: The user demonstrates what to do by numbering one item in the
shopping list, and our system learns a program.
Challenges
Need to be able to learn from one or two examples
- Macro recorders often need several demonstration attempts before the recording is correct
- Hard to find the exact keystrokes that produce the desired output,
e.g. numbering the lines
Need to represent the possible programs efficiently
- Can’t enumerate all possible programs
Solution
Problem: Given a demonstration, in which the user executes the first few iterations of the repetitive task, the system must
infer the correct program
Programming by Demonstration: infer a function (program) that maps the original state of the text to the target state of the
text.
- Alternative to macro recorders: instead of recording a literal sequence of keypresses, a PBD system
generalizes from the demonstrated actions to a robust program that is more likely to work in different situations
Version Space Algebra:
- Version space: contains the set of all functions in a given language consistent with a set of input-output examples
- All the possible programs
- Version Space Algebra: allows us to compose together simple version spaces, using operators such as union and
join, in order to construct more complex version spaces containing complex functions
Solution: SMARTedit
SMARTedit implements the version space algebra framework for learning programs from user demonstration.
A simple repetitive task: deleting the HTML comments, including their contents, from a text file
- The user records what she wants SMARTedit to do.
This is some sample <!-- deleteme -->HTML text from which the comments <!--
including contents -->ought to be deleted before <!--ZZZ-->publication.
This comment deletion task is one example of the types of repetitive tasks for
which SMARTedit saves user effort.
Record Stop Step Try another guess
Solution - Demonstrating the Action
A simple repetitive task: deleting the HTML comments, including their contents, from a text file
- Once the user has completed one iteration of the repetitive task, she clicks the “Stop” button to indicate that she has
completed the iteration. SMARTedit begins learning.
This is some sample HTML text from which the comments <!-- including
contents -->ought to be deleted before <!--ZZZ-->publication.
This comment deletion task is one example of the types of repetitive tasks for
which SMARTedit saves user effort.
Record Stop Step Try another guess
Solution - Learning
A simple repetitive task: deleting the HTML comments, including their contents, from a text file
- Once the user has completed one iteration of the repetitive task, she clicks the red button to indicate that she has
completed the iteration. SMARTedit begins learning.
- Example: the user performed two actions: moving the cursor to the beginning of the HTML comment, and
deleting to the end of the comment.
- State: document contents, selected text, cursor location
- Multiple states:
1. the state after the cursor has been moved, and
2. the state after the text has been deleted
This is some sample HTML text from which the comments <!-- including
contents -->ought to be deleted before <!--ZZZ-->publication.
This comment deletion task is one example of the types of repetitive tasks for
which SMARTedit saves user effort.
Record Stop Step Try another guess
Solution - Learning
A simple repetitive task: deleting the HTML comments, including their contents, from a text file
Multiple states:
1. the state after the cursor has been moved, and
2. the state after the text has been deleted
SMARTedit’s version space contains a number of candidate hypotheses for the first step of the program:
● Move to row 4 and column 21,
● Move forward forty-two characters,
● Move to after the string sample,
● Move to after the string ample,
● Move to before the string <, and
● Move to before the string <!--
This is some sample HTML text from which the comments <!-- including
contents -->ought to be deleted before <!--ZZZ-->publication.
This comment deletion task is one example of the types of repetitive tasks for
which SMARTedit saves user effort.
Record Stop Step Try another guess
Solution - Correcting Guesses
A simple repetitive task: deleting the HTML comments, including their contents, from a text file
- The user asks SMARTedit to execute its learned procedure by pressing the “Step” button, and SMARTedit displays its first
guess:
- Move to after the string ample, with 36% probability
- (chosen because the word sample happened to precede the first HTML comment)
This is some sample HTML text from which the comments <!-- including
contents -->ought to be deleted before <!--ZZZ-->publication.
This comment deletion task is one example of the types of repetitive tasks for
which SMARTedit saves user effort.
Record Stop Step Try another guess
Solution - Correcting Guesses
This is some sample HTML text from which the comments <!-- including
contents -->ought to be deleted before <!--ZZZ-->publication.
This comment deletion task is one example of the types of repetitive tasks for
which SMARTedit saves user effort.
A simple repetitive task: deleting the HTML comments, including their contents, from a text file
Wrong action → the user rejects the prediction by pressing the “Try another guess” button → SMARTedit cycles to its next most likely prediction
Invalidating the first hypothesis updates the version space:
● Move to row 4 and column 21
● Move forward forty-two characters
● Move to after the string sample
● Move to after the string ample
● Move to before the string <
● Move to before the string <!--
SMARTedit’s second guess is correct (moving the cursor to the beginning of the next HTML comment, with 26% probability)
Record Stop Step Try another guess
Outline
1. Motivation and Introduction
a. Programming by demonstration
b. Version space algebra
2. Version Space Algebra
a. Version Space
b. Operators
3. Learning
Version Space
Hypothesis: a function that takes as input an element of its domain I and produces as output an element
of its range O.
Hypothesis Space: a set of functions with the same domain and range
Version space: VS_{H,D} consists of only those hypotheses in hypothesis space H that are consistent with
the sequence D of examples.
Atomic version space: defined by a hypothesis space and a sequence of examples.
Composite version space: a composition of atomic or composite version spaces
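The definitions above can be made concrete with a brute-force sketch: a hypothesis is a function, and the version space is whatever subset of the hypothesis space agrees with every example. The hypothesis constructors (move_forward, move_before, move_after), the four candidate hypotheses, and the shortened sample documents are all assumptions chosen for illustration. Explicit enumeration like this is exactly what an efficient representation is meant to avoid.

```python
from typing import Callable, Dict, List, Tuple

# A hypothesis maps (document text, cursor index) to a new cursor index.
Hypothesis = Callable[[str, int], int]
Example = Tuple[Tuple[str, int], int]  # ((text, cursor), expected new cursor)


def move_forward(n: int) -> Hypothesis:
    return lambda text, cursor: cursor + n


def move_before(s: str) -> Hypothesis:
    # str.find returns -1 when s does not occur at or after the cursor.
    return lambda text, cursor: text.find(s, cursor)


def move_after(s: str) -> Hypothesis:
    def h(text: str, cursor: int) -> int:
        i = text.find(s, cursor)
        return -1 if i < 0 else i + len(s)
    return h


def version_space(space: Dict[str, Hypothesis],
                  examples: List[Example]) -> Dict[str, Hypothesis]:
    """Keep only the hypotheses consistent with every example."""
    return {name: h for name, h in space.items()
            if all(h(text, cur) == out for (text, cur), out in examples)}


hypothesis_space = {
    "forward 20 chars": move_forward(20),
    "after 'sample '": move_after("sample "),
    "before '<'": move_before("<"),
    "before '<!--'": move_before("<!--"),
}

doc1 = "This is some sample <!-- deleteme -->HTML text from which the comments <!-- including contents -->ought to be deleted"
doc2 = "This is some sample HTML text from which the comments <!-- including contents -->ought to be deleted"

# First demonstration: from the start of doc1 to the first comment.
ex1 = ((doc1, 0), doc1.index("<!--"))
# Second demonstration: after the deletion, from the old position to the next comment.
ex2 = ((doc2, ex1[1]), doc2.index("<!--"))

after_one = version_space(hypothesis_space, [ex1])
after_two = version_space(hypothesis_space, [ex1, ex2])
```

With one example all four candidates survive; the second example rules out the fixed offset and the search for "sample ", leaving only the two searches for the comment opener.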
Example - Location Version Space
[Figure: the Location version space hierarchy. Each node is a version space.]
Location = WordOffset ⋃ CharOffset ⋃ RowCol ⋃ FindFix
FindFix = Prefix ⋃ Suffix
RowCol = Row ⋈ Column
Row = AbsRow (ConstInt) ⋃ RelRow (LinearInt)
Column = AbsCol (ConstInt) ⋃ RelCol (LinearInt)
A LinearInt node also appears on the level below WordOffset and CharOffset.
The Location version space unions together a variety
of different version spaces specifying different types of
cursor positioning, such as string searching or row and
column positioning.
Version Space Algebra
Operators
- Union: Used to combine version spaces
- Join/Independent Join: sequence individual actions together to form a
program
- Transform: A version space transform is used to convert the functions in a
Location space to the functions in the parent Move (or Select, etc.) space
Union
[Figure: the Location version space hierarchy; Row is the union (⋃) of AbsRow (ConstInt) and RelRow (LinearInt).]
The Row space consists of a union of two spaces:
- AbsRow: absolute row positioning
- “on row 5”
- RelRow: relative positioning
- “on the next row”
Join
Join: sequences individual actions together to form a program
Joins provide a powerful way to build complex version spaces by maintaining the
cross product of two simpler version spaces
- The join is maintained in a way that preserves consistency
Consistency: a hypothesis is consistent when its output matches the user’s examples
Independent Join: Version Space 1 and Version Space 2 are each trained on
their own, different examples.
Independent Join
[Figure: the Location version space hierarchy; RowCol is the independent join (⋈) of Row and Column.]
The RowCol version space: an independent join of a
Row version space and a Column version space. All of
these row and column version spaces join together to
allow a user to specify locations
- “on the next row and the 5th column”
- “on the previous row and the first column”.
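A small sketch of the RowCol idea: the Row and Column spaces are each filtered against their own projection of the demonstrations, and the joined space is the cross product of the survivors. The candidate hypotheses, the demonstration coordinates, and the helper names consistent and independent_join are assumptions for illustration, not SMARTedit's data structures.

```python
from itertools import product


def consistent(space, examples):
    """Keep the named hypotheses whose output matches every (input, output) pair."""
    return {name: h for name, h in space.items()
            if all(h(x) == y for x, y in examples)}


# Row = AbsRow ⋃ RelRow and Column = AbsCol ⋃ RelCol, with a few hand-picked members.
row_space = {
    "row 5": lambda r: 5,             # absolute
    "next row": lambda r: r + 1,      # relative
    "previous row": lambda r: r - 1,  # relative
}
col_space = {
    "column 1": lambda c: 1,                # absolute
    "same column": lambda c: c,             # relative
    "two columns left": lambda c: c - 2,    # relative
}


def independent_join(space_a, examples_a, space_b, examples_b):
    """Update each side on its own examples, then pair up the survivors."""
    a = consistent(space_a, examples_a)
    b = consistent(space_b, examples_b)
    return set(product(a, b))


# Demonstrations as (row, col) before -> (row, col) after.
demos = [((4, 3), (5, 1)), ((7, 9), (8, 1))]
row_examples = [(before[0], after[0]) for before, after in demos]
col_examples = [(before[1], after[1]) for before, after in demos]

after_one = independent_join(row_space, row_examples[:1], col_space, col_examples[:1])
after_two = independent_join(row_space, row_examples, col_space, col_examples)
```

Because the two sides never need to be checked against each other, the join can be stored as two small sets instead of one set of pairs; after the second demonstration only "next row" paired with "column 1" remains.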
Transform function
[Figure: part of the Action hierarchy: Action → Move → Location ...]
Location function: outputs the location on the
same row but five columns to the right of the
current cursor.
Transformed Move function: maps
from the complete application state to a new
state in which the cursor is positioned five
columns to the right.
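The transform can be pictured as a wrapper that lifts a Location function into a Move function over the whole application state. The AppState fields and the names five_columns_right and to_move are assumptions for this sketch; the slide names only the Location and Move spaces.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class AppState:
    """Hypothetical application state: document text plus cursor position."""
    text: str
    row: int
    col: int


LocationFn = Callable[[int, int], Tuple[int, int]]
MoveFn = Callable[[AppState], AppState]


def five_columns_right(row: int, col: int) -> Tuple[int, int]:
    """Location function: same row, five columns to the right."""
    return row, col + 5


def to_move(location_fn: LocationFn) -> MoveFn:
    """Transform a Location function into a Move function on full states."""
    def move(state: AppState) -> AppState:
        row, col = location_fn(state.row, state.col)
        # Everything except the cursor position is carried over unchanged.
        return replace(state, row=row, col=col)
    return move


move = to_move(five_columns_right)
after = move(AppState(text="I like PBD", row=1, col=2))
```

The same wrapper applied to every function in a Location space yields the corresponding Move space, so learning can happen in the simpler space.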
Outline
1. Motivation and Introduction
a. Programming by demonstration
b. Version space algebra
2. Version Space Algebra
a. Version Space
b. Operators
3. Learning
Learning
[Figure: Location ⋃ ... FindFix, where FindFix = Prefix ⋃ Suffix]
Prefix and Suffix specify a location in
the text file relative to the next
occurrence of a string
Goal: the user is searching for the next
prefix match of “PBD”.
User positions the cursor after the first
instance of “PBD”
...
I like PBD Design PBD
Learning
Goal: the user is searching for the next prefix match of “PBD”.
User positions the cursor after the first instance of “PBD”
Possible hypotheses:
- the user may have been searching for the prefix “PBD”
- the prefix “BD”
- the prefix “D”
- a superstring of “PBD”
- etc...
I like PBD Design PBD
Learning
Many possible hypotheses.
System maintains
- Least upper bound (LUB): I like PBD
- Greatest lower bound (GLB): D
I like PBD Design PBD
Learning
User positions the cursor after the second instance of “PBD”
System updates the LUB and GLB:
- Least upper bound (LUB): shrinks to the longest string that precedes the cursor
in every example (the longest common prefix match)
- Greatest lower bound (GLB): grows to a string longer than its previous value,
updated based on the strings that were skipped over between the
starting location and the final position
I like PBD Design PBD
Learning
System updates the LUB and GLB:
- Least upper bound (LUB): the longest string that precedes the cursor in every example
Previous LUB: I like PBD
Text now preceding the cursor: I like PBD Design PBD
Final LUB: PBD
I like PBD Design PBD
Learning
System updates the LUB and GLB:
- Least upper bound (LUB): PBD
- Greatest lower bound (GLB): grows to a string longer than its previous value,
updated based on the strings that were skipped over between the
starting location and the final position
Intuition: the user skipped over an occurrence of “D” (in “Design”), so that hypothesis is invalidated
Previous GLB: D
String skipped over: Design PBD
- D was skipped over, so the GLB must grow
- BD was not skipped over: its first occurrence ends at the final position
Final GLB: BD
I like PBD Design PBD
Learning
System updates the LUB and GLB:
- Least upper bound (LUB): PBD
- Greatest lower bound (GLB): BD
Possible hypotheses:
- the prefix “PBD” (still consistent)
- the prefix “BD” (still consistent)
- the prefix “D” (ruled out: shorter than the GLB)
- a superstring of “PBD” (ruled out: longer than the LUB)
I like PBD Design PBD
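The two-example walkthrough above can be reproduced with a short sketch. It assumes the first demonstration starts at the beginning of the text, represents each demonstration as a (start, end) pair of cursor indices, and treats a search string as consistent when its first occurrence at or after the start ends exactly at the user's cursor. The function names are hypothetical, and this is a brute-force reading of the slides rather than the system's actual update rule.

```python
from typing import List, Optional, Tuple


def common_suffix(a: str, b: str) -> str:
    """Longest string that both a and b end with."""
    n = 0
    while n < min(len(a), len(b)) and a[len(a) - 1 - n] == b[len(b) - 1 - n]:
        n += 1
    return a[len(a) - n:]


def is_consistent(s: str, text: str, examples: List[Tuple[int, int]]) -> bool:
    """True if, in every example, searching for s from `start` lands on `end`."""
    for start, end in examples:
        i = text.find(s, start)
        if i == -1 or i + len(s) != end:
            return False
    return True


def bounds(text: str, examples: List[Tuple[int, int]]) -> Tuple[str, Optional[str]]:
    """Return (LUB, GLB) for the search string preceding the cursor."""
    # LUB: the longest string that precedes the cursor in every example.
    lub = text[:examples[0][1]]
    for _, end in examples[1:]:
        lub = common_suffix(lub, text[:end])
    # GLB: the shortest tail of the LUB that no example skipped over.
    for k in range(1, len(lub) + 1):
        candidate = lub[len(lub) - k:]
        if is_consistent(candidate, text, examples):
            return lub, candidate
    return lub, None


text = "I like PBD Design PBD"
first = (0, 10)    # cursor placed after the first "PBD"
second = (10, 21)  # cursor placed after the second "PBD"

one_example = bounds(text, [first])
two_examples = bounds(text, [first, second])
```

Computed literally, the LUB after two examples keeps the leading space (" PBD"), which the slides write as PBD. The GLB moves from "D" to "BD" because a search for "D" from the first position would stop inside "Design".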
Summary
Learns programs from user demonstrations for repetitive tasks
Described the Version Space Algebra Framework
Targets the challenges:
- Minimal examples
- Efficiently representing programs
*********************************************************************************************
Probabilistic Version Space Algebra
- Why?
- Rank hypotheses
- Take into account domain, a priori knowledge.
- String searching is more likely than row and column positioning → Pr(FindFix) >
Pr(RowCol)
- Probabilities are assigned to each hypothesis h in the hypothesis space H.
- All the probabilities of the hypotheses in H sum to 1
- Probabilities are assigned to each version space Vi in the union W
- All the probabilities of the version spaces in W sum to 1
Probabilistic Version Space Algebra
P_{h,V}, the probability of hypothesis h in version space V, is defined inductively up from the bottom of the version space hierarchy
depending on the type of V
- Atomic version space V: P_{h,V} = Pr(h | V)
- V1 is a transform of another version space V2: P_{h,V1} = P_{f,V2}, where f is the hypothesis in V2 that h is the transform of
- V is a union of version spaces Vi with corresponding probabilities wi
- The probability of a hypothesis h is the sum of its weighted probabilities
- V is a join of a finite number of version spaces Vi:
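The formula for the join case did not survive on this slide. A plausible reading, assuming the components of the join are treated as independent, is that a joined hypothesis receives the product of its components' probabilities. The union case stated in words above can be written in the same notation. Both lines below are a reconstruction under that assumption, not a formula copied from the slide; P_{h,V_i} is taken to be 0 when h is not in V_i.

```latex
% Union of version spaces V_i with weights w_i (the w_i sum to 1)
P_{h,V} = \sum_{i} w_i \, P_{h,V_i}

% Join of V_1, ..., V_n (assumed independent), with h = <h_1, ..., h_n>
P_{\langle h_1, \dots, h_n \rangle, V} = \prod_{i=1}^{n} P_{h_i,V_i}
```

Under this reading, ranking the guesses in a joined space only needs the per-component probabilities, so the cross product never has to be scored explicitly.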
