Background on Lydia Chilton's crowd algorithms work, Maneesh Agrawala's algorithms for generating good designs from design principles, and past and future work on crowdsourcing humor
2. Outline
• My Background: Crowd Algorithms
– TurKit
– Cascade & Frenzy
• Related Work: Automated Design
– LineDrive
• Current Project: HumorTools
– The Onion’s American Voices
– Descriptive Models of Humor
– Breaking Down Humor into Microtasks
– Reducing the Humor Search Space
– Crowdsourcing Humor
4. Simple Crowd Task:
Label this image
Complex Crowd Algorithm:
Decipher this message
Greg Little, Lydia B. Chilton, Max
Goldman, Robert C. Miller. TurKit: Human
Computation Algorithms on Mechanical
Turk. UIST 2010.
Crowdsourcing: Decompose Problems so
People Can Collaboratively Solve them
5. Decomposing and Automating Design:
Model Design as a Search Problem
To automate well-designed visual
communication:
1. Collect design principles from
experts
2. Define a search space
3. Define Constraints
4. Define an objective function to
maximize
Maneesh Agrawala, et al. Design
Principles for Visual Communication.
CACM 2011.
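The four steps above can be sketched in code. This is only a minimal illustration on a toy label-layout problem: the candidate values, the height constraint, and the scoring function are hypothetical stand-ins for real design principles and are not taken from the cited paper.

```python
from itertools import product

# Step 2: the search space is every (font_size, spacing) pair (toy values).
FONT_SIZES = [8, 10, 12, 14]
SPACINGS = [1.0, 1.2, 1.5]


def satisfies_constraints(font_size, spacing, max_height=20.0):
    """Step 3: a hard constraint. The rendered line must fit in max_height."""
    return font_size * spacing <= max_height


def objective(font_size, spacing):
    """Step 4: a preference to maximize (assumed: larger, airier text is better)."""
    return font_size + 2.0 * spacing


def best_design():
    """Enumerate the space, drop infeasible candidates, return the best one."""
    candidates = [
        (f, s) for f, s in product(FONT_SIZES, SPACINGS)
        if satisfies_constraints(f, s)
    ]
    return max(candidates, key=lambda c: objective(*c))
```

Step 1 (collecting principles from experts) is what would justify the particular constraint and objective; here they are simply assumed.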
6. Humor is a Difficult AI Problem
Model Humor Generation as Constraint-based Search
Descriptive Theories of Humor:
Constraint:
“Violate an expectation”
12. Problems too hard for one person
You (misspelled) (several) (words). Please spellcheck your work next time.
I also notice a few grammatical mistakes. Overall your writing style is a bit too
phoney. You do make some good (points), but they get lost amidst the (writing).
(signature)
13. Improve-and-Vote Builds on Contextual Clues from Other Workers
[Diagram: initial artifact -> improve -> vote (improved vs. original) -> output]
Successive versions:
(?) (?) (?) (?) (?).
You (?) (?) (?) (work).
You (misspelled) (several) (words).
15. Cascade
Distributes big problems with complex interdependencies such
as creating taxonomies
Lydia B. Chilton, et al.
Cascade: Crowdsourcing Taxonomy
Creation. CHI 2013.
16. Frenzy
Combines the knowledge of many experts to meet a
global constraint such as creating conference sessions
Lydia B. Chilton, et al. Frenzy: collaborative data organization for creating conference
sessions. CHI 2014.
Used at CSCW 2013 and CHI 2014
(136 papers, 431 papers)
19. LineDrive: Using Design Principles to
Automate Design Problems
Methodology
• Expert patterns
• Observations from examples of expert work
• Grounded in psychological principles.
22. Outline
• My Background: Crowd Algorithms
– TurKit
– Cascade & Frenzy
• Related Work: Automated Design
– LineDrive
• Current Project: HumorTools
– The Onion’s American Voices
– Descriptive Models of Humor
– Breaking Down Humor into Microtasks for Individuals
– Reducing the Humor Search Space
– Crowdsourcing Humor
32. Descriptive Theories of Humor
are Not Generative
“Write something that violates an expectation.”
33. Approach To Creating a
Generative Model of Humor
Examples + Theories
Model + System + Study
34. American Voices: Examples
Input (headline): People Bending iPhones At Apple Stores (Friday September 30, 2014)
Output (joke): “I can’t believe people would just walk into an Apple store and start breaking things like it’s a Best Buy.”
36. Satisfy a Constraint by searching a space of associations
[Model diagram: Headline -> Associations -> Expectation Violation -> Joke]
Associations: Apple -> Best Buy (Alternative); Best Buy -> Broken Displays (Insult)
37. Sure, I guess UConn’s course is fine if you couldn’t
get into Yale’s football clinic.
UConn Holding Football 101 Clinic for Female Fans
Association:
Alternative Thing
38. Alright, how many ‘Summer Savannah’ Backyard
Garden Lion Pedestals do I have to order to turn
this thing around?
Report: ‘SkyMall’ Magazine May End Print Edition
Association:
Details
39. Just tell me what to do and where to go to guarantee
happiness forever.
Poll: Elite Colleges Don’t Produce Happier
Graduates
Association:
Personality Flaw
40. Satisfying a Constraint by searching a space of associations
[Model diagram: Headline -> Associations -> Expectation Violation -> Joke]
Expectation Violations: Sarcasm; Unexpected Angle; Bait-and-Switch
Associations: Apple -> Best Buy (Alternative); Best Buy -> Broken Displays (Insult)
41. Study: First Born Children More
Ambitious
As a second-born girl, I’d just like
to say wooohoo! Spring break!
Bait-and-Switch
Bait: The study is false.
Switch: The study is true.
Expectation Violation #1
42. Unexpected
Angle
“It’s an exciting time to be a
shark, that’s for sure!”
Great White Shark Populations Surging Off East Coast
Expected Focus: This is bad for people
Unexpected Focus: This is good for sharks
Expectation Violation #2
43. Sarcasm
“There is absolutely no other
explanation.”
Mick Jagger Blamed for Brazil’s World Cup Defeat
Expectation Violation #3
Literal Subtext: This is true
Sarcastic Flip Subtext: This is false
44. How Not to Make a Joke
Static Workflow
Why not?
Does not adapt to the input
The same recipe won’t apply to
every headline.
47. HumorTools
Adaptive Workflow
• Apply microtasks to the headline
• Until you can find a way of violating
an expectation
• Follow your train of thought, until
you get stuck, then backtrack
• Not yet crowdsourced
• But it is decomposed into
microtasks
[Workflow diagram nodes: Headline, Aspects, Expected Reactions, Associations, Violation Mechanism, Prototype & Test, Joke]
48. HumorTools Interface
Justin Bieber was baptized this weekend in an attempt to wash
away his sins following a scandal in which the singer appears in
videos using racial slurs
51. Headline: Justin Bieber Baptized In NYC Bathtub
Aspects: Justin Bieber, Baptized, Bathtub
Expected Reactions: Bad. His music sucks.
52. Violation Mechanism: (blank)
53. [Backtrack to the aspect "Justin Bieber": Bad. His music sucks.]
54. Associations: Bieber’s PR people
55. Violation Mechanism: Unexpected Angle
Expected Reaction: Bad
Angle: PR people are clever
Angle Reaction: Good.
56. Prototype, Test
57. Joke: Never let it be said that Bieber’s PR people
aren’t bringing new ideas to the table.
58. [Second chain, starting from the aspect "Bathtub"]
59. Expected Reactions: Bad. Dirty, Gross
60. Violation Mechanism: Unexpected Angle
A Bieber fan who’d worship his dirty bathtub
61. Prototype, Test
62. Joke: Oh my God! Can I lick the tub?
63. HumorTools Study
• 20 users: Stanford undergraduates
• Humor novices
• 60-80 minutes:
– Tutorial on 20 microtasks
– Humor Writing
• Write jokes for 3 headlines
• Rate jokes against The Onion (21 people)
64. HumorTools Study Headlines
PETA Seeks Copyright for Primate
A lawsuit filed by PETA claims that “monkey selfies” snapped by a macaque
who stole a photographer’s camera should be considered the legal property of
the macaque himself.
Liquid Water Found on Mars
NASA revealed Monday that they have found evidence of liquid water on
Mars, pointing to the possibility of life on the red planet.
Can users synthesize humor?
65. HumorTools Evaluation
• 75% of participants were able to synthesize
humor
• 25% of the HumorTools jokes were rated funny
66.
PETA Seeks Copyright for Primate
This is great news for animal rights.
Now my neighbor can take my dog to court
instead of me the next time he deposits some
of his “legal property” in their front lawn.
This is why you always get the macaque to
sign a release.
67.
PETA Seeks Copyright for Primate
This is great news for animal rights.
Now my neighbor can take my dog to court
instead of me the next time he deposits some
of his “legal property” in their front lawn.
This is why you always get the macaque to
sign a release.
HumorTools
47% Funny
62% Funny
68.
Liquid Water Found on Mars
I think it’s more convenient for me to
get it from the tap.
In other news, Pluto has been called
even less of a planet now.
Sounds refreshing!
69.
Liquid Water Found on Mars
HumorTools (45% Funny): I think it’s more convenient for me to
get it from the tap.
HumorTools (45% Funny): In other news, Pluto has been called
even less of a planet now.
The Onion (15% Funny): Sounds refreshing!
70. User Comments
“[Associations] helped me think of jokes in
a wider conceptual space than I
previously had.” (p9)
“Before this, I never really knew where to
start with joke writing. I just kind of sat and
thought until I came up with something.” (p5)
84. Crowdsourcing: Decompose Problems so
People Can Collaboratively Solve them
Simple Crowd Task:
Label this image
Complex Crowd Task:
Decipher this message
Greg Little, Lydia B. Chilton, Max
Goldman, Robert C. Miller. TurKit: Human
Computation Algorithms on Mechanical
Turk. UIST 2010.
85. To Automate Design, People Model
Design as a Search Problem
1. Collect design principles from
experts
2. Define a search space
3. Define Constraints
4. Define an objective function to
maximize
Maneesh Agrawala, et al. Design
Principles for Visual Communication.
CACM 2011.
86. Humor is a Difficult AI Problem
Model Humor Generation as Constraint-Based Search
Descriptive Theories of Humor:
Constraint:
“Violate an expectation”
87. HumorTools: An Adaptive Microtask Workflow
for Crowdsourcing Humor
• My Background: Crowd Algorithms
– TurKit
– Cascade & Frenzy
• Related Work: Automated Design
– LineDrive
• Current Project: HumorTools
– The Onion’s American Voices
– Descriptive Models of Humor
– Decomposing Humor into Microtasks
for Individuals
– Reducing the Humor Search Space
– Crowdsourcing Humor
Editor's Notes
Thanks for inviting me.
I’m a post-doc here at Stanford in the HCI group.
I’ll be starting as computer science faculty at Columbia in 2017.
I’m going to talk about humor, but I’m also going to give you some background
on my area, which is crowdsourcing and HCI.
Let me quickly summarize everything I’m going to say in the next 90 slides.
The objective of Crowdsourcing is to decompose problems so people can collaboratively solve them.
Crowdsourcing can handle very simple tasks like image labeling, or more complex tasks like
deciphering this message, where no one person knows the answer.
My work is on creative crowd algorithms that decompose harder and harder problems.
Other people in HCI have decomposed hard problems like design.
In fact, they didn’t have to use the crowd; they were able to automate some design problems.
Maneesh Agrawala in the HCI group here has several papers that automatically create well-designed visual communication:
Generating readable map directions
“how things work diagrams”
Assembly instructions.
All of these problems were decomposed in a similar way:
Humor is a difficult AI problem and very hard to decompose.
My work is inspired by Maneesh’s work, and treats humor generation as a constraint-based search.
Crowdsourcing was invented to use people for tasks that computers can’t do, like labeling images.
Mechanical Turk is a platform where you can pay people a few cents to do a simple task.
Turk quickly became very popular,
But for a long time, Turk was only used to do embarrassingly parallel tasks like image labeling.
So I asked:
Can we do more complex things?
So my coauthors and I built a system called TurKit,
which introduced the concept of crowd algorithms.
You can write JavaScript programs that make function calls to the crowd.
The most basic adaptive mechanism is improve-and-vote,
which allows workers to build on the work of others.
For example,
Here is some messy handwriting. No one person could read it.
I can’t even read it!
Using improve and vote, the crowd could decipher it.
Here’s how it works…
It starts with the image,
At first we have no idea what it says, so we start with question marks.
Then we ask a worker to improve it, and they get parts of the answer, but other parts are wrong.
Then we vote on which version is better.
Usually the improved version is better, but not always. Sometimes you get spam, or just really weird interpretations.
You take the version workers voted for, and you improve it again, until you get it all deciphered.
This little portion says “You misspelled several words…” and goes on.
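The improve-and-vote loop described above can be sketched as follows. This is not TurKit's API (TurKit programs are JavaScript and call real workers on Mechanical Turk); it is a minimal Python simulation in which the worker functions, the 20% spam rate, and the round count are all assumptions made for illustration.

```python
import random

# The hidden answer that the simulated workers gradually uncover.
TRUTH = ["You", "misspelled", "several", "words"]


def improve_and_vote(initial, improve, vote, rounds):
    """Alternate improve and vote steps, keeping whichever version wins."""
    best = initial
    for _ in range(rounds):
        candidate = improve(best)      # one worker tries to improve the text
        if vote(best, candidate):      # voters decide: is the candidate better?
            best = candidate
    return best


def simulated_improve(words, rng):
    """Hypothetical worker: fill in one unknown word, or sometimes return spam."""
    if rng.random() < 0.2:
        return ["spam"] * len(words)
    unknown = [i for i, w in enumerate(words) if w == "?"]
    if not unknown:
        return list(words)
    out = list(words)
    i = rng.choice(unknown)
    out[i] = TRUTH[i]
    return out


def simulated_vote(old, new):
    """Hypothetical voters: prefer the version with more correct words."""
    def score(ws):
        return sum(w == t for w, t in zip(ws, TRUTH))
    return score(new) > score(old)


def run_demo(seed=0):
    rng = random.Random(seed)
    start = ["?"] * len(TRUTH)
    return improve_and_vote(
        start, lambda ws: simulated_improve(ws, rng), simulated_vote, rounds=50
    )
```

The vote step is what filters out spam: a spam candidate never scores higher than the current best, so it is never adopted.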
TurKit was widely adopted in research.
Systems I built are in red, but other people used TurKit and built crowd-powered systems, too.
Notably, Michael Bernstein at Stanford and
Jeff Bigham at CMU built their first crowd systems in TurKit.
Cascade distributes big problems with complex interdependencies like creating taxonomies
Frenzy combines the knowledge of experts to meet a global constraint like
Grouping papers into conference sessions.
Those patterns are all completed by humans, but if we understand the space well enough,
it is possible to solve problems automatically with design principles.
LineDrive is a system for creating useful driving directions based on visual principles.
But what people who give good directions do is more like this:
How can a computer do that automatically? It’s so stylized, so different from a map.
The methodology for automating the creation of these is to look at expert patterns.
Expert patterns,
Observations from examples of expert work
Grounded in psychological principles.
Design space: the angle at which you show the turn. You can exaggerate it to emphasize the orientation of the turn.
All roads must be visible.
Quantitative evaluation: minimum road length (the visibility threshold).
You want as many of the important roads as possible to meet this visibility threshold.
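The visibility-threshold idea can be sketched as a scoring function. This is a simplified illustration, not LineDrive's actual algorithm: the function names, the importance weights, and the 10-unit threshold are assumptions.

```python
def visibility_score(drawn_lengths, importance, min_length=10.0):
    """Sum the importance of every road drawn at least min_length long."""
    return sum(
        weight
        for length, weight in zip(drawn_lengths, importance)
        if length >= min_length
    )


def pick_layout(layouts, importance, min_length=10.0):
    """Return the candidate layout whose visible roads carry the most importance."""
    return max(
        layouts,
        key=lambda lengths: visibility_score(lengths, importance, min_length),
    )
```

For example, with three candidate layouts of the same three-road route, `pick_layout([[3, 40, 5], [12, 30, 11], [10, 10, 4]], [2.0, 1.0, 1.0])` prefers the second layout, the only one in which every road meets the threshold.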
Humor is a highly valued human skill.
It is a sign of creativity and intelligence (Images – Monty Python. Obama laughing?)
Humor is used in many domains:
- Persuasion in advertising and politics
It is used to capture attention
And it helps us learn.
Because when we are having fun and at ease, we learn better than when we are stressed and pressured.
Humor is hard.
Humor is an AI Challenge
Robots can’t do it. Why? It takes a level of understanding, experience, and creativity they don’t have.
Data.
Incongruity theories hold that humor violates our expectations. It is the detection of the incongruity that we find funny.
Aristotle mentions this, and modern philosophers such as Kant and Schopenhauer latched on.
It is now the dominant theory of humor.
Humanity has been wondering about humor for over 2,000 years, so I’m obviously not going to start from scratch.
My approach is to mine lots of examples and theories, and build a system that embodies them.
It basically takes descriptive theories and turns them into generative theories.
For examples, I decided to narrow in on one very special form of humor.
It’s made by The Onion, called American Voices.
It takes real news headlines like this one:
“People Bending iPhones At Apple Stores”
And The Onion writes fake man-on-the-street style responses:
What’s special about this is that the headline and joke represent input and output pairs that we can study.
But what are we going to analyze these jokes for?
Well, I distilled a lot of books on humor by philosophers, linguists, and comedians, and two ideas kept coming up:
Associations
Expectation Violation.
I prototyped a number of different generative models of humor, and I settled on this model:
Satisfying a constraint by searching a space of associations.
Expectation violation is the goal, but it’s too hard to hit right off the bat.
So I use associations to search for possible ways to violate expectations.
By looking at examples, I found concrete mechanisms that could be applied. For example,
Apple Store and Best Buy are alternatives. That’s one type of association.
Best Buy to Broken Displays is an insult. That’s a different type of association.
And there are multiple types of expectation violation.
Let me explain each of those.
As in all good writing, details matter. Details bring the reader into the world you are writing about.
In the Best Buy joke, it starts by agreeing with your expectation: this is horrible, I can’t believe it.
Then at the last minute, it switches to be a Best Buy insult.
I call this “bait-and-switch”.
For another example of Violation Mechanisms, here’s a different joke.
The headline is [HEADLINE]
The Onion writes: “So did he think Transcendence was good, or what?”
You expect the focus to be the end for Mankind, and how horrible that would be.
But instead, the focus is on the movie.
Lastly, there is sarcasm, which hardly needs to be explained.
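The model of searching a space of associations can be sketched as a graph search. This is only an illustrative encoding, not the HumorTools implementation: the graph structure, the `find_chains` helper, and the "Genius Bar" detail are hypothetical, while the Apple Store, Best Buy, and Broken Displays links come from the example above.

```python
from collections import deque

# Hypothetical association graph: concept -> list of (association type, concept).
ASSOCIATIONS = {
    "Apple Store": [
        ("Alternative", "Best Buy"),
        ("Detail", "Genius Bar"),  # invented entry, only to give the search a branch
    ],
    "Best Buy": [("Insult", "Broken Displays")],
}


def find_chains(start, is_candidate, max_depth=3):
    """Breadth-first search over associations.

    Returns every chain of (concept, type, concept) links, up to max_depth
    links long, that the is_candidate predicate accepts.
    """
    queue = deque([(start, [])])
    found = []
    while queue:
        concept, chain = queue.popleft()
        if chain and is_candidate(chain):
            found.append(chain)
        if len(chain) < max_depth:
            for kind, nxt in ASSOCIATIONS.get(concept, []):
                queue.append((nxt, chain + [(concept, kind, nxt)]))
    return found


def ends_in_insult(chain):
    """Assumed stand-in for 'this chain could violate an expectation'."""
    return chain[-1][1] == "Insult"
```

Calling `find_chains("Apple Store", ends_in_insult)` returns the single chain Apple Store, Best Buy, Broken Displays, which is the raw material for the bait-and-switch joke.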
I built a system called HumorTools that teaches people the microtasks for creating humor and helps them follow them.
The general guideline is that you
Apply microtasks…
Find a way of violating an expectation
There are a lot of options, so just follow your train of thought until you get stuck, then backtrack.
Here is what one real user interface looks like:
This asks users to decompose a headline into aspects.
There are 20 microtasks.
It’s too big to show you all of it.
Instead, let me walk you through an example that illustrates the process.
We start with the headline
Justin Bieber Baptized in NYC Bathtub
We break it into aspects
We pick the aspect “Justin Bieber” and start writing our expected reactions.
I expect this is bad. His music sucks, he’s a bad role model.
But I can’t think of a violation mechanism for that.
So I forget that chain of reasoning, and backtrack.
Let’s try an association.
Bieber has PR people who cover up for his stupid behavior.
They’re the ones working hard to make it seem like he’s repentant.
Ah! And there’s a violation here!
Although we expect the headline to be bad,
From the perspective of the PR people it’s good
“Never let it be said that Bieber’s PR people aren’t bringing new ideas to the table.”
And to get there we followed a train of thought, and did a little backtracking.
We can write multiple jokes this way.
If we start with Bathtub, we expect that to be dirty and gross
But a fan would worship him anyway.
That’s a different unexpected angle joke.
I ran a study on HumorTools using 20 Stanford undergraduates.
I’m going to read you two headlines from the study.
…
The question is: can users synthesize humor for them?
The evaluation showed that
75% of participants were able to synthesize humor
25% of the jokes were rated funny
Let me show you some jokes:
For the PETA headline where PETA seeks copyright for primate
Here are two jokes for this headline:
(read)
Both have the look and feel of the Onion, and are amusing.
The first one was actually written by The Onion.
The second was synthesized with HumorTools
Raters found the HumorTools joke funnier than The Onion’s.
For the Mars headline where Liquid Water Found on Mars, indicating the strong possibility of life on the red planet.
Here are three jokes for this headline:
(read)
All 3 have the look and feel of the Onion, and are amusing.
The first two are HumorTools
And “Sounds refreshing” is The Onion
Both HumorTools jokes were rated funnier than The Onion’s
HumorTools is currently done completely by people, but since each microtask is fairly small, we could start to
train machine learning systems with user-collected data, leading to hybrid human-computer joke generation.
I have a grant to fund myself and 6 students this summer to be funnier than The Onion and release HumorTools.