proScript: Partially Ordered Scripts Generation
Keisuke Sakaguchi, Chandra Bhagavatula, Ronan Le Bras, Niket Tandon, Peter Clark, Yejin Choi

What is a script? Why is it important?
“a script is a stereotyped sequence of actions that defines a well-known
situation and has associated with it”
Roger Schank and Robert Abelson (1977)
• Part of commonsense knowledge
• Scripts help to represent and understand the causal structure of events.
• Scripts allow inference about implicit cause-and-effect relationships.
Two major approaches for Scripts in NLP
1. Script as narrative chain (Mooney and DeJong, 1985; Chambers and Jurafsky, 2008, 2009)
Automatically induce scripts from raw texts
An automatically learned “Prosecution” chain.
(Figure from Chambers and Jurafsky, 2008)
[Pros]
• Scalability
[Cons]
• News domain (not everyday scenarios)
• Many reporting verbs (non-core events)
• Highly abstracted as tuples of a verb and its dependency
• Insufficient evaluation scheme
2. Script as paraphrase sets (Regneri et al., 2010; Modi et al., 2016; Wanzare et al., 2016)
1. Ask crowdworkers to write down a sequence of events.
2. The collected sequences are aligned with paraphrased events.
3. Cluster the aligned events.
Multiple sequence alignment
EATING IN A FAST-FOOD RESTAURANT
(Figure from Regneri et al., 2010)
[Pros]
• High quality (for everyday scenarios)
[Cons]
• Scalability (< 50)
• No evaluation metric for modeling
Our contributions
                             Quality   Scalability
Script as narrative chain       -           +
Script as paraphrase sets       +           -
proScript                       +           +
1. Crowdsourced 6.4k (partially ordered) scripts.
2. With this data, we adapt pre-trained neural LMs to generate high-quality scripts.
3. Proposed two complementary task definitions with the proScript dataset.
Data Collection
Crowdsourcing
1. Collect scenarios of scripts
2. Create partial order scripts
3. Validate the scripts
ROCStories (Mostafazadeh et al., 2016) → 2,564 scenarios
Manually curated patterns:
- want(ed) to ... (e.g., go to Hawaii)
- need(ed) to ... (e.g., get a haircut)
- look(ing) to ... (e.g., buy a television)
Examples: sign into email account, go to a bathroom, buy some new clothes, replace a closet door, …
DeScript (Wanzare et al., 2016) → 40 scenarios
Examples: take a bath, do laundry, order a pizza, …
VirtualHome (Puig et al., 2018) → 233 scenarios
Examples: turn on light, put mail in mail organizer, put dishes away, …
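The pattern-based scenario collection from ROCStories can be sketched as a regular-expression match. This is only an illustration: the regex, the function name extract_scenarios, and the example sentences are assumptions, not the authors' actual patterns or data, and the slides state that the patterns were curated manually.

```python
import re
from typing import List

# Hypothetical regex covering the three cues listed on the slide:
# want(ed) to ..., need(ed) to ..., look(ing) to ...
CUE = re.compile(
    r"\b(?:want(?:ed)?|need(?:ed)?|look(?:ing)?) to\s+([^.,;!?]+)",
    flags=re.IGNORECASE,
)


def extract_scenarios(sentence: str) -> List[str]:
    """Return every phrase that follows a cue, up to the next punctuation mark."""
    return [match.group(1).strip() for match in CUE.finditer(sentence)]


# Made-up sentences for illustration only.
examples = [
    "Tom wanted to go to Hawaii.",
    "She needed to get a haircut before the interview.",
    "They were looking to buy a television.",
]
for sentence in examples:
    print(extract_scenarios(sentence))
```

The second sentence yields "get a haircut before the interview", which shows why raw matches would still need manual filtering before they can serve as scenarios.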
Suppose a scenario where someone wants to “travel to Hawaii”.
Q1: Describe 5 to 7 essential steps and the time duration of each. (Note: the order does not matter.)
- decide schedule (1 hour)
- book a flight (30 minutes)
- go to airport (1 hour)
Q2: Create a flowchart of the steps (possibly in partial order, where temporal ordering is required only when it is necessary).
Two different workers are asked to do Q2.
If the script graphs created by the two validators have low agreement (F1), the script is discarded.
[Figure: the edge sets E and Ê of the two graphs.]
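One way to read the agreement check is as an F1 score between the two workers' edge sets E and Ê. The sketch below assumes that reading. The function name edge_f1 and the example edges are hypothetical, and the slides do not state the threshold below which a script is discarded.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (earlier step, later step)


def edge_f1(e: Set[Edge], e_hat: Set[Edge]) -> float:
    """F1 between two edge sets, treating e as reference and e_hat as prediction."""
    overlap = len(e & e_hat)
    if overlap == 0:
        return 0.0
    precision = overlap / len(e_hat)
    recall = overlap / len(e)
    return 2 * precision * recall / (precision + recall)


# Hypothetical flowcharts drawn by two workers for "travel to Hawaii".
e = {("decide schedule", "book a flight"), ("book a flight", "go to airport")}
e_hat = {("decide schedule", "book a flight"), ("decide schedule", "go to airport")}

print(edge_f1(e, e_hat))  # one shared edge out of two on each side -> 0.5
```

F1 is symmetric in its two arguments, so it does not matter which worker is treated as the reference.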
proScript: Dataset statistics
[Figure: normalized histogram of time duration. x-axis: ln m (m = minutes), 0.0 to 17.5; y-axis: normalized density. Annotated scenarios: sign into email account (1 min), go to bathroom (5 mins), buy some new clothes (1 hour), replace a closet door (1 day), find a new job (1 month), open a small business (1 year).]
[Figure: degree of the graphs. Degree = 1: 67%; degree = 2: 28%; degree >= 3: 5%.]
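To relate the annotated durations to the ln m axis of the histogram, the conversion can be worked through as below. This is a small sketch; the 30-day month and 365-day year are assumptions made for the arithmetic, not figures from the slides.

```python
import math

# Durations of the annotated scenarios, converted to minutes.
# Assumption: 1 month = 30 days and 1 year = 365 days.
MINUTES = {
    "1 min": 1,
    "5 mins": 5,
    "1 hour": 60,
    "1 day": 24 * 60,
    "1 month": 30 * 24 * 60,
    "1 year": 365 * 24 * 60,
}


def log_minutes(minutes: float) -> float:
    """Natural log of a duration in minutes, i.e. the x-axis value ln m."""
    return math.log(minutes)


for label, minutes in MINUTES.items():
    print(f"{label:>8}: ln({minutes}) = {log_minutes(minutes):.2f}")
```

Under these assumptions the six scenarios span roughly 0 to 13 on the axis, which fits inside the plotted range of 0.0 to 17.5.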
Modeling and Experiments
Two task settings
1. proScript Edge Prediction
Given: Scenario and randomly shuffled events
Scenario: bake a cake
Events: find the cake recipe; gather the ingredients; turn on the oven; mix the ingredients; put the cake batter in the oven; bake for the right amount of time; take the cake out of the oven
2. proScript Generation
Given: Scenario and the number of events (to generate)
Scenario: bake a cake
Number of events: 7
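The difference between the two task inputs can be made concrete with a short sketch. The function names and the tuple-based input format are assumptions for illustration; the slides do not specify how the inputs are encoded for the model.

```python
import random
from typing import List, Tuple

EVENTS = [
    "find the cake recipe",
    "gather the ingredients",
    "turn on the oven",
    "mix the ingredients",
    "put the cake batter in the oven",
    "bake for the right amount of time",
    "take the cake out of the oven",
]


def edge_prediction_input(
    scenario: str, events: List[str], seed: int = 0
) -> Tuple[str, List[str]]:
    """Edge prediction: the model sees the scenario and the events in shuffled order."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # seeded so the example is repeatable
    return scenario, shuffled


def generation_input(scenario: str, num_events: int) -> Tuple[str, int]:
    """Generation: the model sees only the scenario and how many events to write."""
    return scenario, num_events


print(edge_prediction_input("bake a cake", EVENTS))
print(generation_input("bake a cake", len(EVENTS)))
```

In the first task the model only has to recover the ordering over given events, while in the second it must also write the events themselves.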
How to represent a DAG structure? DOT language.
digraph G { A -> B; A -> C; B -> D; C -> D; D -> E; }
[Figure: the graph drawn from the DOT string above.]
digraph G{ Step0: find the cake recipe; Step1: gather the ingredients; Step2: mix the ingredients; (… omitted …) Step5: bake for the right amount of time; Step6: take the cake out of the oven; Step0 -> Step1; Step0 -> Step3; (… omitted …) Step5 -> Step6; }
[Figure: the “bake a cake” script graph drawn from the string above.]
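A script graph can be flattened into a DOT-style string of this shape, and a set of edges can be checked for cycles, as in the sketch below. The helper names to_dot and is_dag are hypothetical. The "StepN: text;" node lines follow the slide's format, which may not be accepted by Graphviz as-is. The example uses the A to E graph because the cake script's full edge list is omitted on the slide.

```python
from collections import deque
from typing import Dict, List, Tuple


def to_dot(steps: List[str], edges: List[Tuple[int, int]]) -> str:
    """Serialize steps and edges in the slide's DOT-style text format."""
    nodes = " ".join(f"Step{i}: {text};" for i, text in enumerate(steps))
    arcs = " ".join(f"Step{a} -> Step{b};" for a, b in edges)
    return f"digraph G{{ {nodes} {arcs} }}"


def is_dag(num_steps: int, edges: List[Tuple[int, int]]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indegree = [0] * num_steps
    children: Dict[int, List[int]] = {i: [] for i in range(num_steps)}
    for a, b in edges:
        children[a].append(b)
        indegree[b] += 1
    queue = deque(i for i in range(num_steps) if indegree[i] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == num_steps


# The five-node example from the slide: A -> B, A -> C, B -> D, C -> D, D -> E.
STEPS = ["A", "B", "C", "D", "E"]
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]

print(to_dot(STEPS, EDGES))
print(is_dag(len(STEPS), EDGES))
```

Representing the graph as a flat string is what lets a text-to-text model emit a DAG, and a cycle check of this kind is one way to confirm that a generated string still describes a valid partial order.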
Models
1. proScript_gen
T5 (11B) fine-tuned on proScript data (3.2k scenarios)
2. proScript_transfer
Pre-fine-tuned on WikiHow data (130k), then fine-tuned on proScript data
Model outputs (examples) by proScript_gen
[Figures: generated scripts for three scenarios: play the organ; drink a glass of milk; audition for a musical.]
How to evaluate these?
How to evaluate the generated scripts (DAGs)?
• Absolute evaluation
• Relative evaluation
Absolute evaluation: graph edit distance (Abu-Aisheh et al., 2015)
[Figure: a generated DAG and the corresponding edited DAG.]
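The idea of counting edits between a generated DAG and its edited version can be illustrated with a deliberately simplified count. The sketch assumes vertices are matched by identical labels and that each insertion or deletion costs 1. Exact graph edit distance, as in the cited work, also searches over vertex correspondences, so this count is in general only an upper bound on unit-cost GED. The function name and example graphs are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]
Graph = Tuple[Set[str], Set[Edge]]  # (vertices, directed edges)


def edit_counts(generated: Graph, edited: Graph) -> Dict[str, int]:
    """Count vertex and edge insertions/deletions, matching vertices by label."""
    gen_vertices, gen_edges = generated
    edit_vertices, edit_edges = edited
    return {
        "vertex": len(gen_vertices ^ edit_vertices),  # symmetric difference
        "edge": len(gen_edges ^ edit_edges),
    }


# Hypothetical graphs: the editor adds step "d" and rewires the ordering.
generated: Graph = ({"a", "b", "c"}, {("a", "b"), ("b", "c")})
edited: Graph = ({"a", "b", "c", "d"}, {("a", "b"), ("a", "c"), ("c", "d")})

counts = edit_counts(generated, edited)
print(counts, "total:", counts["vertex"] + counts["edge"])
```

Keeping vertex and edge counts separate mirrors the vertex/edge breakdown used when the results are reported.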
Absolute evaluation: Result (the lower the GED, the better)
[Figure: bar chart of graph edit distance, split into vertex and edge edits, for proScript_gen, proScript_transfer, and Human; x-axis 0 to 5; random baseline = 11.3. Values shown in the chart: 2.33, 3.55, 3.54, 0.46, 1.211, 1.199.]
• Random (11.3) >> proScript_gen = transfer (4.7) > human (2.7)
• Edge-related edits > Vertex-related edits
Edit analysis
Edit Types (Scripts by model): human error 5%, granularity 32%, order ambiguity 32%, irrelevant/redundant event 11%, missing event 5%, incorrect order 16%
Edit Types (Scripts by human): human error 23%, paraphrase 7%, granularity 27%, order ambiguity 33%, incorrect order 10%
[Chart annotations: 30% and 10%.]
• 70% of edits are minor corrections.
• Scripts generated by proScript need more crucial edits than human-written scripts.
Relative evaluation: pairwise comparison
Comparisons: proScript_gen vs. Human; proScript_gen vs. proScript_transfer
[Figure: two charts of pairwise preferences with outcomes <, =, >. Percentages shown: 55.3%, 22.7%, 22.0% and 23.8%, 45.6%, 30.6%.]
Summary
We collect 6.4k partially ordered scripts, proScript, which is substantially larger than prior datasets.
With proScript, we introduce two complementary tasks and models (edge prediction and script generation).
We show for the first time that pre-trained neural LMs can be adapted to generate partially ordered scripts.
Data will be available: https://proscript.allenai.org/
