Effective Feature Integration for
Automated Short Answer Scoring
Keisuke Sakaguchi (1), Michael Heilman (2), Nitin Madnani (2)
(1) Johns Hopkins University, CLSP; (2) Educational Testing Service
6/2/2015, NAACL 2015
What is Automated Short Answer Scoring?
Passage (~700 words)
A boy ran up behind her and tried to snatch her purse. ~
Student Responses
A boy tried to snatch ---
A woman was ---
…
Reference for scoring
1~2 exemplar answers
Brief key concepts (<10)
Build a regression model
0 (bad) to 4 (good) scale
Predict a score for a given student answer
Two Basic Approaches
Response-based: features extracted from the student response itself
Reference-based: features comparing the student response with the reference for scoring (exemplar answers, key concepts)
Response-based features
Example response: A boy tried to snatch a lady’s purse --- --- .
Length
Word n-gram (e.g. bigrams)
Character n-gram (e.g. 2-5 grams)
Syntactic dependency (e.g. PARENT-LABEL-CHILD)
Semantic Roles (e.g. TRY-A0-BOY)
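The surface features in this list can be sketched in a few lines of Python. This is only an illustration, not the authors' feature extractor: whitespace tokenization, counting length in tokens, and the feature-name prefixes `w2=` and `c=` are all assumptions, and the dependency and semantic-role features are left out because they need a parser.

```python
def word_ngrams(tokens, n=2):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n_min=2, n_max=5):
    """Return all character n-grams with n_min <= n <= n_max."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def response_features(response):
    """Map a response to a sparse feature dict (feature name -> value)."""
    text = response.lower()
    tokens = text.split()  # assumption: plain whitespace tokenization
    features = {"length": len(tokens)}  # assumption: length counted in tokens
    # N-gram features are binary: 1 if the n-gram occurs, absent otherwise.
    for gram in word_ngrams(tokens, 2):
        features["w2=" + gram] = 1
    for gram in char_ngrams(text):
        features["c=" + gram] = 1
    return features


feats = response_features("A boy tried to snatch a lady's purse")
```

Storing only the n-grams that occur keeps the representation sparse: most of the possible n-gram features are simply absent from any one response.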
Reference-based features
Student response: A boy tried to snatch a lady’s purse --- --- .
Exemplar
(Score 4) A boy tried to steal ---
(Score 3) A lady’s purse ---
…
(Score 0) I don’t know.
Key concepts
(#1) A boy tried to steal a woman’s purse.
(#2) A lady caught him.
…
(#N) She lets him leave.
Features: similarity between the student response and each exemplar and key concept
Similarity metrics (reference-based features)
Overall similarity
1. BLEU score
2. word2vec cosine (sentence-level)
Alignment-based similarity
3. word2vec alignment
4. WordNet alignment
4 metrics * (exemplars + key concepts)
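As a rough illustration of metric 2, a sentence-level cosine can be computed by averaging the word vectors of each sentence and comparing the two averages. Averaging is one common choice and is an assumption here; the slides do not say how the sentence vector is built. `TOY_VECTORS` holds made-up 3-dimensional values, not real word2vec embeddings.

```python
import math

# Toy 3-dimensional "embeddings": hypothetical values, not real word2vec vectors.
TOY_VECTORS = {
    "boy": [0.9, 0.1, 0.0],
    "snatch": [0.1, 0.9, 0.1],
    "steal": [0.2, 0.8, 0.1],
    "purse": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_vector(tokens, vectors):
    """Average the vectors of the known tokens; zero vector if none is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def sentence_cosine(response_tokens, reference_tokens, vectors=TOY_VECTORS):
    """Cosine between the averaged vectors of a response and a reference."""
    return cosine(sentence_vector(response_tokens, vectors),
                  sentence_vector(reference_tokens, vectors))


score = sentence_cosine(["boy", "snatch", "purse"], ["boy", "steal", "purse"])
```

Computing each of the four metrics against every exemplar and key concept gives the "4 metrics * (exemplars + key concepts)" dense features.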
Alignment-based Semantic Similarity
Student Response: A boy trying to snatch a lady's purse …
Reference: A boy tried to steal a woman’s purse …
Filter by POS tags (N, V, ADJ, ADV)
Find the most similar word (via WN or w2v)
$$\frac{1}{\mathrm{len}(S)} \sum_{W_s \in S} \max_{W_r \in R} \mathrm{Sim}(W_s, W_r)$$
Sentence-level similarity: taking the average over the student words
Example score against the reference (Ref): 0.8
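The alignment score can be sketched directly from the formula: for each student word take the best match among the reference words, then average over the student words. The similarity table `TOY_SIM` and the function `toy_sim` are hypothetical stand-ins for a WordNet or word2vec similarity, and the word lists are assumed to be already filtered by POS tag.

```python
def alignment_similarity(student_words, reference_words, sim):
    """Average, over student words, of the best similarity to any reference word."""
    if not student_words or not reference_words:
        return 0.0
    total = sum(max(sim(ws, wr) for wr in reference_words)
                for ws in student_words)
    return total / len(student_words)


# Hypothetical similarity values standing in for WordNet or word2vec similarity.
TOY_SIM = {
    frozenset(["snatch", "steal"]): 0.8,
    frozenset(["lady", "woman"]): 0.9,
    frozenset(["trying", "tried"]): 0.7,
}


def toy_sim(a, b):
    """Symmetric toy similarity: 1.0 for identical words, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset([a, b]), 0.0)


# Content words only, as if already filtered by POS tag (N, V, ADJ, ADV).
student = ["boy", "trying", "snatch", "lady", "purse"]
reference = ["boy", "tried", "steal", "woman", "purse"]
score = alignment_similarity(student, reference, toy_sim)
```

Because the average runs over the student words only, the score is not symmetric: swapping the two sentences can give a different value.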
Two Basic Approaches: Review
Response-based: Length, Word n-gram, Character n-gram, Syntactic dependency, Semantic Roles
Reference-based: Several similarity metrics (BLEU score, word2vec cosine, word2vec alignment, WordNet alignment)
Models
Build Support Vector Regression models (SVR) on:
Response-based
Reference-based
Response-based + Reference-based
Wait! Naïve feature combination does not work!
Look closely at the features
Response-based (binary & sparse, 90%): Length, Word n-gram, Character n-gram, Syntactic dependency, Semantic Roles
Reference-based (continuous & dense, 10%): Several similarity metrics (BLEU score, word2vec cosine, word2vec alignment, WordNet alignment)
Stacked Generalization (Wolpert, 1992)
Layer 1: three classifiers, each trained on a training set, each producing an output
Layer 2: one classifier takes the Layer 1 outputs and produces the final output
Stacking model for our task
Naïve feature combination: Response-based (binary & sparse) + Reference-based (continuous & dense) -> SVR -> Predicted score
Stacking:
Layer 1: Response-based (binary & sparse) -> SVR#1 -> Score as a dense feature
Layer 2: Score from Layer 1 + Reference-based (continuous & dense) -> SVR#2 -> Predicted score
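The two-layer arrangement can be sketched in plain Python under some loud assumptions. A least-squares linear model fitted by gradient descent stands in for the linear SVR, purely to keep the example dependency-free. Training the second layer on out-of-fold first-layer scores is common stacking practice and is an assumption here, not something the slides state. The data is made up.

```python
def fit_linear(X, y, epochs=5000, lr=0.05):
    """Least-squares linear model via batch gradient descent (stand-in for SVR)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Apply a (weights, bias) model to each row of X."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def out_of_fold_predictions(X, y, k=2):
    """Layer 1 score for each training row, from a model that never saw that row."""
    preds = [0.0] * len(X)
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        model = fit_linear([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(test_idx, predict(model, [X[i] for i in test_idx])):
            preds[i] = p
    return preds


def fit_stack(X_resp, X_ref, y):
    """Layer 1 on sparse response features; layer 2 on dense features + score."""
    oof = out_of_fold_predictions(X_resp, y)
    layer1 = fit_linear(X_resp, y)  # refit on all rows for use at prediction time
    layer2 = fit_linear([ref + [s] for ref, s in zip(X_ref, oof)], y)
    return layer1, layer2


def predict_stack(stack, X_resp, X_ref):
    """Feed the layer 1 score to layer 2 as one extra dense feature."""
    layer1, layer2 = stack
    scores = predict(layer1, X_resp)
    return predict(layer2, [ref + [s] for ref, s in zip(X_ref, scores)])


# Toy data (hypothetical): 3 binary response features, 1 dense similarity feature.
X_resp = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
X_ref = [[0.9], [0.7], [0.4], [0.2], [0.6], [0.1]]
y = [4.0, 3.0, 2.0, 1.0, 3.0, 0.0]
stack = fit_stack(X_resp, X_ref, y)
preds = predict_stack(stack, X_resp, X_ref)
```

The point of the design is that the many sparse features are compressed into a single dense score before they meet the few dense similarity features, so the second model sees inputs of one kind.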
Experimental Setting
Dataset: Reading for Understanding (RfU)
Designed for 6th – 9th grade students
4 short-answer questions on 2 different passages
5 training sizes (100, 200, 400, 800, All) * 20 runs
Learner: Support Vector Regression (Linear)
Evaluation: Quadratic Weighted Kappa with 10-fold CV (on held-out test set)
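For reference, quadratic weighted kappa can be computed from the confusion matrix of human and predicted scores. The sketch below assumes integer scores on the 0 to 4 scale. Rounding and clipping the regression output before scoring, as in the hypothetical helper `round_and_clip`, is an assumption, since the slides do not say how continuous predictions are discretized.

```python
def quadratic_weighted_kappa(human, predicted, min_score=0, max_score=4):
    """Quadratic weighted kappa between two equal-length lists of integer scores."""
    n_cat = max_score - min_score + 1
    n = len(human)
    # Observed confusion matrix: rows are human scores, columns are predictions.
    observed = [[0.0] * n_cat for _ in range(n_cat)]
    for h, p in zip(human, predicted):
        observed[h - min_score][p - min_score] += 1
    hist_h = [sum(row) for row in observed]
    hist_p = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_cat):
        for j in range(n_cat):
            weight = ((i - j) ** 2) / ((n_cat - 1) ** 2)
            expected = hist_h[i] * hist_p[j] / n
            num += weight * observed[i][j]
            den += weight * expected
    if den == 0.0:
        return 1.0  # both lists hold one and the same constant score
    return 1.0 - num / den


def round_and_clip(value, min_score=0, max_score=4):
    """Turn a continuous prediction into an integer score on the scale."""
    return max(min_score, min(max_score, int(round(value))))


perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
opposite = quadratic_weighted_kappa([0, 4], [4, 0])
```

The quadratic weights penalize a prediction that is off by two points four times as much as one that is off by one point, which suits an ordinal score scale.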
Results (Q1)
[Figure: results by Training Size (100, 200, 400, 800, ALL (1790)); series: Resp, Ref, w/o Stack, w/ Stack]
All Results
[Figure: four panels (Q1, Q2, Q3, Q4); series: Resp, Ref, w/o Stack, w/ Stack]
Summary
Automated Short Answer Scoring
Response-based and Reference-based approaches
Response-based: binary sparse features
Reference-based: continuous dense features
Model combination (with stacking) improved performance
Look at the stats of your features and apply stacking :)
