Natural Language Processing
to Improve Student Engagement
Becky Passonneau
November 30, 2017
Collaborators
Pyramid content evaluation: Ani Nenkova (Columbia, U Penn; 2004)
Automated scoring by unigram overlap: Ani Nenkova, Aaron Harnly, Owen Rambow (Columbia; 2005)
Automated scoring by distributional semantics: Emily Chen, Ananya Poddar, Guarav Gite (Columbia; 2013-2016)
Comparison to educational rubric (main ideas): Dolores Perin (Teachers College; 2013-2016)
Automated pyramid and scoring by triple extraction and similarity graphs based on WordNet: Qian Yang (Tsinghua; PSU; 2016), Alisa Krivokapic (Columbia; 2016)
Automated pyramid and scoring by parsing, distributional semantics, and a novel bin packing algorithm: Yanjun Gao (Penn State; 2017)
Asking students to summarize the main ideas of a text helps their reading and writing
Psychologists have posited three cognitive processes involved in summarization:
● selection of important ideas
● generalization to omit detail
● inference of implicit connections
Summaries that are equally good will have some ideas in common, and some differences
Very much like a Venn diagram
Summaries that are equally good will have some ideas in common, and some content differences
[Figure: Venn diagram of several summaries' ideas (Idea 1 to Idea 12), some shared and some unique]
Very much like a Venn diagram
Designing a reliable rubric to measure how many important ideas each summary contains is labor-intensive and potentially subjective
Summaries are concise
● Each idea is expressed once
○ Selection of important ideas
○ Omission of unnecessary detail
● Content evaluation task has two steps
○ Define a standard from expert summaries: the distinct ideas, weighted by importance
○ Compare the summaries to the standard: quantify the proportion of important ideas
Pyramid summary content annotation builds a content model of distinct ideas from summaries written by a wise crowd (size N)
[Figure: the content units (CU 1 to CU 12) of the wise crowd's summaries, drawn as overlapping sets]
What is pyramid content analysis?
Importance of ideas (content units, or CUs)
● Emerges from the wise crowd
● Distinguishes quality of ideas by quantity of occurrence
● Simple but effective
Pyramid summary content annotation builds a content model of distinct ideas from N reference summaries written by a wise crowd
[Figure: the content units (CU 1 to CU 12) of the N reference summaries]
A list of all the distinct ideas, or Content Units (CUs), and their weights, i.e., how many summaries each occurs in
TEXT: WHAT IS MATTER
CU 1: Matter is classified by physical and chemical properties (W=3)
CU 3: All matter has energy (W=2)
. . .
CU 12: Matter can be a solid, liquid or gas (W=1)
What is pyramid content analysis?
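The weighting rule above (a CU's weight is the number of reference summaries it occurs in) can be sketched in a few lines of Python. This is a minimal illustration, assuming the manual annotation is already available as a mapping from each reference summary to the set of CU ids it expresses; `build_pyramid` and the toy `refs` data are hypothetical, chosen only to reproduce the weights listed for CU 1, CU 3, and CU 12.

```python
from collections import Counter, defaultdict


def build_pyramid(annotations):
    """Return (weights, tiers) for a content model.

    annotations: dict mapping a reference-summary id to the set of CU ids it
    expresses (hypothetical input format).
    weights: CU id -> number of reference summaries that contain it.
    tiers: weight -> sorted list of CU ids with that weight (the pyramid layers).
    """
    weights = Counter()
    for cus in annotations.values():
        # Each summary adds at most 1 to a CU's weight.
        weights.update(set(cus))
    tiers = defaultdict(list)
    for cu, w in weights.items():
        tiers[w].append(cu)
    return dict(weights), {w: sorted(cus) for w, cus in tiers.items()}


# Toy annotation (made up) for three reference summaries.
refs = {
    "ref1": {"CU1", "CU3", "CU12"},
    "ref2": {"CU1", "CU3"},
    "ref3": {"CU1"},
}
weights, tiers = build_pyramid(refs)
print(weights["CU1"], weights["CU3"], weights["CU12"])  # 3 2 1
print(sorted(tiers.items(), reverse=True))  # [(3, ['CU1']), (2, ['CU3']), (1, ['CU12'])]
```

The heaviest tier sits at the top of the pyramid; with a crowd of 5, weights range from 5 down to 1.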
A Pyramid of CUs from a wise crowd of 5: tiers from W=5 (top) down to W=1 (bottom)
What is wise crowd content analysis?
Application of Pyramid Content Model
● In a new summary, find all the phrases that mention a model CU
● Sum the weights of the mentioned CUs
● Normalize the sum
[Figure: a new summary in which three phrases match CUs of weight 5, 4, and 2]
Raw sum = 5 + 4 + 2 = 11
What is wise crowd content analysis?
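The raw sum is just the total weight of the CUs found in the new summary, with each CU credited once. A minimal sketch, assuming matching has already produced a list of CU ids; the ids `CU_a` to `CU_d` are hypothetical placeholders for the three matched CUs in the example.

```python
def raw_score(matched_cus, weights):
    """Sum the weights of the CUs found in a new summary.

    Duplicates in matched_cus are ignored: a summary is credited for
    each CU at most once.
    """
    return sum(weights[cu] for cu in set(matched_cus))


# Hypothetical CU ids with the weights from the example.
weights = {"CU_a": 5, "CU_b": 4, "CU_c": 2, "CU_d": 1}
print(raw_score(["CU_a", "CU_b", "CU_c"], weights))  # 11
```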
Normalization
● A summary can express each CU at most once
● Sum the weights of the identified CUs
● Normalize the sum in one of two ways:
○ QUALITY: divide by the maximum sum of weights for the same number of CUs
Did the summary mention mostly important ideas?
○ COVERAGE: divide by the maximum sum of weights for the average number of CUs in the reference summaries
Did the summary mention most of the important ideas?
What is wise crowd content analysis?
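Both normalizers are "best possible" sums: the total weight of the heaviest k CUs in the pyramid, where k is either the number of CUs the summary expresses (quality) or the average CU count of the reference summaries (coverage). A sketch under stated assumptions: the function names and toy weights are hypothetical, and rounding a non-integer average CU count to the nearest integer is our assumption, not something the slide specifies.

```python
def top_k_sum(weights, k):
    """Largest achievable sum of k distinct CU weights (the k heaviest CUs)."""
    return sum(sorted(weights.values(), reverse=True)[:k])


def raw_sum(matched, weights):
    return sum(weights[cu] for cu in set(matched))


def quality(matched, weights):
    """Raw sum divided by the best possible sum for the same number of CUs."""
    best = top_k_sum(weights, len(set(matched)))
    return raw_sum(matched, weights) / best if best else 0.0


def coverage(matched, weights, avg_cus_per_ref):
    """Raw sum divided by the best possible sum for the references' average CU count."""
    best = top_k_sum(weights, round(avg_cus_per_ref))  # assumption: round to an integer
    return raw_sum(matched, weights) / best if best else 0.0


# Hypothetical content model and student summary (raw sum 5 + 4 + 2 = 11).
weights = {"CU_a": 5, "CU_b": 4, "CU_c": 4, "CU_d": 3, "CU_e": 2, "CU_f": 1}
summary = ["CU_a", "CU_b", "CU_e"]
print(quality(summary, weights))        # 11 / 13: best 3 CUs weigh 5 + 4 + 4
print(coverage(summary, weights, 4.2))  # 11 / 16: best 4 CUs weigh 5 + 4 + 4 + 3
```

A short summary made only of heavy CUs can score high on quality yet low on coverage, which is why the two answer different questions.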
● 9 reference summaries
● All content models with m summaries, for m ∈ [1,9]
● All pairs of summaries A, B where A scores higher than B using all 9 reference summaries
● Result
○ The scores for A and B, with their variance, diverge given 4 to 5 references
● Conclusion
○ No misranking with 5 references
How reliable is wise crowd content analysis?
How reliable is it? Can misranking errors occur?
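The experiment above can be sketched as follows: build a content model from every m-sized subset of the references and count how often B outscores A. This is a minimal illustration, assuming raw (unnormalized) scores and toy data we made up; the study's actual scoring and statistics may differ, and the function names are hypothetical.

```python
from itertools import combinations


def weights_from(refs):
    """CU weights for a content model built from the given reference summaries."""
    w = {}
    for cus in refs:
        for cu in cus:
            w[cu] = w.get(cu, 0) + 1
    return w


def raw_score(summary_cus, w):
    return sum(w.get(cu, 0) for cu in summary_cus)


def misranking_rate(refs, summary_a, summary_b, m):
    """Share of m-reference content models under which B strictly outscores A.

    Assumes A is ranked above B by the model built from all references.
    """
    models = [weights_from(subset) for subset in combinations(refs, m)]
    flips = sum(raw_score(summary_b, w) > raw_score(summary_a, w) for w in models)
    return flips / len(models)


# Toy data (made up): 4 reference summaries, each a set of CU ids.
refs = [{"CU1"}, {"CU3"}, {"CU1", "CU2"}, {"CU2", "CU3"}]
summary_a = {"CU1", "CU2"}
summary_b = {"CU3"}
for m in range(1, len(refs) + 1):
    print(m, misranking_rate(refs, summary_a, summary_b, m))
```

With a single reference, the model built from the summary that mentions only CU3 reverses the ranking; with all references, the reversal disappears.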
Five additional reliability tests
● Number of reference summaries needed for the probability of misranking to be ≤ 0.1: 5
● Number of reference summaries needed for scores to correlate with gold-standard scores: 5
● Interannotator agreement on the content model, 10 different pairs of models: 0.71 to 0.89
● Interannotator agreement on applying the content model to new summaries, 5 models: 0.77 to 0.81
● Correlation of scores of 16 systems using different content models: 0.71 to 0.96
How reliable is wise crowd content analysis?
Key differences between manual and automated methods:
Humans
● Consider a few alternative segmentations
● Sameness of meaning is a binary (yes-no) judgement
Automated methods
● Consider many possible segmentations
○ Simpler decisions
○ Many more of them
● Metric for similarity of meaning is graded from 0 to 1
● Must select the optimal segmentations and meaning similarities
How did we automate wise crowd content analysis?
Human segmentation into “ideas” and similarity
Sentence: Matter can be measured because it contains volume and mass
CU106: Matter has volume and mass (W=4)
Ref Sum 1: because it contains both volume and mass
Ref Sum 2: it takes up space defined as volume and contains . . . mass
Ref Sum 3: Matter is anything that has mass and takes up space (volume)
Ref Sum 4: Matter contains volume and mass
How did we automate wise crowd content analysis?
Three Automated Methods
● No large scale machine learning required
● All components are pre-trained
● Requires only 5 wise-crowd summaries on the same summarization task
How did we automate pyramid content analysis?
Three Automated Methods
● PyrScore: Requires existing manual content model
○ Brute force segmentation -- considers all possibilities
○ Distributional (statistical) semantics
● PEAK:
○ Open Information Extraction tools extract subj-pred-obj triples
○ Symbolic semantics (WordNet)
● PyrEval:
○ Sentence decomposition into clauses
○ Distributional (statistical) semantics
How did we automate pyramid content analysis?
PyrScore Segmentation: Brute Force
● Calculates all n-gram segmentations of each sentence in a new summary
All | matter | has | energy | volume | and | mass: 7 unigrams
All matter | has | energy | volume | and | mass: 5 unigrams + 1 bigram
All | matter has | energy | volume | and | mass: 5 unigrams + 1 bigram
. . .
All matter has | energy | volume | and | mass: 4 unigrams + 1 trigram
. . .
All matter has energy volume and mass: 1 7-gram
How did we automate wise crowd content analysis?
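Brute-force segmentation enumerates every way to split a sentence into contiguous n-grams. A minimal recursive sketch (the function name is ours, not PyrScore's); it also shows why the search grows quickly: a sentence of n words has 2^(n-1) segmentations, 64 for the 7-word example.

```python
def segmentations(words):
    """Yield every split of a word list into contiguous n-grams (as strings)."""
    if not words:
        yield []
        return
    for i in range(1, len(words) + 1):
        head = " ".join(words[:i])          # first n-gram: words[0:i]
        for rest in segmentations(words[i:]):
            yield [head] + rest


sentence = "All matter has energy volume and mass".split()
segs = list(segmentations(sentence))
print(len(segs))                          # 64 == 2 ** (7 - 1)
print(sum(len(s) == 6 for s in segs))     # 6 splits of 5 unigrams + 1 bigram
```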
PyrScore Semantics
● Generates a latent vector representation of each phrase in a CU
CU106: Matter has volume and mass (W=4)
because it contains both volume and mass
it takes up space defined as volume and contains a certain amount of material defined as mass
Matter is anything that has mass and takes up space (volume)
Matter contains volume and mass
How did we automate wise crowd content analysis?
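Once each CU phrase and each candidate segment has a vector, PyrScore compares a segment to a CU by averaging its cosine similarity over all of the CU's phrases. A sketch of that averaging step, assuming the vectors are already computed; the toy 3-dimensional vectors below are hypothetical stand-ins for WTMF output, which this code does not produce.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cu_similarity(segment_vec, cu_phrase_vecs):
    """Average cosine similarity between a segment and every phrase of one CU."""
    return sum(cosine(segment_vec, p) for p in cu_phrase_vecs) / len(cu_phrase_vecs)


# Hypothetical vectors for two phrases of one CU.
cu_phrases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(cu_similarity([1.0, 0.0, 0.0], cu_phrases))  # 0.5
```

Averaging over phrases means a segment must resemble the CU's various wordings, not just one of them.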
● Latent semantics:
○ Weighted Text Matrix Factorization (WTMF;
Guo and Diab, 2012)
○ Assigns a small weight to missing words (words absent from a given text)
○ Word vectors trained offline
PyrScore Scoring
● Generates a WTMF vector representation of each CU phrase
● Generates a WTMF vector representation of each segment in a new
summary
● Similarity to CU is the average cosine similarity to all phrases in the CU
● Optimal assignment of candidate n-grams to CUs
○ A maximum weighted independent set problem
○ Applies a greedy algorithm (WMIN; Sakai et al., 2003)
How did we automate wise crowd content analysis?
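Sakai et al. (2003) analyze several greedy rules for maximum weighted independent set; the sketch below uses their GWMIN rule (repeatedly keep the vertex maximizing w(v) / (deg(v) + 1), then delete it and its neighbors), which we assume is the rule the slide calls WMIN. How PyrScore builds the graph is also assumed here: vertices are candidate segment-to-CU assignments with hypothetical weights, and edges join candidates that cannot both be kept (for example, overlapping n-grams).

```python
def gwmin(weights, edges):
    """Greedy maximum weighted independent set using the GWMIN selection rule.

    weights: dict vertex -> positive weight (vertex ids must be sortable)
    edges: iterable of (u, v) conflict pairs
    """
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    chosen, remaining = [], set(weights)
    while remaining:
        # Degree is measured in the graph that remains; sorting makes ties deterministic.
        best = max(sorted(remaining),
                   key=lambda x: weights[x] / (len(adj[x] & remaining) + 1))
        chosen.append(best)
        remaining -= adj[best] | {best}
    return chosen


# Hypothetical candidates for "All matter has energy volume and mass".
cand = {"all_matter_has_energy": 2.0, "volume_and_mass": 3.0,
        "matter_has_energy_volume_and_mass": 4.0}
conflicts = [("all_matter_has_energy", "matter_has_energy_volume_and_mass"),
             ("volume_and_mass", "matter_has_energy_volume_and_mass")]
print(gwmin(cand, conflicts))  # ['volume_and_mass', 'all_matter_has_energy']
```

Here the two shorter, compatible segments (total 5.0) beat the single long segment (4.0), even though the long one has the highest individual weight.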
PEAK (Pyramid Evaluation by Automated Knowledge Extraction)
● Segmentation: Applies Open Information Extraction tools to extract Subj-Pred-Obj (SPO) triples from sentences
Matter can be detected and measured because it contains volume and mass
Subj(Matter) Pred(Detected and measured) Obj(because it contains volume and mass)
Subj(Matter) Pred(contains) Obj(volume and mass)
. . .
● Semantics: Uses explicit representation of meaning (random walks over WordNet)
How did we automate wise crowd content analysis?
PEAK Aligns SPO Triples
● From different reference summaries to construct the model
● Uses a hypergraph
○ Triples are hyperedges of SPO nodes
○ Edges between nodes are semantic similarity
● Each CU is a weighted triple
How did we automate wise crowd content analysis?
PEAK Aligns SPO Triples
● Each CU is a weighted triple
● New summary is a list of triples
● Edges in bipartite graph added from CUs to SPOs if semantic similarity ≥ 0.50
● Uses the Munkres-Kuhn algorithm with CU weights as edge costs
How did we automate wise crowd content analysis?
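The alignment above is a maximum-weight bipartite matching: keep a CU-to-triple edge only when similarity is at least 0.50, give it the CU's weight, and find the one-to-one matching with the largest total weight. Below is a sketch using a textbook O(n^3) Hungarian (Kuhn-Munkres) formulation written from scratch; weights are negated so that minimizing cost maximizes weight. The function names, zero-cost padding, and toy similarity scores are our assumptions, and PEAK's own implementation may differ.

```python
def hungarian_min(cost):
    """Kuhn-Munkres on a square cost matrix; returns assign[row] = column."""
    n, INF = len(cost), float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)   # row / column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)     # p[j]: 1-based row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (n + 1), [False] * (n + 1)
        while True:                           # grow alternating tree until a free column
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                             # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def peak_match(cu_weights, sim, threshold=0.5):
    """Match CUs (rows of sim) to SPO triples (columns), maximizing total CU weight."""
    n_cu = len(cu_weights)
    n_spo = len(sim[0]) if sim else 0
    size = max(n_cu, n_spo)
    if size == 0:
        return [], 0
    cost = [[0.0] * size for _ in range(size)]   # padding and missing edges cost 0
    for i in range(n_cu):
        for j in range(n_spo):
            if sim[i][j] >= threshold:
                cost[i][j] = -cu_weights[i]
    assign = hungarian_min(cost)
    matches = [(i, assign[i]) for i in range(n_cu)
               if assign[i] < n_spo and sim[i][assign[i]] >= threshold]
    return matches, sum(cu_weights[i] for i, _ in matches)


# Hypothetical: 3 CUs (weights 5, 3, 2) vs. 2 triples from a new summary.
sim = [[0.9, 0.6], [0.7, 0.2], [0.1, 0.55]]
print(peak_match([5, 3, 2], sim))  # ([(0, 1), (1, 0)], 8)
```

Greedily pairing each CU with its most similar triple would match CU 0 to triple 0 and reach only 7; the optimal matching reaches 8.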
PyrEval extends PyrScore
● Builds full pyramid using new weighted independent set algorithm
● Decomposes sentences into syntactically meaningful units (roughly clauses)
● Uses the same distributional semantics
○ WTMF performs better than Word2Vec
○ WTMF performs better than GloVe
● Uses the same scoring algorithm
PyrEval constructs a pyramid by a novel set allocation method
● Nested sets
○ Every sentence has a set of segmentations, only one of which can be selected
○ Every CU is a set of segments, each from a different summary
○ Every pyramid layer is a set of CUs
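The nested sets can be made concrete with a small validity checker for a candidate pyramid: each sentence contributes segments from only its selected segmentation, each CU draws its segments from different summaries, and (as the EDUA constraints also require) each segment belongs to at most one CU. This is not EDUA or PyrEval's allocation algorithm, only a sketch; the tuple encoding of segments and the names `check_allocation` and `layers` are hypothetical.

```python
def check_allocation(segmentation_choice, cus):
    """Return a list of constraint violations (empty if the allocation is valid).

    segmentation_choice: dict (summary_id, sentence_id) -> selected segmentation id
        (a dict holds exactly one choice per sentence).
    cus: list of CUs; each CU is a list of segments encoded as
        (summary_id, sentence_id, segmentation_id, segment_index).
    """
    problems, used = [], set()
    for k, cu in enumerate(cus):
        summaries = [seg[0] for seg in cu]
        if len(set(summaries)) != len(summaries):
            problems.append(f"CU {k}: two segments from the same summary")
        for seg in cu:
            summary_id, sentence_id, segmentation_id, _ = seg
            if segmentation_choice.get((summary_id, sentence_id)) != segmentation_id:
                problems.append(f"CU {k}: {seg} is not from the selected segmentation")
            if seg in used:
                problems.append(f"CU {k}: {seg} already belongs to another CU")
            used.add(seg)
    return problems


def layers(cus):
    """Group CUs into pyramid layers keyed by weight (= number of segments)."""
    out = {}
    for cu in cus:
        out.setdefault(len(cu), []).append(cu)
    return out


# Hypothetical example: three summaries, one sentence each.
choice = {("s1", 0): 0, ("s2", 0): 1, ("s3", 0): 0}
cus = [[("s1", 0, 0, 0), ("s2", 0, 1, 2)], [("s3", 0, 0, 1)]]
print(check_allocation(choice, cus))  # []
print(sorted(layers(cus)))            # [1, 2]
```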
EDUA: Emergent discovery of units of attraction
● Constructs a graph
○ Nodes are segments
○ Edges weighted by force of “attraction” (e.g., semantic similarity)
● Edge types
○ Dashed edges: attraction(n_i, n_j) > α
○ Solid edges: connect segments into CUs
○ Solid edges: connect segments into CUs
Assignment of segments to a CU obeys constraints
● Maximize the average Weighted Avg Similarity within each pyramid layer n
● Capacity of each layer y given segments x
● Relative size of each layer
● No empty layers
● One segmentation per sentence; at most one CU per segment
PyrEval and humans construct similar pyramids
● CUs: 69 (PyrEval) versus 60 (Annotator 1) or 46 (Annotator 2)
● Similar distribution
○ PyrEval: 1 w5, 2 w4, 7 w3, 22 w2, 37 w1
○ Annotator 1: 3 w5, 7 w4, 13 w3, 15 w2, 22 w1
● Example: same weight
○ PyrEval (w5): Physical props can occur without changing the identity or nature of the matter
○ Annotator 1 (w5): Physical props can be observed without changing the identity of the matter
● Example: different weight
○ PyrEval (w4): Unlike physical change, chemical change occurs when the chemical properties of the matter have changed and a new substance is produced
○ Manual (w3): The difference between a physical change and a chemical change is that a chemical change creates a new substance
A Rubric for Contextualized Curricular Support
● From a study of 16 community college classrooms
● 120 students wrote summaries of a middle school text, What is Matter?
○ Read the passage
○ Answered main ideas questions
○ Wrote the summary
● Researchers identified 14 main ideas
● Main ideas score of a summary: % of main ideas
○ Included partial credit
○ Interrater reliability: Pearson correlation: 0.92
What assessment rubric did we compare it to?
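The main ideas score is a percentage of the 14 researcher-identified main ideas, with partial credit. A minimal sketch, assuming credit is recorded per main idea as a number between 0 and 1 (for example 0.5 for partial credit); the rubric's actual credit scale is not given here, and the idea labels below are hypothetical.

```python
def main_ideas_score(credits, n_main_ideas=14):
    """Percentage of main ideas expressed; credits maps idea -> credit in [0, 1]."""
    if any(c < 0 or c > 1 for c in credits.values()):
        raise ValueError("each credit must be between 0 and 1")
    return 100.0 * sum(credits.values()) / n_main_ideas


# Hypothetical summary: 6 main ideas fully expressed, 2 partially.
credits = {f"idea{i}": 1.0 for i in range(1, 7)}
credits.update({"idea7": 0.5, "idea8": 0.5})
print(main_ideas_score(credits))  # 50.0
```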
Pearson correlations of automated and manual methods
● PyrScore: 0.95
● PEAK: 0.82
● PyrEval: 0.87
What were the results?
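The comparisons above use Pearson correlation between two lists of scores for the same summaries. A from-scratch sketch of the formula (covariance divided by the product of standard deviations); the function name is ours, and on Python 3.10+ `statistics.correlation` computes the same coefficient. The scores below are made up for illustration, not the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


# Made-up manual and automated scores for four summaries.
manual = [50.0, 64.3, 35.7, 78.6]
automated = [0.52, 0.61, 0.40, 0.70]
print(round(pearson(manual, automated), 3))
```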
Pearson correlations of automated methods with the 120 manual Main Ideas scores
● PyrScore: 0.83
● PEAK: 0.70
What were the results?
Content scores are transparent, can support feedback
● Does the summary have enough important ideas, given its length? (Quality score)
● Does the summary have enough important ideas, given the set of possible important ideas? (Coverage score)
● Does the summary have a good balance of both? (Comprehensive score)
● Which important ideas were expressed?
● Which important ideas were missed?
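The last two questions fall directly out of the content model: list the matched CUs and the unmatched ones, heaviest first, so feedback points to the most important missing ideas. A sketch under assumptions: quality and coverage are passed in as already computed, the comprehensive score is taken here to be their mean (the deck names it but does not define it on this slide), and `feedback_report` and the CU ids are hypothetical.

```python
def feedback_report(matched_cus, weights, quality, coverage):
    """Feedback for one summary: comprehensive score plus expressed/missed CUs."""
    matched = set(matched_cus)

    def by_importance(cu):
        return (-weights[cu], cu)  # heaviest CUs first, ties by id

    return {
        # Assumption: comprehensive score = mean of quality and coverage.
        "comprehensive": (quality + coverage) / 2,
        "expressed": sorted(matched, key=by_importance),
        "missed": sorted((cu for cu in weights if cu not in matched), key=by_importance),
    }


weights = {"CU1": 3, "CU3": 2, "CU7": 3, "CU12": 1}
report = feedback_report(["CU3", "CU1"], weights, quality=0.8, coverage=0.6)
print(report["expressed"])  # ['CU1', 'CU3']
print(report["missed"])     # ['CU7', 'CU12']
```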
Conclusion
● Wise Crowd Content Analysis
○ Works well to identify important ideas
○ Importance emerges from the wise crowd
○ Correlates with an independently developed main ideas rubric
○ Requires only 5 reference summaries
● Fully automated methods: PyrEval and PEAK
○ Pretrained components, with parameter tuning on a small development set
○ Perform less well if sentences are very complex (e.g., automatic summarizers on newswire)
○ Potential to inform revision
What’s Next? Content assessment of essays
● The same ideas are referenced multiple times in the same essay, through multiple means
○ Paraphrase, definite descriptions (“the evidence shown here”), deictic pronouns (“This indicates . . .”)
○ Will require more complex methods to detect “the same” idea
● Discourse structure and function
○ Interrelations among ideas within the text
○ Discursive versus argumentative
What’s next?

More Related Content

Similar to Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

Document Summarization
Document SummarizationDocument Summarization
Document SummarizationPratik Kumar
 
Joint contrastive learning with infinite possibilities
Joint contrastive learning with infinite possibilitiesJoint contrastive learning with infinite possibilities
Joint contrastive learning with infinite possibilitiestaeseon ryu
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsSharath TS
 
Ir mcq-answering-system
Ir mcq-answering-systemIr mcq-answering-system
Ir mcq-answering-systemJishnu P
 
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalHard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalVasileiosMezaris
 
Vjai paper reading201808-acl18-simple-and_effective multi-paragraph reading c...
Vjai paper reading201808-acl18-simple-and_effective multi-paragraph reading c...Vjai paper reading201808-acl18-simple-and_effective multi-paragraph reading c...
Vjai paper reading201808-acl18-simple-and_effective multi-paragraph reading c...Dat Nguyen
 
Text Processing Framework for Hindi
Text Processing Framework for HindiText Processing Framework for Hindi
Text Processing Framework for HindiUtsav Chokshi
 
AIM Analytics: U-M Community Presentations
AIM Analytics: U-M Community PresentationsAIM Analytics: U-M Community Presentations
AIM Analytics: U-M Community PresentationsSungjin Nam
 
LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)
LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)
LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)Yun Huang
 
[slide] A Compare-Aggregate Model with Latent Clustering for Answer Selection
[slide] A Compare-Aggregate Model with Latent Clustering for Answer Selection[slide] A Compare-Aggregate Model with Latent Clustering for Answer Selection
[slide] A Compare-Aggregate Model with Latent Clustering for Answer SelectionSeoul National University
 
Introduction to TCAV (ICML2018)
Introduction to TCAV (ICML2018)Introduction to TCAV (ICML2018)
Introduction to TCAV (ICML2018)Thien Q. Tran
 
A comprehensive view on P vs NP
A comprehensive view on P vs NPA comprehensive view on P vs NP
A comprehensive view on P vs NPAbhay Pai
 
Generation of Assessment Questions from Textbooks Enriched with Knowledge Models
Generation of Assessment Questions from Textbooks Enriched with Knowledge ModelsGeneration of Assessment Questions from Textbooks Enriched with Knowledge Models
Generation of Assessment Questions from Textbooks Enriched with Knowledge ModelsSergey Sosnovsky
 

Similar to Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau (20)

Document Summarization
Document SummarizationDocument Summarization
Document Summarization
 
Joint contrastive learning with infinite possibilities
Joint contrastive learning with infinite possibilitiesJoint contrastive learning with infinite possibilities
Joint contrastive learning with infinite possibilities
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into Texts
 
Ire final
Ire finalIre final
Ire final
 
XAI (IIT-Patna).pdf
XAI (IIT-Patna).pdfXAI (IIT-Patna).pdf
XAI (IIT-Patna).pdf
 
Ir mcq-answering-system
Ir mcq-answering-systemIr mcq-answering-system
Ir mcq-answering-system
 
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalHard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
 
Vjai paper reading201808-acl18-simple-and_effective multi-paragraph reading c...
Vjai paper reading201808-acl18-simple-and_effective multi-paragraph reading c...Vjai paper reading201808-acl18-simple-and_effective multi-paragraph reading c...
Vjai paper reading201808-acl18-simple-and_effective multi-paragraph reading c...
 
Text Processing Framework for Hindi
Text Processing Framework for HindiText Processing Framework for Hindi
Text Processing Framework for Hindi
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
AIM Analytics: U-M Community Presentations
AIM Analytics: U-M Community PresentationsAIM Analytics: U-M Community Presentations
AIM Analytics: U-M Community Presentations
 
LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)
LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)
LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)
 
Ire final
Ire finalIre final
Ire final
 
[slide] A Compare-Aggregate Model with Latent Clustering for Answer Selection
[slide] A Compare-Aggregate Model with Latent Clustering for Answer Selection[slide] A Compare-Aggregate Model with Latent Clustering for Answer Selection
[slide] A Compare-Aggregate Model with Latent Clustering for Answer Selection
 
Introduction to TCAV (ICML2018)
Introduction to TCAV (ICML2018)Introduction to TCAV (ICML2018)
Introduction to TCAV (ICML2018)
 
Icon18revrec sudeshna
Icon18revrec sudeshnaIcon18revrec sudeshna
Icon18revrec sudeshna
 
A comprehensive view on P vs NP
A comprehensive view on P vs NPA comprehensive view on P vs NP
A comprehensive view on P vs NP
 
Generation of Assessment Questions from Textbooks Enriched with Knowledge Models
Generation of Assessment Questions from Textbooks Enriched with Knowledge ModelsGeneration of Assessment Questions from Textbooks Enriched with Knowledge Models
Generation of Assessment Questions from Textbooks Enriched with Knowledge Models
 
Active learning
Active learningActive learning
Active learning
 
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
 

More from Penn State EdTech Network

More from Penn State EdTech Network (11)

Advancing Personalized Learning through Big Data and Artificial Intelligence
Advancing Personalized Learning through Big Data and Artificial IntelligenceAdvancing Personalized Learning through Big Data and Artificial Intelligence
Advancing Personalized Learning through Big Data and Artificial Intelligence
 
The Art of the Possible featuring Ben Amaba
The Art of the Possible featuring Ben AmabaThe Art of the Possible featuring Ben Amaba
The Art of the Possible featuring Ben Amaba
 
Using Bluemix and Node-RED for Fast Prototyping
Using Bluemix and Node-RED for Fast PrototypingUsing Bluemix and Node-RED for Fast Prototyping
Using Bluemix and Node-RED for Fast Prototyping
 
Design Thinking Workshop at Nittany Watson Challenge
Design Thinking Workshop at Nittany Watson ChallengeDesign Thinking Workshop at Nittany Watson Challenge
Design Thinking Workshop at Nittany Watson Challenge
 
Data Cleansing
Data CleansingData Cleansing
Data Cleansing
 
Watson API Use Case Demos for the Nittany Watson Challenge
Watson API Use Case Demos for the Nittany Watson ChallengeWatson API Use Case Demos for the Nittany Watson Challenge
Watson API Use Case Demos for the Nittany Watson Challenge
 
IBM Watson Tradeoff Analytics AlChemy
IBM Watson Tradeoff Analytics AlChemyIBM Watson Tradeoff Analytics AlChemy
IBM Watson Tradeoff Analytics AlChemy
 
What is the Data Panel
What is the Data PanelWhat is the Data Panel
What is the Data Panel
 
Introduction to Bluemix
Introduction to BluemixIntroduction to Bluemix
Introduction to Bluemix
 
IBM Watson Overview
IBM Watson OverviewIBM Watson Overview
IBM Watson Overview
 
Nittany Watson Challenge Opening Presentations
Nittany Watson Challenge Opening PresentationsNittany Watson Challenge Opening Presentations
Nittany Watson Challenge Opening Presentations
 

Recently uploaded

How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 

Recently uploaded (20)

How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 

Natural Language Processing to Improve Student Engagement featuring Dr. Rebecca Passonneau

  • 1. Natural Language Processing to Improve Student Engagement Becky Passonneau November 30, 2017
  • 2. Collaborators Pyramid content evaluation: Ani Nenkova, (Columbia, U Penn; 2004) Automated scoring by unigram overlap: Ani Nenkova, Aaron Harnly, Owen Rambow (Columbia; 2005) Automated scoring by distributional semantics: Emily Chen, Ananya Poddar, Guarav Gite (Columbia; 2013 - 2016) Comparison to educational rubric (main ideas): Dolores Perin (Teachers; 2013 - 2016) Automated pyramid and scoring by triple extraction and similarity graphs based on WordNet: Qian Yang (Tsinghua; PSU; 2016), Alisa Krivokapic (Columbia; 2016) Automated pyramid and scoring by parsing, distributional semantics, and novel bin packing algorithm: Yanjun Gao (Penn State; 2017) 2
  • 3. Asking students to summarize the main ideas of a text helps their reading and writing 3
  • 4. Psychologists have posited three cognitive processes involved in summarization: ● selection of important ideas ● generalization to omit detail ● inference of implicit connections 4
  • 5. Summaries that are equally good will have some ideas in common, and some differences Very much like a Venn diagram 5
  • 6. Summaries that are equally good will have some ideas in common, and some content differences Idea 2 Idea 3 Idea 1 Idea 4 Idea 8 Idea 7 Idea 9 Idea 11 Idea 10 Idea 12 Idea 5 Idea 4 Idea 6 Very much like a Venn diagram 6
  • 7. Designing a reliable rubric to measure how many important ideas each summary contains is labor intensive and potentially subjective 7
  • 8. Summaries are concise ● Each idea is expressed once ○ Selection of important ideas ○ Omission of unnecessary detail ● Content evaluation task has two steps ○ Define a standard from expert summaries -- the distinct ideas weighted by importance ○ Compare the summaries to the standard -- quantify the proportion of important ideas 8
  • 9. Pyramid summary content annotation builds a content model of distinct ideas from summaries written by a wise crowd (size N) 9 CU 2 CU 3 CU 1 CU 4 CU 8 CU 7 CU 9 CU 11 CU 10 CU 12 CU 5 CU 4 CU 6 What is pyramid content analysis? Importance of ideas (content units, or CUs) ● Emerges from the wise crowd ● Distinguishes quality of ideas by quantity of occurrence ● Simple but effective
  • 10. Pyramid summary content annotation builds a content model of distinct ideas from N reference summaries written by a wise crowd (size N) 10 CU 2 CU 3 CU 1 CU 4 CU 8 CU 7 CU 9 CU 11 CU 10 CU 12 CU 5 CU 4 CU 6 A list of all the distinct ideas or Content Units (CUs), and their weights, i.e., how many summaries each occurs in TEXT: WHAT IS MATTER CU 1: Matter is classified by physical and chemical properties W=3 CU 3: All matter has energy W=2 . . . CU 12: Matter can be a solid, liquid or gas W=1 What is pyramid content analysis?
  • 11. W=5 W=4 W=3 W=2 W=1 A Pyramid of CUs from a wise crowd of 5 What is wide crowd content analysis?
  • 12. Application of Pyramid Content Model ● In a new summary, find all the phrases that mention a model CU ● Sum the weights of the mentioned CUs ● Normalize the sum 12 5 4 2 Raw sum = 5 + 4 + 2 = 11 What is wide crowd content analysis?
  • 13. Normalization
    ● A summary can express each CU at most once
    ● Sum the weights of the identified CUs
    ● Normalize the sum in one of two ways:
      ○ QUALITY: divide by the maximum sum of weights for the same number of CUs. Did the summary mention mostly important ideas?
      ○ COVERAGE: divide by the maximum sum of weights for the average number of CUs in the reference summaries. Did the summary mention most of the important ideas?
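The raw sum and the two normalizations can be sketched as follows. This is an illustration under stated assumptions: the model is a plain `{CU id: weight}` dict, the weights below are made up, the function names are hypothetical, and the coverage denominator rounds the average reference size to the nearest integer, which the slides do not specify.

```python
def max_weight_sum(weights, k):
    """Largest total weight achievable with k distinct CUs (take the top of the pyramid)."""
    return sum(sorted(weights.values(), reverse=True)[:k])


def pyramid_scores(weights, summary_cus, avg_ref_cus):
    """Raw, quality and coverage scores for one summary.

    weights: {CU id: weight}; summary_cus: CU ids found in the summary;
    avg_ref_cus: average number of CUs per reference summary.
    """
    # Each CU can be credited at most once, so deduplicate.
    found = {cu for cu in summary_cus if cu in weights}
    raw = sum(weights[cu] for cu in found)
    # Quality: best possible sum for the same number of CUs.
    quality_max = max_weight_sum(weights, len(found))
    # Coverage: best possible sum for the average reference size
    # (rounding to the nearest integer is an assumption of this sketch).
    coverage_max = max_weight_sum(weights, round(avg_ref_cus))
    quality = raw / quality_max if quality_max else 0.0
    coverage = raw / coverage_max if coverage_max else 0.0
    return raw, quality, coverage


# Hypothetical model; the summary mentions CUs of weight 5, 4 and 2.
weights = {"CU1": 5, "CU2": 5, "CU3": 4, "CU4": 3, "CU5": 2, "CU6": 1}
print(pyramid_scores(weights, ["CU1", "CU3", "CU5"], avg_ref_cus=4))
# raw = 11, quality = 11 / (5 + 5 + 4) = 11/14, coverage = 11 / (5 + 5 + 4 + 3) = 11/17
```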
  • 14. Can misranking errors occur?
    ● 9 reference summaries
    ● All content models built from m summaries, for m ∈ [1,9]
    ● All pairs of summaries A, B where A scores higher than B using all 9 reference summaries
    ● Result
      ○ The score distributions for A and B diverge once 4 to 5 references are used
    ● Conclusion
      ○ No misranking with 5 references
    How reliable is wise crowd content analysis?
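The misranking test can be sketched by enumerating every content model built from m of the references and counting how often the lower-ranked summary B overtakes A. The sketch assumes summaries are sets of CU ids and uses the raw weight sum as the score; both choices, the tie rule, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def raw_score(refs, summary_cus):
    """Raw pyramid score of a summary against a model built from refs."""
    weights = Counter(cu for ref in refs for cu in set(ref))
    return sum(weights[cu] for cu in set(summary_cus))  # CUs absent from the model count 0


def misranking_rates(refs, a_cus, b_cus):
    """For each model size m, the fraction of m-reference models that rank B above A."""
    if raw_score(refs, a_cus) <= raw_score(refs, b_cus):
        raise ValueError("A must outrank B under the model built from all references")
    rates = {}
    for m in range(1, len(refs) + 1):
        models = list(combinations(refs, m))
        flips = sum(raw_score(s, b_cus) > raw_score(s, a_cus) for s in models)
        rates[m] = flips / len(models)  # ties are not counted as misrankings
    return rates


# Hypothetical reference summaries encoded as CU id sets.
refs = [{"x", "y"}, {"x"}, {"y", "z"}, {"x", "z"}]
print(misranking_rates(refs, a_cus={"x"}, b_cus={"z"}))
# {1: 0.25, 2: 0.16666666666666666, 3: 0.0, 4: 0.0}
```

With 9 references the same loop covers all 511 non-empty subsets, which is small enough to enumerate exhaustively.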
  • 15. Five additional reliability tests
    ● Number of reference summaries needed for the probability of misranking to be ≤ 0.1: 5
    ● Number of reference summaries needed for scores to correlate with gold-standard scores: 5
    ● Interannotator agreement on the content model, 10 different pairs of models: 0.71 to 0.89
    ● Interannotator agreement on applying the content model to new summaries, 5 models: 0.77 to 0.81
    ● Correlation of the scores of 16 systems using different content models: 0.71 to 0.96
  • 16. Key differences between manual and automated methods
    ● Humans
      ○ Consider a few alternative segmentations
      ○ Judge sameness of meaning as a binary (yes-no) decision
    ● Automated methods
      ○ Consider many possible segmentations: simpler decisions, but many more of them
      ○ Use a similarity-of-meaning metric graded from 0 to 1
      ○ Must select the optimal segmentations and meaning similarities
    How did we automate wise crowd content analysis?
  • 17. Human segmentation into “ideas” and similarity
    Sentence: Matter can be measured because it contains volume and mass
    CU106: Matter has volume and mass (W=4)
    ● Ref Sum 1: because it contains both volume and mass
    ● Ref Sum 2: it takes up space defined as volume and contains . . . mass
    ● Ref Sum 3: Matter is anything that has mass and takes up space (volume)
    ● Ref Sum 4: Matter contains volume and mass
  • 18. Three automated methods
    ● No large-scale machine learning required
    ● All components are pre-trained
    ● Require only 5 wise-crowd summaries on the same summarization task
    How did we automate pyramid content analysis?
  • 19. Three automated methods
    ● PyrScore: requires an existing manual content model
      ○ Brute-force segmentation: considers all possibilities
      ○ Distributional (statistical) semantics
    ● PEAK:
      ○ Open Information Extraction tools extract subject-predicate-object triples
      ○ Symbolic semantics (WordNet)
    ● PyrEval:
      ○ Sentence decomposition into clauses
      ○ Distributional (statistical) semantics
  • 20. PyrScore segmentation: brute force
    ● Calculates all n-gram segmentations of each sentence in a new summary, for example:
      ○ All | matter | has | energy | volume | and | mass (7 unigrams)
      ○ All matter | has | energy | volume | and | mass (5 unigrams + 1 bigram)
      ○ All | matter has | energy | volume | and | mass (5 unigrams + 1 bigram)
      ○ . . .
      ○ All matter has | energy | volume | and | mass (4 unigrams + 1 trigram)
      ○ . . .
      ○ All matter has energy volume and mass (1 7-gram)
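Brute-force segmentation enumerates every way to cut a sentence into contiguous n-grams. A minimal recursive sketch (the function name is hypothetical) shows why this stays tractable for short sentences: each of the n - 1 gaps between words either is or is not a boundary, so a sentence of n words has 2^(n-1) segmentations, and the 7-word example yields 64.

```python
def segmentations(words):
    """Yield every split of a word list into contiguous n-grams, as lists of strings."""
    if not words:
        yield []
        return
    # Choose the length of the first n-gram, then segment the rest recursively.
    for i in range(1, len(words) + 1):
        head = " ".join(words[:i])
        for rest in segmentations(words[i:]):
            yield [head] + rest


words = "All matter has energy volume and mass".split()
segs = list(segmentations(words))
print(len(segs))  # 64
print(segs[0])    # ['All', 'matter', 'has', 'energy', 'volume', 'and', 'mass']
print(segs[-1])   # ['All matter has energy volume and mass']
```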
  • 21. PyrScore semantics
    ● Generates a latent vector representation of each phrase in a CU, e.g.
      CU106: Matter has volume and mass (W=4)
      ○ because it contains both volume and mass
      ○ it takes up space defined as volume and contains a certain amount of material defined as mass
      ○ Matter is anything that has mass and takes up space (volume)
      ○ Matter contains volume and mass
    ● Latent semantics:
      ○ Weighted Textual Matrix Factorization (WTMF; Guo and Diab, 2012)
      ○ Assigns a small weight to missing (unobserved) words
      ○ Word vectors trained offline
  • 22. PyrScore scoring
    ● Generates a WTMF vector representation of each CU phrase
    ● Generates a WTMF vector representation of each segment in a new summary
    ● Similarity to a CU is the average cosine similarity to all phrases in the CU
    ● Optimal assignment of candidate n-grams to CUs
      ○ A maximum weighted independent set problem
      ○ Solved with a greedy algorithm (WMIN; Sakai et al., 2003)
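The scoring steps can be sketched in two pieces: averaging cosine similarity over a CU's phrases, and a greedy weighted independent set selection over conflicting candidates. The vectors stand in for WTMF outputs, the node weights and conflict graph are hypothetical, and the selection rule (keep the node maximizing weight / (degree + 1), then delete it and its neighbors) is our reading of the greedy heuristic in Sakai et al., not code from PyrScore.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cu_similarity(segment_vec, cu_phrase_vecs):
    """Similarity of a segment to a CU: mean cosine over all of the CU's phrases."""
    return sum(cosine(segment_vec, p) for p in cu_phrase_vecs) / len(cu_phrase_vecs)


def gwmin(weights, neighbors):
    """Greedy maximum weighted independent set.

    weights: {node: weight}; neighbors: {node: set of conflicting nodes} (symmetric).
    Each round keeps the node with the largest weight / (remaining degree + 1),
    breaking ties by node id, and deletes it together with its neighbors.
    """
    remaining = set(weights)
    chosen = []
    while remaining:
        best = max(
            remaining,
            key=lambda v: (weights[v] / (len(neighbors[v] & remaining) + 1), v),
        )
        chosen.append(best)
        remaining -= neighbors[best] | {best}
    return chosen


# Hypothetical candidate (segment, CU) pairs; "b" overlaps both "a" and "c".
weights = {"a": 2.0, "b": 3.0, "c": 2.0}
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(sorted(gwmin(weights, neighbors)))  # ['a', 'c']
print(cu_similarity([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```

Two non-overlapping candidates of weight 2 beat the single overlapping candidate of weight 3, which is exactly the trade-off the independent set formulation captures.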
  • 23. PEAK (Pyramid Evaluation by Automated Knowledge Extraction)
    ● Segmentation: applies Open Information Extraction tools to extract Subj-Pred-Obj (SPO) triples from sentences, e.g., from “Matter can be detected and measured because it contains volume and mass”:
      ○ Subj(Matter) Pred(detected and measured) Obj(because it contains volume and mass)
      ○ Subj(Matter) Pred(contains) Obj(volume and mass)
      ○ . . .
    ● Semantics: uses an explicit representation of meaning (random walks over WordNet)
  • 24. PEAK aligns SPO triples
    ● Across different reference summaries, to construct the model
    ● Uses a hypergraph
      ○ Triples are hyperedges over SPO nodes
      ○ Edges between nodes represent semantic similarity
    ● Each CU is a weighted triple
  • 25. PEAK aligns SPO triples
    ● Each CU is a weighted triple
    ● A new summary is a list of triples
    ● Edges in a bipartite graph are added from CUs to SPO triples if semantic similarity ≥ 0.50
    ● Uses the Munkres-Kuhn algorithm with CU weights as edge costs
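The alignment step is an assignment problem. The sketch below hand-rolls the O(n^3) Hungarian (Munkres-Kuhn) algorithm so it runs with only the standard library, keeps an edge only when similarity ≥ 0.50, and maximizes total CU weight by minimizing negated weights. The similarity matrix is invented for illustration, and `peak_score` is a hypothetical name, not PEAK's API.

```python
def hungarian_min(cost):
    """Minimum-cost perfect assignment on a square matrix; returns row -> column."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row matched to column j (0 = unmatched)
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def peak_score(cu_weights, sim, threshold=0.5):
    """Max-weight matching of CUs (rows) to summary triples (columns)."""
    n_cu, n_t = len(cu_weights), (len(sim[0]) if sim else 0)
    size = max(n_cu, n_t)
    # Pad to a square matrix; non-edges cost 0, edges cost -weight.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(n_cu):
        for j in range(n_t):
            if sim[i][j] >= threshold:
                cost[i][j] = -float(cu_weights[i])
    assign = hungarian_min(cost)
    matches = [(i, assign[i]) for i in range(n_cu)
               if assign[i] < n_t and sim[i][assign[i]] >= threshold]
    return sum(cu_weights[i] for i, _ in matches), matches


sim = [[0.9, 0.6, 0.1],   # CU 0 (W=5) vs triples 0..2 (hypothetical scores)
       [0.7, 0.2, 0.1],   # CU 1 (W=4)
       [0.1, 0.1, 0.55]]  # CU 2 (W=2)
print(peak_score([5, 4, 2], sim))  # (11, [(0, 1), (1, 0), (2, 2)])
```

A greedy matcher would pair CU 0 with triple 0 and strand CU 1; the optimal assignment reaches 5 + 4 + 2 = 11. With SciPy available, `scipy.optimize.linear_sum_assignment` solves the same problem.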
  • 26. PyrEval extends PyrScore
    ● Builds the full pyramid using a new weighted independent set algorithm
    ● Decomposes sentences into syntactically meaningful units (roughly clauses)
    ● Uses the same distributional semantics
      ○ WTMF performs better than Word2Vec
      ○ WTMF performs better than GloVe
    ● Uses the same scoring algorithm
  • 27. PyrEval constructs a pyramid with a novel set allocation method
    ● Nested sets
      ○ Every sentence has a set of segmentations, only one of which can be selected
      ○ Every CU is a set of segments, each from a different summary
      ○ Every pyramid layer is a set of CUs
  • 28. EDUA: Emergent Discovery of Units of Attraction
    ● Constructs a graph
      ○ Nodes are segments
      ○ Edges are weighted by force of “attraction” (e.g., semantic similarity)
    ● Edge types
      ○ Dashed edges: attraction(n_i, n_j) > α
      ○ Solid edges: connect segments into CUs
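A minimal sketch of the graph-construction step: nodes are segments, and a candidate (dashed) edge is kept only when the attraction score exceeds α. Skipping pairs from the same summary is our assumption, motivated by the rule that a CU holds at most one segment per summary; the attraction scores and names are hypothetical.

```python
from itertools import combinations


def attraction_graph(segments, attraction, alpha):
    """Keep an edge between two segments only if attraction(a, b) > alpha.

    segments: list of (summary_id, segment_id) pairs.
    attraction: callable scoring two segments, e.g. cosine similarity.
    Returns {(a, b): score} for the retained edges.
    """
    edges = {}
    for a, b in combinations(segments, 2):
        if a[0] == b[0]:
            continue  # assumption: same-summary segments can never share a CU
        score = attraction(a, b)
        if score > alpha:
            edges[(a, b)] = score
    return edges


# Hypothetical pairwise scores keyed by segment id.
scores = {
    frozenset({"x", "y"}): 0.8, frozenset({"x", "z"}): 0.3,
    frozenset({"y", "z"}): 0.9, frozenset({"x", "w"}): 0.7,
    frozenset({"y", "w"}): 0.6, frozenset({"z", "w"}): 0.1,
}
segments = [("s1", "x"), ("s2", "y"), ("s2", "z"), ("s3", "w")]
graph = attraction_graph(segments, lambda a, b: scores[frozenset({a[1], b[1]})], alpha=0.5)
print(sorted(graph.values()))  # [0.6, 0.7, 0.8]
```

The high-scoring pair y, z (0.9) is dropped because both segments come from summary s2.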
  • 29. Assignment of segments to CUs obeys constraints
    ● Maximize the mean weighted average similarity within each pyramid layer n
    ● Capacity of each layer y, given segments x
    ● Relative size of each layer
    ● No empty layers
    ● One segmentation per sentence; at most one CU per segment
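The hard constraints from the nested-set formulation can be expressed as a feasibility check on a candidate pyramid. This sketch checks only non-empty layers, one segmentation per sentence, at most one CU per segment, and each CU drawing its segments from distinct summaries, plus the rule (which follows from how weights are defined) that a CU in layer w holds w segments. It does not implement the similarity objective or the capacity and relative-size terms, and the `Segment` encoding is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    seg_id: str        # unique segment id
    summary: str       # wise-crowd summary the segment comes from
    sentence: str      # sentence id within that summary
    segmentation: int  # which candidate segmentation of the sentence it belongs to


def check_pyramid(layers):
    """List violations of the hard constraints; an empty list means feasible.

    layers: {weight: list of CUs}; each CU is a list of Segment objects.
    """
    problems = []
    used_segments = set()
    chosen = {}  # (summary, sentence) -> segmentation index already in use
    for weight, cus in layers.items():
        if not cus:
            problems.append(f"layer {weight} is empty")
        for cu in cus:
            summaries = [seg.summary for seg in cu]
            if len(set(summaries)) != len(summaries):
                problems.append(f"CU in layer {weight} has two segments from one summary")
            if len(cu) != weight:
                problems.append(f"CU in layer {weight} has {len(cu)} segments")
            for seg in cu:
                if seg.seg_id in used_segments:
                    problems.append(f"segment {seg.seg_id} is in more than one CU")
                used_segments.add(seg.seg_id)
                key = (seg.summary, seg.sentence)
                if chosen.setdefault(key, seg.segmentation) != seg.segmentation:
                    problems.append(f"sentence {key} uses two segmentations")
    return problems


s1 = Segment("A1a", "A", "1", 0)
s2 = Segment("B1a", "B", "1", 0)
s3 = Segment("C2b", "C", "2", 1)
print(check_pyramid({2: [[s1, s2]], 1: [[s3]]}))  # []
```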
  • 30. PyrEval and humans construct similar pyramids
    ● CUs: 69 (PyrEval) versus 60 (Annotator 1) or 46 (Annotator 2)
    ● Similar distribution
      ○ PyrEval: 1 w5, 2 w4, 7 w3, 22 w2, 37 w1
      ○ Annotator 1: 3 w5, 7 w4, 13 w3, 15 w2, 22 w1
    ● Example of the same weight
      ○ PyrEval (w5): Physical props can occur without changing the identity or nature of the matter
      ○ Annotator 1 (w5): Physical props can be observed without changing the identity of the matter
    ● Example of a different weight
      ○ PyrEval (w4): Unlike physical change, chemical change occurs when the chemical properties of the matter have changed and a new substance is produced
      ○ Manual (w3): The difference between a physical change and a chemical change is that a chemical change creates a new substance
  • 31. A rubric for contextualized curricular support
    ● From a study of 16 community college classrooms
    ● 120 students wrote summaries of a middle school text, What Is Matter?
      ○ Read the passage
      ○ Answered main ideas questions
      ○ Wrote the summary
    ● Researchers identified 14 main ideas
    ● Main ideas score of a summary: % of the main ideas it contains
      ○ Included partial credit
      ○ Interrater reliability (Pearson correlation): 0.92
    What assessment rubric did we compare it to?
  • 32. Pearson correlations of automated and manual methods
    ● PyrScore: 0.95
    ● PEAK: 0.82
    ● PyrEval: 0.87
    What were the results?
  • 33. Pearson correlations of the 120 Main Ideas scores with automated methods (Manual Test 120)
    ● PyrScore: 0.83
    ● PEAK: 0.70
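The results above are Pearson correlations. For reference, a plain-Python version of the coefficient follows (Python 3.10+ also ships `statistics.correlation`). The function name is our own, and the example scores are placeholders, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sd_x * sd_y)


# Placeholder scores for five summaries (not study data).
automated = [0.42, 0.55, 0.61, 0.70, 0.88]
manual = [0.40, 0.50, 0.65, 0.72, 0.90]
print(round(pearson(automated, manual), 3))
```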
  • 34. Content scores are transparent and can support feedback
    ● Does the summary have enough important ideas, given its length? (Quality score)
    ● Does the summary have enough important ideas, given the set of possible important ideas? (Coverage score)
    ● Does the summary have a good balance of both? (Comprehensive score)
    ● Which important ideas were expressed?
    ● Which important ideas were missed?
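A feedback report can be derived directly from the content model. In this sketch, "important" means weight at or above a chosen threshold, and the comprehensive score is taken to be the mean of quality and coverage; both are assumptions, since the slide does not define them, and the function name `feedback` is hypothetical.

```python
def feedback(weights, summary_cus, quality, coverage, min_weight):
    """List expressed and missed important CUs, highest weight first."""
    found = set(summary_cus)
    important = sorted(
        (cu for cu, w in weights.items() if w >= min_weight),
        key=lambda cu: (-weights[cu], cu),
    )
    return {
        "expressed": [cu for cu in important if cu in found],
        "missed": [cu for cu in important if cu not in found],
        # Assumption: comprehensive score = mean of quality and coverage.
        "comprehensive": (quality + coverage) / 2,
    }


# Hypothetical model and summary.
weights = {"CU1": 5, "CU2": 5, "CU3": 4, "CU4": 3, "CU5": 2, "CU6": 1}
report = feedback(weights, ["CU1", "CU3", "CU5"], quality=0.5, coverage=0.7, min_weight=4)
print(report["expressed"], report["missed"])  # ['CU1', 'CU3'] ['CU2']
```

Because every score is a sum over named CUs, the missed list points a student to specific ideas to add during revision.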
  • 35. Conclusion
    ● Wise crowd content analysis
      ○ Works well to identify important ideas
      ○ Importance emerges from the wise crowd
      ○ Correlates with an independently developed main ideas rubric
      ○ Requires only 5 reference summaries
    ● Fully automated methods: PyrEval and PEAK
      ○ Pretrained methods, with parameter tuning on a small development set
      ○ Perform less well if sentences are very complex (e.g., automatic summarizers on newswire)
      ○ Potential to inform revision
  • 36. What’s next? Content assessment of essays
    ● The same ideas are referenced multiple times in the same essay, through multiple means
      ○ Paraphrase, definite descriptions (“the evidence shown here”), deictic pronouns (“This indicates . . .”)
      ○ Will require more complex methods to detect “the same” idea
    ● Discourse structure and function
      ○ Interrelations among ideas within the text
      ○ Discursive versus argumentative