SlideShare a Scribd company logo
1 of 20
Download to read offline
AUTOMATICALLY
GRADING BRAZILIAN
STUDENT ESSAYS
Erick Fonseca, Ivo Medeiros, Dayse
Kamikawachi, Alessandro Bokan
PROPOR 2018
AUTOMATIC ESSAY SCORING (AES)
➤ Score students essays — somewhat subjective!
➤ Fast, cheap and deterministic
➤ Can be exploited by students
➤ Good for feedback during writing practice
2
AES APPROACHES
➤ Early AES systems trained regressors with a large number of
features:
➤ Counts of words
➤ POS tags
➤ Syntactic structures
➤ Named entities
➤ n-grams
➤ Spelling and grammar mistakes
➤ etc…
3
Essay
NLP tools
Features
Regressor
AES APPROACHES
➤ More recently, neural networks
➤ Create a vector representation for the
essay
➤ Learn a scorer
➤ Different architectures:
➤ CNNs or RNNs
➤ Single level or sentence level followed
by text level
4
Essay
Deep neural net
CNN/RNN
Essay vector
Scorer
LET’S TRY BOTH!
➤ We tried both neural networks and feature-based models
➤ Compare their pros and cons!
➤ We used a dataset of ~56k essays graded by humans
➤ Larger than the English benchmark!
5
AES IN PORTUGUESE — ENEM
➤ Exam for high school students
➤ Argumentative essays with a given topic
➤ ENEM scores essays in five competencies:
1.Standard written norm
2.Adherence to the topic and style
3.Defend a point of view
4.Usage of argumentative language
5.Proposal of a solution for the given problem
6
ENEM DATASET
➤ Each competency is scored from 0 to 200
➤ Total essay score from 0 to 1000
➤ Scores have a gaussian distribution
7
HOW OUR DATA LOOKS LIKE
8
Metric Mean (sd)
Tokens / sentence 32.0 (±18.2)
Tokens / essay 329.2 (±101.4)
Sentences / paragraph 2.4 (±1.3)
Sentences / essay 10.3 (±4.3)
Paragraphs / essay 4.3 (±1.0)
DEEP NEURAL NETWORK
➤ The good:
➤ Simpler to design
➤ No need to handcraft features
➤ Can learn some subtleties which are hard to describe
➤ The bad:
➤ Harder to train
➤ Needs much more computational power
➤ Careful parameter tuning
9
DEEP NEURAL NETWORK
➤ Two levels of LSTMs
1. Read words and generate sentence vectors
2. Read sentences and generate an essay vector
10
DEEP NEURAL NETWORK
➤ Two levels of LSTMs
1. Read word embeddings and generate sentence vectors
2. Read sentences and generate an essay vector
11
DEEP NEURAL NETWORK
➤ Some variations yielded worse results
➤ Max pooling instead of mean
➤ CNNs instead of LSTMs
➤ The network outputs 5 scores
➤ Sigmoid activation; normalize scores to range [0, 1]
➤ Extra hidden layers did not help
➤ Optimize the Mean Squared Errors:
12
5
∑
i
(yi − ̂
yi)2
FEATURE ENGINEERING
➤ The bad:
➤ Hard to design
➤ Try to explain what makes an essay great!
➤ Needs more preprocessing tools
➤ The good:
➤ Computationally faster
➤ Easier to interpret
13
FEATURE ENGINEERING
➤ Only run a POS tagger
➤ Parsing is challenging because of mistakes (future work!)
➤ Use a list of hand picked expressions
➤ Connectives, propositives, oralities
➤ Use a list of automatically extracted words and n-grams
➤ Appearing in 5-50% of the essays
➤ Pearson ρ≥0.1 with scores
14
FEATURE ENGINEERING — FEATURES
➤ Extract a vector of 681 features:
➤ Number of commas, characters, tokens, types, sentences,
token/sentence ratio, OOV words, OOV types, words from
the prompt (…)
➤ Presence of words and phrases from the handcrafted lists
➤ Presence of relevant words and n-grams
➤ Counts and ratios of each POS tag
➤ Presence of relevant POS tag n-grams
➤ For each competency, only keep features with ρ≥0.1
15
EXPERIMENTAL SETUP
➤ Two metrics:
➤ Quadratic Weighted Kappa (QWK) — Popular metric for
AES; but disregards the error magnitude
➤ Root Mean Squared Error (RMSE) — More appropriate
for regression
➤ We compare with Amorim & Veloso (2017)
➤ Only other work in Portuguese
➤ … but with another and smaller corpus
16
RESULTS
17
Model C1 C2 C3 C4 C5 Total
Gradient Boosting 25.81 26.02 27.40 28.34 41.19 100.00
Linear Regression 26.10 26.37 27.75 28.42 42.07 101.53
Deep Network 27.75 26.58 27.51 29.26 38.85 100.59
Average baseline 38.26 33.53 34.72 39.47 55.27 160.42
RMSE (lower is better)
Model C1 C2 C3 C4 C5 Total
Gradient Boosting 0.676 0.511 0.508 0.619 0.577 0.752
Linear Regression 0.667 0.499 0.493 0.615 0.564 0.747
Deep Network 0.615 0.503 0.500 0.508 0.636 0.750
Average Baseline 0 0 0 0 0 0
Amorim & Veloso 0.315 0.268 0.231 0.270 0.139 0.367
QWK (higher is better)
CONCLUSIONS
➤ Feature engineering models are better at C1-4
➤ Easier competencies to describe how to score
➤ Competency 5 is the most difficult to score
➤ Neural networks are better at it
➤ … because of subjectivity?
➤ Our models are more stable across competencies than
Amorim & Veloso
➤ RMSE makes clear which competencies are harder
18
CONCLUSIONS
➤ AES is still incipient in Portuguese!
➤ Feature-based models and DNNs have comparable
performance
➤ Many interesting directions for future works!
➤ Parsing (for grammatically incorrect sentences)
➤ Other network architectures
➤ Evaluate students’ writing skill evolution
19
THANK YOU!
QUESTIONS?
erick@letrus.com.br
ivopdm@letrus.com.br
dayse@letrus.com.br
abokan@letrus.com.br

More Related Content

Similar to Automatically Grading Brazilian Student Essays.pdf

NoSQL in Perspective
NoSQL in PerspectiveNoSQL in Perspective
NoSQL in Perspective
Jeff Smith
 
Linq 1224887336792847 9
Linq 1224887336792847 9Linq 1224887336792847 9
Linq 1224887336792847 9
google
 
Rails Conf Europe 2007 Notes
Rails Conf  Europe 2007  NotesRails Conf  Europe 2007  Notes
Rails Conf Europe 2007 Notes
Ross Lawley
 

Similar to Automatically Grading Brazilian Student Essays.pdf (20)

Use Performance Insights To Enhance MongoDB Performance - (Manosh Malai - Myd...
Use Performance Insights To Enhance MongoDB Performance - (Manosh Malai - Myd...Use Performance Insights To Enhance MongoDB Performance - (Manosh Malai - Myd...
Use Performance Insights To Enhance MongoDB Performance - (Manosh Malai - Myd...
 
NoSQL in Perspective
NoSQL in PerspectiveNoSQL in Perspective
NoSQL in Perspective
 
Biomedical Signal and Image Analytics using MATLAB
Biomedical Signal and Image Analytics using MATLABBiomedical Signal and Image Analytics using MATLAB
Biomedical Signal and Image Analytics using MATLAB
 
pm1
pm1pm1
pm1
 
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
 
Encode, tag, realize high precision text editing
Encode, tag, realize high precision text editingEncode, tag, realize high precision text editing
Encode, tag, realize high precision text editing
 
Linq 1224887336792847 9
Linq 1224887336792847 9Linq 1224887336792847 9
Linq 1224887336792847 9
 
Rails Conf Europe 2007 Notes
Rails Conf  Europe 2007  NotesRails Conf  Europe 2007  Notes
Rails Conf Europe 2007 Notes
 
Symbolic Execution And KLEE
Symbolic Execution And KLEESymbolic Execution And KLEE
Symbolic Execution And KLEE
 
IN4308 1
IN4308 1IN4308 1
IN4308 1
 
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningTroubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
 
Simulation lab
Simulation labSimulation lab
Simulation lab
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
 
Software Craftmanship - Cours Polytech
Software Craftmanship - Cours PolytechSoftware Craftmanship - Cours Polytech
Software Craftmanship - Cours Polytech
 
AutoML Toolkit – Deep Dive
AutoML Toolkit – Deep DiveAutoML Toolkit – Deep Dive
AutoML Toolkit – Deep Dive
 
midterm_fa07.pdf
midterm_fa07.pdfmidterm_fa07.pdf
midterm_fa07.pdf
 
Catalyst - refactor large apps with it and have fun!
Catalyst - refactor large apps with it and have fun!Catalyst - refactor large apps with it and have fun!
Catalyst - refactor large apps with it and have fun!
 
A Neural Grammatical Error Correction built on Better Pre-training and Sequen...
A Neural Grammatical Error Correction built on Better Pre-training and Sequen...A Neural Grammatical Error Correction built on Better Pre-training and Sequen...
A Neural Grammatical Error Correction built on Better Pre-training and Sequen...
 
Generalized Pipeline Parallelism for DNN Training
Generalized Pipeline Parallelism for DNN TrainingGeneralized Pipeline Parallelism for DNN Training
Generalized Pipeline Parallelism for DNN Training
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
 

More from Sandra Valenzuela

More from Sandra Valenzuela (20)

Essay Websites Introduction To A Compare And Contrast
Essay Websites Introduction To A Compare And ContrastEssay Websites Introduction To A Compare And Contrast
Essay Websites Introduction To A Compare And Contrast
 
Short Essay On Mother. Short
Short Essay On Mother. ShortShort Essay On Mother. Short
Short Essay On Mother. Short
 
Cheapest Essay Writing Service At 7Page - Hire Essay Writer
Cheapest Essay Writing Service At 7Page - Hire Essay WriterCheapest Essay Writing Service At 7Page - Hire Essay Writer
Cheapest Essay Writing Service At 7Page - Hire Essay Writer
 
A Separate Peace Critical Lens Essay Sample
A Separate Peace Critical Lens Essay SampleA Separate Peace Critical Lens Essay Sample
A Separate Peace Critical Lens Essay Sample
 
15 Best Images Of Personal Narrative Writing Workshe
15 Best Images Of Personal Narrative Writing Workshe15 Best Images Of Personal Narrative Writing Workshe
15 Best Images Of Personal Narrative Writing Workshe
 
13 Best Images Of English Introduction Worksheet - E
13 Best Images Of English Introduction Worksheet - E13 Best Images Of English Introduction Worksheet - E
13 Best Images Of English Introduction Worksheet - E
 
How To Write An English Essay (With Sample Essay
How To Write An English Essay (With Sample EssayHow To Write An English Essay (With Sample Essay
How To Write An English Essay (With Sample Essay
 
Steps In Writing A Literature Review By Literary Devic
Steps In Writing A Literature Review By Literary DevicSteps In Writing A Literature Review By Literary Devic
Steps In Writing A Literature Review By Literary Devic
 
How To Write An Introduction For A Research Paper - Alex
How To Write An Introduction For A Research Paper - AlexHow To Write An Introduction For A Research Paper - Alex
How To Write An Introduction For A Research Paper - Alex
 
Rhetorical Situations Of Essay 1
Rhetorical Situations Of Essay 1Rhetorical Situations Of Essay 1
Rhetorical Situations Of Essay 1
 
Sample Thesis Chapter 3 Research Locale - Custom P
Sample Thesis Chapter 3 Research Locale - Custom PSample Thesis Chapter 3 Research Locale - Custom P
Sample Thesis Chapter 3 Research Locale - Custom P
 
001 How To Start College Essay About Yourself Off
001 How To Start College Essay About Yourself Off001 How To Start College Essay About Yourself Off
001 How To Start College Essay About Yourself Off
 
Compare Contrast High School And Co
Compare Contrast High School And CoCompare Contrast High School And Co
Compare Contrast High School And Co
 
Print Kindergarten Writing Paper
Print Kindergarten Writing PaperPrint Kindergarten Writing Paper
Print Kindergarten Writing Paper
 
6 Best Images Of Printable Love Letter Border - Letter
6 Best Images Of Printable Love Letter Border - Letter6 Best Images Of Printable Love Letter Border - Letter
6 Best Images Of Printable Love Letter Border - Letter
 
Printable Lined Paper Landscape - Printable World Holiday
Printable Lined Paper Landscape - Printable World HolidayPrintable Lined Paper Landscape - Printable World Holiday
Printable Lined Paper Landscape - Printable World Holiday
 
Sample Thesis Statement For Compare And Contrast Essay
Sample Thesis Statement For Compare And Contrast EssaySample Thesis Statement For Compare And Contrast Essay
Sample Thesis Statement For Compare And Contrast Essay
 
How To Make A Proper Essay. Essay Tips 7 Tips
How To Make A Proper Essay. Essay Tips 7 TipsHow To Make A Proper Essay. Essay Tips 7 Tips
How To Make A Proper Essay. Essay Tips 7 Tips
 
Writing My Self Assessment Examples Five Paragrap
Writing My Self Assessment Examples Five ParagrapWriting My Self Assessment Examples Five Paragrap
Writing My Self Assessment Examples Five Paragrap
 
Writing Freelance Articles In First-Person - WriterS Digest
Writing Freelance Articles In First-Person - WriterS DigestWriting Freelance Articles In First-Person - WriterS Digest
Writing Freelance Articles In First-Person - WriterS Digest
 

Recently uploaded

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
SanaAli374401
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
MateoGardella
 

Recently uploaded (20)

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 

Automatically Grading Brazilian Student Essays.pdf

  • 1. AUTOMATICALLY GRADING BRAZILIAN STUDENT ESSAYS Erick Fonseca, Ivo Medeiros, Dayse Kamikawachi, Alessandro Bokan PROPOR 2018
  • 2. AUTOMATIC ESSAY SCORING (AES) ➤ Score students essays — somewhat subjective! ➤ Fast, cheap and deterministic ➤ Can be exploited by students ➤ Good for feedback during writing practice 2
  • 3. AES APPROACHES ➤ Early AES systems trained regressors with a large number of features: ➤ Counts of words ➤ POS tags ➤ Syntactic structures ➤ Named entities ➤ n-grams ➤ Spelling and grammar mistakes ➤ etc… 3 Essay NLP tools Features Regressor
  • 4. AES APPROACHES ➤ More recently, neural networks ➤ Create a vector representation for the essay ➤ Learn a scorer ➤ Different architectures: ➤ CNNs or RNNs ➤ Single level or sentence level followed by text level 4 Essay Deep neural net CNN/RNN Essay vector Scorer
  • 5. LET’S TRY BOTH! ➤ We tried both neural networks and feature-based models ➤ Compare their pros and cons! ➤ We used a dataset of ~56k essays graded by humans ➤ Larger than the English benchmark! 5
  • 6. AES IN PORTUGUESE — ENEM ➤ Exam for high school students ➤ Argumentative essays with a given topic ➤ ENEM scores essays in five competencies: 1.Standard written norm 2.Adherence to the topic and style 3.Defend a point of view 4.Usage of argumentative language 5.Proposal of a solution for the given problem 6
  • 7. ENEM DATASET ➤ Each competency is scored from 0 to 200 ➤ Total essay score from 0 to 1000 ➤ Scores have a gaussian distribution 7
  • 8. HOW OUR DATA LOOKS LIKE 8 Metric Mean (sd) Tokens / sentence 32.0 (±18.2) Tokens / essay 329.2 (±101.4) Sentences / paragraph 2.4 (±1.3) Sentences / essay 10.3 (±4.3) Paragraphs / essay 4.3 (±1.0)
  • 9. DEEP NEURAL NETWORK ➤ The good: ➤ Simpler to design ➤ No need to handcraft features ➤ Can learn some subtleties which are hard to describe ➤ The bad: ➤ Harder to train ➤ Needs much more computational power ➤ Careful parameter tuning 9
  • 10. DEEP NEURAL NETWORK ➤ Two levels of LSTMs 1. Read words and generate sentence vectors 2. Read sentences and generate an essay vector 10
  • 11. DEEP NEURAL NETWORK ➤ Two levels of LSTMs 1. Read word embeddings and generate sentence vectors 2. Read sentences and generate an essay vector 11
  • 12. DEEP NEURAL NETWORK ➤ Some variations yielded worse results ➤ Max pooling instead of mean ➤ CNNs instead of LSTMs ➤ The network outputs 5 scores ➤ Sigmoid activation; normalize scores to range [0, 1] ➤ Extra hidden layers did not help ➤ Optimize the Mean Squared Errors: 12 5 ∑ i (yi − ̂ yi)2
  • 13. FEATURE ENGINEERING ➤ The bad: ➤ Hard to design ➤ Try to explain what makes an essay great! ➤ Needs more preprocessing tools ➤ The good: ➤ Computationally faster ➤ Easier to interpret 13
  • 14. FEATURE ENGINEERING ➤ Only run a POS tagger ➤ Parsing is challenging because of mistakes (future work!) ➤ Use a list of hand picked expressions ➤ Connectives, propositives, oralities ➤ Use a list of automatically extracted words and n-grams ➤ Appearing in 5-50% of the essays ➤ Pearson ρ≥0.1 with scores 14
  • 15. FEATURE ENGINEERING — FEATURES ➤ Extract a vector of 681 features: ➤ Number of commas, characters, tokens, types, sentences, token/sentence ratio, OOV words, OOV types, words from the prompt (…) ➤ Presence of words and phrases from the handcrafted lists ➤ Presence of relevant words and n-grams ➤ Counts and ratios of each POS tag ➤ Presence of relevant POS tag n-grams ➤ For each competency, only keep features with ρ≥0.1 15
  • 16. EXPERIMENTAL SETUP ➤ Two metrics: ➤ Quadratic Weighted Kappa (QWK) — Popular metric for AES; but disregards the error magnitude ➤ Root Mean Squared Error (RMSE) — More appropriate for regression ➤ We compare with Amorim & Veloso (2017) ➤ Only other work in Portuguese ➤ … but with another and smaller corpus 16
  • 17. RESULTS 17 Model C1 C2 C3 C4 C5 Total Gradient Boosting 25.81 26.02 27.40 28.34 41.19 100.00 Linear Regression 26.10 26.37 27.75 28.42 42.07 101.53 Deep Network 27.75 26.58 27.51 29.26 38.85 100.59 Average baseline 38.26 33.53 34.72 39.47 55.27 160.42 RMSE (lower is better) Model C1 C2 C3 C4 C5 Total Gradient Boosting 0.676 0.511 0.508 0.619 0.577 0.752 Linear Regression 0.667 0.499 0.493 0.615 0.564 0.747 Deep Network 0.615 0.503 0.500 0.508 0.636 0.750 Average Baseline 0 0 0 0 0 0 Amorim & Veloso 0.315 0.268 0.231 0.270 0.139 0.367 QWK (higher is better)
  • 18. CONCLUSIONS ➤ Feature engineering models are better at C1-4 ➤ Easier competencies to describe how to score ➤ Competency 5 is the most difficult to score ➤ Neural networks are better at it ➤ … because of subjectivity? ➤ Our models are more stable across competencies than Amorim & Veloso ➤ RMSE makes clear which competencies are harder 18
  • 19. CONCLUSIONS ➤ AES is still incipient in Portuguese! ➤ Feature-based models and DNNs have comparable performance ➤ Many interesting directions for future works! ➤ Parsing (for grammatically incorrect sentences) ➤ Other network architectures ➤ Evaluate students’ writing skill evolution 19