A Pipeline for Modeling Automated
Scoring Using Python, R and
Jupyter Notebooks
Copyright © 2015 by Educational Testing Service. All rights reserved. ETS and the ETS logo are registered trademarks of Educational Testing Service (ETS). MEASURING THE POWER OF LEARNING is a trademark of ETS. 30141
Nitin Madnani, Anastassia Loukina & Lei Chen
Machine Learning &
Educational Assessment
A Pythonic Love Story
Educational Testing Service
• A non-profit educational organization founded in 1947,
headquartered in Princeton, New Jersey (N≊3500).
• Designs and administers global as well as domestic educational
assessments (GRE®, TOEFL®, PRAXIS® etc.)
• Conducts and publishes extensive research on psychometrics,
statistics, cognitive science, and computer science.[1]
• Mission: To advance quality and equity in education by providing
fair and valid assessments, research and related services.
[1]	http://search.ets.org/researcher/
Two Parts
• Part 1: What makes educational assessment a
challenging application for machine learning?



• Part 2: How does Python help us address some of these
challenges at ETS?
Part 1: Educational Assessment and Machine Learning
Educational Assessments
• Classroom Quiz
• Homework Assignment
• Practice Tests
• MOOC Assignments
• K-12 Standardized Tests
• Teacher Certification
• GED
• TOEFL/IELTS
• GRE
• GMAT
“High Stakes”
• A test with results that have important, direct
consequences for the test-takers.



• A test-taker would want to understand what their score
means and how it maps to what they did on the test.
The GRE
• Graduate Record Examination, designed and administered by ETS.
• Used by at least 3000 colleges and universities across the world for
graduate school applications to MS, MBA & PhD programs.[1]
• ~575,000 test-takers from ~200 countries between July 2013 and June
2014 (50% women, 45% men). [2]
• Three sections:
• Verbal Reasoning
• Quantitative Reasoning
• Analytical Writing
[1] https://www.ets.org/s/gre/pdf/gre_aidi_fellowships.pdf
[2] http://www.ets.org/s/gre/pdf/snapshot_test_taker_data_2014.pdf
GRE Analytical Writing
“As people rely more and more on technology to solve problems, the ability of humans to think for themselves will surely deteriorate.”

Directions: Write a response in which you discuss the extent to which you agree or disagree with the statement and explain your reasoning for the position you take.

Scoring guide: https://www.ets.org/gre/revised_general/prepare/analytical_writing/issue/scoring_guide

Score 6. Outstanding
• articulates a clear and insightful position
• develops the position fully
• well-focused, well-organized analysis
• conveys ideas fluently and precisely
• demonstrates superior facility with English
…
Score 1. Fundamentally Deficient
• provides little/no evidence of understanding
• disorganized or extremely brief
• severe problems with sentence structure
• pervasive errors in grammar
• incoherent and meaning not clear
Scoring essays
Given the stakes, our scoring methodology must maximize:
• Accuracy: how accurately does the assigned score measure the analytical skills of the test-taker?
• Interpretability: how easily can test-takers understand why they were assigned a particular score and what that score means?
It would also be nice to minimize:
• Cost: how efficiently can we score each test (how much money can we save the test-taker in fees)?
Scoring essays
Option 1: Trained Human Readers
[Diagram: Essay; Scoring Guide; Trained Human Readers]
• High Accuracy
• Medium Interpretability
• High Cost
Scoring essays
Option 2: Automated Scoring System (Machine Learning)
[Diagram: Essay; Scoring Guide; Features; Automated Scoring System (Machine Learning)]
• Medium Accuracy
• (Choice of) High Interpretability
• Low Cost
Scoring essays
[Diagram: Essay; One Trained Human Reader with Scoring Guide → Human Score; Features and Automated Scoring System (Machine Learning) with Scoring Guide → System Score; Final score]
As good as using two human readers.[1]
[1] http://www.ets.org/Media/Research/pdf/RD_Connections2.pdf
E-rater
“Essay Rater”

Linear regression trained on older essays written to the same topic and scored by human readers.

Features
• errors in grammar (e.g., subject-verb agreement)
• usage errors (incorrect prepositions/articles)
• mechanics errors (capitalization, spelling)
• errors in style (repetitious word use)
• discourse structure (presence of a thesis statement, main points)
• vocabulary sophistication
• essay organization

Reference: Automated Essay Scoring With e-rater® V.2, The Journal of Technology, Learning, and Assessment, Volume 4(3), 2006
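A linear regression over essay features can be sketched in a few lines of NumPy. This is only an illustration of the technique named on the slide: the three feature columns, their values and the human scores below are made up, and they are not e-rater's actual features or weights.

```python
import numpy as np

# Hypothetical feature matrix: one row per essay, one column per feature.
# Columns (illustrative only): grammar error rate, mechanics error rate,
# vocabulary sophistication.
X_train = np.array([
    [0.10, 0.08, 0.2],
    [0.06, 0.05, 0.4],
    [0.04, 0.03, 0.5],
    [0.02, 0.02, 0.7],
    [0.01, 0.01, 0.9],
])
# Scores assigned by human readers to the same essays (made-up values).
y_train = np.array([2.0, 3.0, 4.0, 5.0, 6.0])


def fit_linear(X, y):
    """Fit ordinary least squares; returns (intercept, coefficients)."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def predict_linear(intercept, coefs, X):
    """Predicted score = intercept + weighted sum of feature values."""
    return intercept + X @ coefs


intercept, coefs = fit_linear(X_train, y_train)
train_pred = predict_linear(intercept, coefs, X_train)

# Per-feature contributions to the first essay's predicted score.
contributions = X_train[0] * coefs
```

Because the prediction is a weighted sum, each feature's contribution to a score can be read off directly, which is the interpretability argument for a linear model.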
E-rater & Research
• E-rater still an active area of research at ETS
• Design new features; examine their effect on performance, and whether they overlap with existing features.
• Try more sophisticated machine learning models (higher accuracy worth lower interpretability?)
• Last year, 10 new e-rater features proposed just for GRE!
• GRE one of a dozen assessments, e-rater one of many automated scoring engines
• Research untenable for a large group (>15 scientists) without a standardized pipeline.
Ideal research pipeline
Need an end-to-end machine learning pipeline that can:
• Work on (almost) all platforms,
• Read features in any tabular format and clean them up,
• Efficiently apply filtering, scaling and transformations,
• Train any specified model with those features, and
• Generate a standardized, detailed report of performance on unseen essays.
Part 2: Educational Assessment, Machine Learning and Python
Python Pipeline
Input → Preprocess → Model → Evaluate → Report (final self-contained report)
1. Input
Libraries: pandas
Inputs:
• Model Name (str)
• Training Features (csv/tsv/xls)
• Unseen Test Features (csv/tsv/xls)
• Feature Definitions (json)
Steps:
• Read files into data frames
• Check for missing feature columns, exclude others
• Filter out non-numeric and blank values
• Standardize essay ID and essay score column names
Outputs:
• Training Data Frame (Raw)
• Test Data Frame (Raw)
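The input steps could look roughly like the following pandas sketch. It handles CSV only, and the function name, the column names (`ID`, `human`, `grammar`, `vocab`) and the standardized names `essay_id` and `score` are all hypothetical; the actual pipeline code is not shown in the slides.

```python
import io

import pandas as pd


def load_features(source, feature_names, id_column, score_column):
    """Read a CSV of features and return a cleaned data frame.

    All column names are hypothetical; a real pipeline would take them
    from its feature definitions file.
    """
    df = pd.read_csv(source)

    # Fail early if a requested feature column is missing.
    missing = [name for name in feature_names if name not in df.columns]
    if missing:
        raise KeyError(f"missing feature columns: {missing}")

    # Keep only the ID, score and requested feature columns.
    df = df[[id_column, score_column] + list(feature_names)].copy()

    # Standardize the ID and score column names.
    df = df.rename(columns={id_column: "essay_id", score_column: "score"})

    # Coerce non-numeric entries to NaN, then drop rows with any NaN.
    numeric_columns = ["score"] + list(feature_names)
    df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors="coerce")
    return df.dropna(subset=numeric_columns).reset_index(drop=True)


# Made-up input: e2 has a non-numeric feature value, e3 has a blank score.
raw_csv = io.StringIO(
    "ID,human,grammar,vocab,notes\n"
    "e1,4,0.10,0.5,ok\n"
    "e2,3,bad,0.4,ok\n"
    "e3,,0.20,0.3,ok\n"
    "e4,5,0.05,0.9,ok\n"
)
train_df = load_features(raw_csv, ["grammar", "vocab"], "ID", "human")
```

The same function would be applied to the unseen test features so that both data frames share one column layout.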
2. Preprocess
Libraries: numpy + pandas
Inputs:
• Model Name (str)
• Feature Definitions (json)
• Training Data Frame (Raw)
• Test Data Frame (Raw)
Steps:
• Filter out user-flagged rows, if so specified
• Remove feature outliers & “intelligently” apply feature transformations (log, inv, sqrt, etc.), if available
• Standardize all features (center and scale)
Outputs:
• Training Data Frame (Processed)
• Test Data Frame (Processed)
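A minimal NumPy sketch of the preprocessing steps for a single feature follows. It makes several assumptions that the slides do not state: outliers are clamped to mean ± 4 standard deviations rather than dropped, the transformation is chosen by the caller rather than "intelligently", and all parameters are learned from the training data and then reused on the test data. Function names are hypothetical.

```python
import numpy as np


def apply_transform(values, transform):
    """Apply one of the named transformations (names follow the slide)."""
    if transform == "log":
        return np.log(values)
    if transform == "inv":
        return 1.0 / values
    if transform == "sqrt":
        return np.sqrt(values)
    return values  # None means "leave as is"


def fit_preprocessor(train_values, transform=None, num_sd=4.0):
    """Learn outlier bounds and scaling parameters from training data only."""
    values = np.asarray(train_values, dtype=float)
    mean, sd = values.mean(), values.std(ddof=1)
    lower, upper = mean - num_sd * sd, mean + num_sd * sd
    transformed = apply_transform(np.clip(values, lower, upper), transform)
    return {
        "lower": lower,
        "upper": upper,
        "transform": transform,
        "mean": transformed.mean(),
        "sd": transformed.std(ddof=1),
    }


def preprocess(values, params):
    """Clamp outliers, transform, then center and scale one feature."""
    clipped = np.clip(np.asarray(values, dtype=float), params["lower"], params["upper"])
    transformed = apply_transform(clipped, params["transform"])
    return (transformed - params["mean"]) / params["sd"]


train = [1.0, 2.0, 4.0, 8.0, 16.0]          # made-up positive feature values
params = fit_preprocessor(train, transform="log")
train_z = preprocess(train, params)          # standardized training feature
test_z = preprocess([3.0, 1000.0], params)   # 1000.0 is clamped before the log
```

The sketch does not guard against zero or negative inputs to `log`, `inv` or `sqrt`; a real implementation would need to.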
3. Model
Libraries: R + skll
Inputs:
• Model Name (str)
• Feature Definitions (json)
• Training Data Frame (Processed)
• Test Data Frame (Processed)
Steps:
• Train regression/classification model via SKLL API or R
• Grid-search using a task-appropriate objective
• Serialize model to disk (using joblib)
Output:
• Serialized model

SKLL (pronounced “skull”) provides an API and command-line utilities to make it much simpler to run common scikit-learn experiments with pre-generated features.
(Presented by @dsblanch at PyData 2013 & 2014)
https://github.com/EducationalTestingService/skll
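The train, grid-search and serialize sequence can be illustrated without SKLL. The sketch below is a stand-in, not the SKLL API: it uses a closed-form ridge regression in NumPy, picks the penalty by cross-validated Pearson correlation (an assumed "task-appropriate objective"), and serializes with the standard-library `pickle` in memory instead of joblib on disk. The data are synthetic and all names are hypothetical.

```python
import pickle

import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coefs = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return {"intercept": y_mean - x_mean @ coefs, "coefs": coefs, "alpha": alpha}


def predict(model, X):
    return model["intercept"] + X @ model["coefs"]


def grid_search(X, y, alphas, n_folds=3):
    """Refit on all data with the alpha that has the best mean held-out Pearson r."""
    indices = np.arange(len(y))
    folds = np.array_split(indices, n_folds)
    best_alpha, best_score = None, -np.inf
    for alpha in alphas:
        scores = []
        for held_out in folds:
            train = np.setdiff1d(indices, held_out)
            model = fit_ridge(X[train], y[train], alpha)
            scores.append(np.corrcoef(predict(model, X[held_out]), y[held_out])[0, 1])
        if np.mean(scores) > best_score:
            best_alpha, best_score = alpha, np.mean(scores)
    return fit_ridge(X, y, best_alpha)


# Synthetic processed features and scores, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 3.5 + X @ np.array([0.8, -0.5, 0.2]) + rng.normal(scale=0.3, size=60)

model = grid_search(X, y, alphas=[0.01, 0.1, 1.0, 10.0])
serialized = pickle.dumps(model)       # stand-in for writing the model to disk
restored = pickle.loads(serialized)    # what the Evaluate stage would load
```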
4. Evaluate
Libraries: skll + pandas
Inputs:
• Training Data Frame (Processed)
• Test Data Frame (Processed)
• Serialized model
Steps:
• Use serialized model to compute test set predictions
• Trim and re-scale predictions to match training data
• Compute a set of standard evaluation metrics by comparing predictions to test set human scores
Outputs:
• Test Data Predictions
• Evaluation Statistics
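One possible reading of the trim, re-scale and compare steps is sketched below in NumPy. The slides do not define these operations precisely, so the following are assumptions: re-scaling matches the mean and standard deviation of the predictions to the training human scores, trimming clamps to the valid score range, re-scaling is applied before trimming, and the three metrics shown are just examples of standard ones. All numbers are made up.

```python
import numpy as np


def rescale(predictions, train_predictions, train_scores):
    """Match the mean and SD of predictions to the training human scores."""
    z = (np.asarray(predictions) - np.mean(train_predictions)) / np.std(train_predictions, ddof=1)
    return z * np.std(train_scores, ddof=1) + np.mean(train_scores)


def trim(predictions, min_score, max_score):
    """Clamp predictions to the valid score range."""
    return np.clip(predictions, min_score, max_score)


def evaluate(predictions, human_scores):
    """Compare system predictions with human scores on the test set."""
    predictions = np.asarray(predictions, dtype=float)
    human_scores = np.asarray(human_scores, dtype=float)
    return {
        "pearson_r": float(np.corrcoef(predictions, human_scores)[0, 1]),
        "rmse": float(np.sqrt(np.mean((predictions - human_scores) ** 2))),
        "exact_agreement": float(np.mean(np.rint(predictions) == human_scores)),
    }


# Made-up values on a 1-6 score scale.
train_scores = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
train_predictions = np.array([3.1, 3.4, 3.9, 4.4, 4.7])
test_predictions = np.array([2.9, 4.0, 5.2])
test_scores = np.array([1.0, 4.0, 6.0])

final_predictions = trim(rescale(test_predictions, train_predictions, train_scores), 1, 6)
stats = evaluate(final_predictions, test_scores)
```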
5. Report
Libraries: jupyter + seaborn + pandas
Inputs:
• Test Data Predictions
• Evaluation Statistics
• Training Data Frame (Processed)
• Test Data Frame (Processed)
• Serialized model
Steps:
• Determine what report sections should be included
• Merge pre-existing section templates (.ipynb files)
• Dynamically run final .ipynb file (via ExecutePreprocessor and environment variables)
• Convert report to HTML using HTMLExporter
Outputs:
• Final Report (.ipynb)
• Final Report (html)
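Since .ipynb files are JSON, the section-merging step can be sketched with the standard library alone. The section names, cell contents and the `EXPERIMENT_ID` environment variable below are hypothetical; executing the merged notebook and converting it to HTML (the ExecutePreprocessor and HTMLExporter steps) are not shown.

```python
import json


def make_section(title, code):
    """Build a tiny in-memory section notebook (hypothetical content)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": f"## {title}"},
            {"cell_type": "code", "metadata": {}, "execution_count": None,
             "outputs": [], "source": code},
        ],
    }


def merge_notebooks(section_notebooks):
    """Concatenate the cells of several notebooks given as parsed JSON dicts."""
    if not section_notebooks:
        raise ValueError("need at least one section notebook")
    first = section_notebooks[0]
    merged = {
        "nbformat": first["nbformat"],
        "nbformat_minor": first["nbformat_minor"],
        "metadata": first.get("metadata", {}),
        "cells": [],
    }
    for notebook in section_notebooks:
        merged["cells"].extend(notebook["cells"])
    return merged


# Section templates; a real pipeline would json.load() these from .ipynb files.
available = {
    "data_description": make_section(
        "Data description",
        "import os\nexperiment_id = os.environ['EXPERIMENT_ID']",
    ),
    "evaluation": make_section("Evaluation", "print('metrics go here')"),
    "feature_plots": make_section("Feature plots", "print('plots go here')"),
}

# Decide which sections to include, then merge them in that order.
requested = ["data_description", "evaluation"]
report = merge_notebooks([available[name] for name in requested])
report_json = json.dumps(report, indent=1)  # contents of the final .ipynb
```

Reading experiment-specific values from environment variables inside the cells keeps the section templates generic, so the same templates can be reused for every experiment.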
Demo
Summary
• Machine learning in high-stakes educational assessment requires
additional number crunching to verify accuracy and interpretability.
• Need a pipeline to compare a large number of research experiments
using a standardized, easy-to-read report.
• The scientific Python stack makes it super easy to implement all
stages of the pipeline!
• In progress
• Release under open-source license (2016 release)
• A CherryPy/JS web-app to allow wider reach
Questions?
https://github.com/EducationalTestingService	
https://github.com/desilinguist	
@haikuman

More Related Content

Similar to Pipeline for Modeling Automated Scoring (PyData NYC 2015)

JRT Presentation
JRT PresentationJRT Presentation
JRT Presentation
tdimattia
 
JRT Training Presentation
JRT Training PresentationJRT Training Presentation
JRT Training Presentation
tdimattia
 
Elar module 5 oct16
Elar module 5 oct16Elar module 5 oct16
Elar module 5 oct16
Megan Berger
 
On the wrong tram ll 1210 clint smith
On the wrong tram ll 1210 clint smithOn the wrong tram ll 1210 clint smith
On the wrong tram ll 1210 clint smith
clintos
 
COMEDK UGET 2011
COMEDK UGET 2011COMEDK UGET 2011
COMEDK UGET 2011
narender yadav
 
Non-Cognitive Factors as Predictors of Student Success
Non-Cognitive Factors as Predictors of Student SuccessNon-Cognitive Factors as Predictors of Student Success
Non-Cognitive Factors as Predictors of Student Success
wmiller824
 
Vet courses for schools
Vet courses for schoolsVet courses for schools
Vet courses for schools
Marry Davis
 
OBTC Capability Statement
OBTC Capability StatementOBTC Capability Statement
OBTC Capability Statement
Stephen Bowerman
 
Initial Teacher Training - Eligio Cerval-Pena
Initial Teacher Training - Eligio Cerval-PenaInitial Teacher Training - Eligio Cerval-Pena
Initial Teacher Training - Eligio Cerval-Pena
IMI PQ NET Romania
 
Ielts
IeltsIelts
SAT Math Workbook chapter-1-TestMentor's
SAT Math Workbook chapter-1-TestMentor'sSAT Math Workbook chapter-1-TestMentor's
SAT Math Workbook chapter-1-TestMentor's
Test Mentor LLC
 
Fall 2011 NT4CM
Fall 2011 NT4CMFall 2011 NT4CM
Fall 2011 NT4CM
David Olson
 
Youth4work Prep Tests
Youth4work Prep TestsYouth4work Prep Tests
Youth4work Prep Tests
Youth4work.com
 
Resonance.ac.in
Resonance.ac.inResonance.ac.in
Resonance.ac.in
saikat bhowmick
 
Online assignment
Online assignmentOnline assignment
Online assignment
deepthirajesh
 
Tom Kiddle & Eaquals Assessment Group Members: Integrated Skills Assessment i...
Tom Kiddle & Eaquals Assessment Group Members: Integrated Skills Assessment i...Tom Kiddle & Eaquals Assessment Group Members: Integrated Skills Assessment i...
Tom Kiddle & Eaquals Assessment Group Members: Integrated Skills Assessment i...
eaquals
 
Dr. Connie Johnson: Student Success & MyFoundationsLab
Dr. Connie Johnson: Student Success & MyFoundationsLab Dr. Connie Johnson: Student Success & MyFoundationsLab
Dr. Connie Johnson: Student Success & MyFoundationsLab
Pearson North America
 
Pedagogy assignment
Pedagogy assignmentPedagogy assignment
Pedagogy assignment
reshmafmtc
 
Case Study MBA Schools in Asia-Pacific Grading GuideQNT561.docx
Case Study MBA Schools in Asia-Pacific Grading GuideQNT561.docxCase Study MBA Schools in Asia-Pacific Grading GuideQNT561.docx
Case Study MBA Schools in Asia-Pacific Grading GuideQNT561.docx
wendolynhalbert
 
International Entrance Exams to Study Overseas
International Entrance Exams to Study OverseasInternational Entrance Exams to Study Overseas
International Entrance Exams to Study Overseas
IntStu
 

Similar to Pipeline for Modeling Automated Scoring (PyData NYC 2015) (20)

JRT Presentation
JRT PresentationJRT Presentation
JRT Presentation
 
JRT Training Presentation
JRT Training PresentationJRT Training Presentation
JRT Training Presentation
 
Elar module 5 oct16
Elar module 5 oct16Elar module 5 oct16
Elar module 5 oct16
 
On the wrong tram ll 1210 clint smith
On the wrong tram ll 1210 clint smithOn the wrong tram ll 1210 clint smith
On the wrong tram ll 1210 clint smith
 
COMEDK UGET 2011
COMEDK UGET 2011COMEDK UGET 2011
COMEDK UGET 2011
 
Non-Cognitive Factors as Predictors of Student Success
Non-Cognitive Factors as Predictors of Student SuccessNon-Cognitive Factors as Predictors of Student Success
Non-Cognitive Factors as Predictors of Student Success
 
Vet courses for schools
Vet courses for schoolsVet courses for schools
Vet courses for schools
 
OBTC Capability Statement
OBTC Capability StatementOBTC Capability Statement
OBTC Capability Statement
 
Initial Teacher Training - Eligio Cerval-Pena
Initial Teacher Training - Eligio Cerval-PenaInitial Teacher Training - Eligio Cerval-Pena
Initial Teacher Training - Eligio Cerval-Pena
 
Pipeline for Modeling Automated Scoring (PyData NYC 2015)

  • 1. A Pipeline for Modeling Automated Scoring Using Python, R and Jupyter Notebooks Copyright © 2015 by Educational Testing Service. All rights reserved. ETS and the ETS logo are registered trademarks of Educational Testing Service (ETS). MEASURING THE POWER OF LEARNING is a trademark of ETS. 30141 Nitin Madnani, Anastassia Loukina & Lei Chen
  • 2. Machine Learning & Educational Assessment: A Pythonic Love Story. Nitin Madnani, Anastassia Loukina & Lei Chen
  • 3. Educational Testing Service • A non-profit educational organization founded in 1947, headquartered in Princeton, New Jersey (N ≈ 3500). • Designs and administers global as well as domestic educational assessments (GRE®, TOEFL®, PRAXIS®, etc.). • Conducts and publishes extensive research on psychometrics, statistics, cognitive science, and computer science. [1] • Mission: To advance quality and equity in education by providing fair and valid assessments, research and related services. [1] http://search.ets.org/researcher/
  • 4. Two Parts • Part 1: What makes educational assessment a challenging application for machine learning? • Part 2: How does Python help us address some of these challenges at ETS?
  • 5. Part 1 • Educational Assessment • Machine Learning
  • 18. Educational Assessments (built up over slides 6-18) • Classroom Quiz • Homework Assignment • MOOC Assignments • K-12 Standardized Tests • Teacher Certification • Practice Tests • TOEFL/IELTS • GRE • GMAT • GED • Label added on the final build: High Stakes
  • 19. “High Stakes” • A test with results that have important, direct consequences for the test-takers. • A test-taker would want to understand what their score means and how it maps to what they did on the test.
  • 22. The GRE • Graduate Record Examination, designed and administered by ETS. • Used by at least 3000 colleges and universities across the world for graduate school applications to MS, MBA & PhD programs. [1] • ~575,000 test-takers from ~200 countries between July 2013 and June 2014 (50% women, 45% men). [2] • Three sections: Verbal Reasoning, Quantitative Reasoning, Analytical Writing. [1] https://www.ets.org/s/gre/pdf/gre_aidi_fellowships.pdf [2] http://www.ets.org/s/gre/pdf/snapshot_test_taker_data_2014.pdf
  • 24. GRE Analytical Writing • Prompt: “As people rely more and more on technology to solve problems, the ability of humans to think for themselves will surely deteriorate.” • Directions: Write a response in which you discuss the extent to which you agree or disagree with the statement and explain your reasoning for the position you take. • Scoring guide: https://www.ets.org/gre/revised_general/prepare/analytical_writing/issue/scoring_guide • Score 6 (Outstanding): articulates a clear and insightful position; develops the position fully; well-focused, well-organized analysis; conveys ideas fluently and precisely; demonstrates superior facility with English. • Score 1 (Fundamentally Deficient): provides little/no evidence of understanding; disorganized or extremely brief; severe problems with sentence structure; pervasive errors in grammar; incoherent and meaning not clear. • …
  • 26. Scoring essays • Given the stakes, our scoring methodology must maximize: • Accuracy: how accurately does the assigned score measure the analytical skills of the test-taker? • Interpretability: how easily can test-takers understand why they were assigned a particular score and what that score means? • It would also be nice to minimize: • Cost: how efficiently can we score each test (how much money can we save the test-taker in fees)?
  • 28. Scoring essays, Option 1 • Diagram elements: Essay, Scoring Guide, Trained Human Readers • High Accuracy • Medium Interpretability • High Cost
  • 29. Scoring essays, Option 2 • Diagram elements: Essay, Scoring Guide, Features, Automated Scoring System (Machine Learning) • Medium Accuracy • (Choice of) High Interpretability • Low Cost
  • 30. Scoring essays • Diagram elements: Essay; Scoring Guide, One Trained Human Reader (Human Score); Scoring Guide, Features, Automated Scoring System (Machine Learning) (System Score); Final score • As good as using two human readers. [1] [1] http://www.ets.org/Media/Research/pdf/RD_Connections2.pdf
  • 32. E-rater (“Essay Rater”) • Linear regression trained on older essays written to the same topic and scored by human readers. • Features: errors in grammar (e.g., subject-verb agreement); usage errors (incorrect prepositions/articles); mechanics errors (capitalization, spelling); errors in style (repetitious word use); discourse structure (presence of a thesis statement, main points); vocabulary sophistication; essay organization. • Reference: Automated Essay Scoring With e-rater® V.2, The Journal of Technology, Learning, and Assessment, Volume 4(3), 2006.
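The slide above describes e-rater as a linear regression over essay features, fit on essays that human readers have already scored. A minimal sketch of that idea follows, assuming NumPy is available. The two feature columns, all numbers, and the helper names `fit_linear_scorer` and `predict` are hypothetical illustrations, not e-rater's actual features, weights, or code.

```python
import numpy as np


def fit_linear_scorer(features, human_scores):
    """Fit ordinary least squares: score ~ intercept + weights . features."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(human_scores, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, features):
    """Apply fitted coefficients (intercept first) to new feature rows."""
    X = np.asarray(features, dtype=float)
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef


# Hypothetical columns: grammar errors per 100 words, vocabulary sophistication.
train_X = [[8.0, 0.2], [5.0, 0.4], [3.0, 0.5], [1.0, 0.8], [0.5, 0.9]]
train_y = [1, 2, 3, 5, 6]  # made-up human scores on a 1-6 scale

coef = fit_linear_scorer(train_X, train_y)
new_scores = predict(coef, [[2.0, 0.7]])
```

Because the model is linear, each coefficient can be read as the change in predicted score per unit of its feature, which is one reason a simple model scores well on interpretability.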
  • 33. E-rater & Research • E-rater is still an active area of research at ETS. • Design new features; examine their effect on performance, and whether they overlap with existing features. • Try more sophisticated machine learning models (is higher accuracy worth lower interpretability?). • Last year, 10 new e-rater features were proposed just for the GRE! • The GRE is one of a dozen assessments, and e-rater is one of many automated scoring engines. • Research is untenable for a large group (>15 scientists) without a standardized pipeline.
  • 34. Ideal research pipeline • Need an end-to-end machine learning pipeline that can: • Work on (almost) all platforms, • Read features in any tabular format and clean them up, • Efficiently apply filtering, scaling and transformations, • Train any specified model with those features, and • Generate a standardized, detailed report of performance on unseen essays.
  • 36. Part 2 • Educational Assessment • Machine Learning • Python
  • 37. Python Pipeline • Stages: Input, Preprocess, Model, Evaluate, Report • Diagram labels: Input; final self-contained report
  • 41. Stage 1, Input (pandas) • Inputs: Model Name (str); Training Features (csv/tsv/xls); Unseen Test Features (csv/tsv/xls); Feature Definitions (json) • Read files into data frames • Check for missing feature columns, exclude others • Filter out non-numeric and blank values • Standardize essay ID and essay score column names • Outputs: Training Data Frame (Raw); Test Data Frame (Raw)
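The input stage above can be sketched with pandas roughly as follows. This is an illustration under assumptions, not the authors' code: the helper `load_features`, the standardized column names `essay_id` and `human_score`, and the sample data are all hypothetical.

```python
import io

import pandas as pd


def load_features(source, feature_names, id_column, score_column):
    """Read a CSV of features and return a cleaned data frame."""
    df = pd.read_csv(source)

    # Fail early if a requested feature column is absent.
    missing = [name for name in feature_names if name not in df.columns]
    if missing:
        raise KeyError(f"missing feature columns: {missing}")

    # Keep only the ID, score, and requested features; exclude everything else.
    df = df[[id_column, score_column] + list(feature_names)]
    # Standardize the ID and score column names (names chosen for illustration).
    df = df.rename(columns={id_column: "essay_id", score_column: "human_score"})

    # Coerce non-numeric entries to NaN, then drop rows with any NaN or blank value.
    numeric_columns = ["human_score"] + list(feature_names)
    df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors="coerce")
    return df.dropna(subset=numeric_columns).reset_index(drop=True)


# Made-up input: one non-numeric feature value and one blank score.
raw_csv = io.StringIO(
    "ID,score,grammar,vocab,notes\n"
    "e1,4,2.0,0.7,ok\n"
    "e2,3,oops,0.5,bad grammar value\n"
    "e3,,1.0,0.9,blank score\n"
    "e4,5,0.5,0.8,ok\n"
)
train_df = load_features(raw_csv, ["grammar", "vocab"], "ID", "score")
```

Only rows `e1` and `e4` survive here; the extra `notes` column is dropped because it is not a requested feature.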
  • 44. Stage 2, Preprocess (numpy + pandas) • Inputs: Model Name (str); Feature Definitions (json); Training Data Frame (Raw); Test Data Frame (Raw) • Filter out user-flagged rows, if so specified • Remove feature outliers & “intelligently” apply feature transformations (log, inv, sqrt, etc.), if available • Standardize all features (center and scale) • Outputs: Training Data Frame (Processed); Test Data Frame (Processed)
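A rough sketch of the preprocessing steps named above, assuming NumPy. The slide does not say how outliers are handled, so clamping to a band around the mean is an assumption here, as are the threshold, the choice of a log transform, and the helper names `clamp_outliers` and `standardize`.

```python
import numpy as np


def clamp_outliers(values, n_sd=4.0):
    """Clamp values lying more than n_sd standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    mean, sd = values.mean(), values.std()
    return np.clip(values, mean - n_sd * sd, mean + n_sd * sd)


def standardize(train_values, test_values):
    """Center and scale both sets using the training mean and SD only."""
    train_values = np.asarray(train_values, dtype=float)
    test_values = np.asarray(test_values, dtype=float)
    mean, sd = train_values.mean(), train_values.std()
    return (train_values - mean) / sd, (test_values - mean) / sd


# Hypothetical raw feature: essay length in words, with one extreme value.
train_length = np.array([250.0, 300.0, 320.0, 280.0, 5000.0])
test_length = np.array([290.0, 310.0])

# With only five points a tight band (1.5 SD) is needed for the clamp to bite.
train_clamped = clamp_outliers(train_length, n_sd=1.5)

# A log transform compresses the long right tail before scaling.
train_z, test_z = standardize(np.log(train_clamped), np.log(test_length))
```

The test set is scaled with the training mean and SD so that no information from unseen essays leaks into the model's inputs.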
  • 49. Stage 3, Model (R + skll) • Inputs: Model Name (str); Feature Definitions (json); Training Data Frame (Processed); Test Data Frame (Processed) • Train regression/classification model via SKLL API or R • Grid-search using a task-appropriate objective • Serialize model to disk (using joblib) • SKLL (pronounced “skull”) provides an API and command-line utilities to make it much simpler to run common scikit-learn experiments with pre-generated features. (Presented by @dsblanch at PyData 2013 & 2014) https://github.com/EducationalTestingService/skll • Output: Serialized model
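The model stage above (train, grid-search, serialize) can be illustrated with plain scikit-learn and joblib. Note the substitution: this sketch calls scikit-learn directly rather than the SKLL API or R that the slide names, and the ridge model, parameter grid, objective, and synthetic data are all assumptions made for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic, already-standardized features and scores (made up for the example).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 3.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.1, size=60)

# Grid-search the regularization strength with a regression objective.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)

# Serialize the best model to disk, then load it back as a later stage would.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(search.best_estimator_, model_path)
reloaded = joblib.load(model_path)
```

Writing the fitted model to disk keeps training and evaluation decoupled: the evaluation stage only needs the file, not the training code.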
  • 52. Stage 4, Evaluate (skll + pandas) • Inputs: Training Data Frame (Processed); Test Data Frame (Processed); Serialized model • Use serialized model to compute test set predictions • Trim and re-scale predictions to match training data • Compute a set of standard evaluation metrics by comparing predictions to test set human scores • Outputs: Test Data Predictions; Evaluation Statistics
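One way to read "trim and re-scale" and "standard evaluation metrics" is sketched below using only the standard library. The slide does not give the formulas or name the metrics, so everything here is an assumption: rescaling to the mean and SD of the training human scores, trimming to the score range afterwards, and the choice of Pearson correlation and exact agreement.

```python
import math
import statistics


def trim_and_rescale(predictions, train_predictions, train_scores, low, high):
    """Match predictions to the training score mean/SD, then trim to [low, high]."""
    pred_mean = statistics.fmean(train_predictions)
    pred_sd = statistics.pstdev(train_predictions)
    score_mean = statistics.fmean(train_scores)
    score_sd = statistics.pstdev(train_scores)
    rescaled = [(p - pred_mean) / pred_sd * score_sd + score_mean for p in predictions]
    return [min(max(p, low), high) for p in rescaled]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def exact_agreement(predictions, human_scores):
    """Fraction of essays whose rounded prediction equals the human score."""
    hits = sum(round(p) == h for p, h in zip(predictions, human_scores))
    return hits / len(human_scores)


# Made-up model outputs and human scores on a 1-6 scale.
train_predictions = [2.5, 3.0, 3.5, 4.0]
train_scores = [1, 3, 4, 6]
test_predictions = [2.0, 3.2, 4.6]
test_scores = [1, 3, 6]

scaled = trim_and_rescale(test_predictions, train_predictions, train_scores, 1, 6)
metrics = {
    "pearson_r": pearson_r(scaled, test_scores),
    "exact_agreement": exact_agreement(scaled, test_scores),
}
```

Rescaling matters because a regression tends to predict toward the middle of the scale; stretching predictions to the spread of human scores lets rounded predictions reach the extreme score points.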
  • 55. 5. Report (jupyter + seaborn + pandas)
      Inputs: Test Data Predictions, Evaluation Statistics, Training Data Frame (Processed), Test Data Frame (Processed), Serialized model
      • Determine which report sections should be included
      • Merge pre-existing section templates (.ipynb files)
      • Dynamically run the final .ipynb file (via ExecutePreprocessor and environment variables)
      • Convert the report to HTML using HTMLExporter
      Outputs: Final Report (.ipynb), Final Report (html)
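The report stage can be sketched with `nbformat` and `nbconvert`, the libraries that provide `ExecutePreprocessor` and `HTMLExporter`. This is a hedged illustration rather than the pipeline's actual code: the helper names `merge_sections` and `run_and_export`, the `EXPERIMENT_ID` variable, and the in-memory section templates are all hypothetical, and running the merged notebook assumes a `python3` Jupyter kernel is installed.

```python
import os

import nbformat
from nbconvert import HTMLExporter
from nbconvert.preprocessors import ExecutePreprocessor


def merge_sections(section_notebooks):
    """Concatenate the cells of several notebooks into one new notebook."""
    merged = nbformat.v4.new_notebook()
    for section in section_notebooks:
        merged.cells.extend(section.cells)
    return merged


def run_and_export(notebook, html_path, settings, timeout=600):
    """Execute a notebook in place, then write it out as HTML."""
    # Settings reach the notebook's cells through environment variables,
    # which the kernel process inherits from this process.
    os.environ.update(settings)
    executor = ExecutePreprocessor(timeout=timeout, kernel_name="python3")
    executor.preprocess(notebook, {"metadata": {"path": "."}})
    body, _resources = HTMLExporter().from_notebook_node(notebook)
    with open(html_path, "w", encoding="utf-8") as handle:
        handle.write(body)


# Hypothetical section templates, built in memory for the example. In
# practice they would be loaded with nbformat.read(path, as_version=4).
intro = nbformat.v4.new_notebook(
    cells=[nbformat.v4.new_markdown_cell("# Experiment report")]
)
summary = nbformat.v4.new_notebook(
    cells=[nbformat.v4.new_code_cell("import os\nprint(os.environ['EXPERIMENT_ID'])")]
)

# Merge the chosen sections into a single final notebook.
report = merge_sections([intro, summary])
print(len(report.cells), "cells in merged report")
```

Calling `run_and_export(report, "report.html", {"EXPERIMENT_ID": "demo"})` would then execute the merged notebook and write the HTML version. That call is left out of the block because it needs a running kernel.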
  • 56. Demo
  • 57. Summary
      • Machine learning in high-stakes educational assessment requires additional number crunching to verify accuracy and interpretability.
      • Need a pipeline to compare a large number of research experiments using a standardized, easy-to-read report.
      • The scientific Python stack makes it super easy to implement all stages of the pipeline!
      • In progress:
          • Release under open-source license (2016 release)
          • A CherryPy/JS web-app to allow wider reach
  • 58. Questions? https://github.com/EducationalTestingService https://github.com/desilinguist @haikuman