NLP_Presentation


Aravind700

GROUP 2
Manali Shah
Aravind Ram Nathan
Ismail Enchikalathil Jelal
Ankita Tiwari
AUTOMATED SHORT ANSWERS GRADING

GOAL
• Build a machine learning model that can grade short written responses.
• Advantages:
  • Fairness
  • Lower human resource cost
  • Timely feedback
• Pipeline: Graded Short Answers → NLP Features → Machine Learning Model

RESOURCES
Dataset:
• Provided by the Hewlett Foundation on the Kaggle data platform.
• 17,000 short responses written by 10th grade students.
• 10 different essay sets covering topics ranging from Science to Arts.
• Average response length is 50 words.
• Training sets are graded by humans, with scores ranging from 0 to 3.
Technologies:
• Python: nltk, scikit-learn, pandas, pyplot, skll
• R: h2o

METHODOLOGY
Preprocessing → Feature Engineering, Feature Selection and Model Training → Final Model Generation

PREPROCESSING
• Remove non-printable characters from raw text.
• Convert to lowercase.
• Correct spelling using Peter Norvig’s spelling corrector.
• Tag parts of speech with the NLTK pos_tag() function on the corrected text.
• Extract numbers using regex.
• Remove stop words.
• Stem words using the NLTK PorterStemmer class.
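
A minimal sketch of some of the steps above, using only the Python standard library. It covers non-printable character removal, lowercasing, number extraction and stop-word removal; spelling correction, POS tagging and stemming are left out. The function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for illustration, not the group's actual code or stop-word list.

```python
import re
import string

# Hypothetical, tiny stop-word list for illustration only;
# the slides do not say which list was used.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it"}


def preprocess(raw_text):
    """Return (tokens, numbers) for one response."""
    # Keep only printable characters.
    text = "".join(ch for ch in raw_text if ch in string.printable)
    # Normalize case.
    text = text.lower()
    # Extract numbers (integers and decimals) with a regex.
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    # Split into alphabetic word tokens and drop stop words.
    tokens = [t for t in re.findall(r"[a-z]+", text) if t not in STOP_WORDS]
    return tokens, numbers


tokens, numbers = preprocess("The sample\x00 weighs 9.8 grams AND it is heavier.")
print(tokens)   # ['sample', 'weighs', 'grams', 'heavier']
print(numbers)  # ['9.8']
```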

FEATURE ENGINEERING
• Term usage:
  • Statistics on the parts of speech used
  • Statistics on word length
  • Spelling errors
• Sentence quality:
  • Grammar errors detected using 3-gram, 4-gram and 5-gram dictionaries
• Bag of words:
  • Top 10 most frequent unigrams from the training set
• Content fluency and richness:
  • Cosine similarity, computed from TF*IDF vectors, with essays scored 0-3
  • Essay length
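
The cosine similarity feature can be sketched as follows: each response becomes a TF*IDF vector, and a new response is compared with reference essays at each score level. This is an illustration under assumptions: the smoothed IDF formula, the function names `tfidf_vectors` and `cosine`, and the toy essays are not taken from the slides.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    # Document frequency: in how many docs each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption; the slides do not give the exact formula).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy reference essays keyed by their human score (made-up data).
scored = {
    0: "plants need nothing".split(),
    3: "plants need sunlight water and soil".split(),
}
response = "plants need water and sunlight".split()

vecs = tfidf_vectors(list(scored.values()) + [response])
# One similarity feature per score level.
sims = {score: cosine(vecs[-1], vec) for score, vec in zip(scored, vecs)}
```

In this toy case the response is closer to the essay scored 3 than to the essay scored 0, which is the kind of signal such a feature is meant to give the model.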

FEATURE SELECTION
• Remove features that have little effect on the output.
• A large number of features:
  • Induces greater computational cost
  • May lead to overfitting
• Sequential Forward Selection (SFS) algorithm
  • Goodness of a feature is measured by the kappa score.
  • The kappa score measures the agreement between two raters.
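
A sketch of greedy Sequential Forward Selection: start with no features, repeatedly add the single feature that most improves the score, and stop when no addition helps. The scoring function here is a toy stand-in; in the setting described above it would be the kappa score of a model trained on the candidate subset. The feature names and the `GAIN` values are hypothetical.

```python
def sequential_forward_selection(features, score_fn):
    """Greedily add the feature that most improves score_fn; stop when none helps."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Score every subset formed by adding one more feature.
        candidates = [(score_fn(selected + [f]), f) for f in remaining]
        score, feature = max(candidates)
        if score <= best_score:
            break  # no remaining feature improves the score
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy stand-in for "kappa of a model trained on this subset" (made-up numbers).
GAIN = {
    "essay_length": 0.30,
    "tfidf_similarity": 0.25,
    "spelling_errors": 0.05,
    "noise": -0.02,
}


def toy_score(subset):
    return sum(GAIN[f] for f in subset)


selected, score = sequential_forward_selection(GAIN, toy_score)
print(selected)  # ['essay_length', 'tfidf_similarity', 'spelling_errors']
```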

MACHINE LEARNING ALGORITHMS USED FOR BUILDING THE MODEL
• K-Nearest Neighbors
• Naive Bayes
• Decision Tree
• Support Vector Machines (SVM)
• Gradient Boosting
• Deep Learning
• Random Forest
• Ensemble of all the above algorithms

CROSS VALIDATION
• 5-fold cross validation is used to choose hyperparameters for the machine learning algorithms.
• Hyperparameters:
  • K-NN - 3 neighbors
  • Random Forest - 50 trees
  • Gradient Boosting Machine - 200 trees
  • Deep Learning - 3-layer network with 50 units in each layer
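
To illustrate how 5-fold cross validation can pick a hyperparameter, the sketch below chooses the number of neighbors for a toy one-dimensional K-NN classifier. Several simplifications are assumptions for brevity: plain accuracy is used instead of kappa, folds are not shuffled, and the data set (including one deliberately mislabeled point) is made up.

```python
from collections import Counter


def k_fold_indices(n, k=5):
    """Split range(n) into k interleaved folds of near-equal size (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def knn_predict(train, x, n_neighbors):
    """train: list of (feature_value, label). Majority vote among nearest points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data, n_neighbors, k=5):
    """Mean accuracy over k folds; each point is predicted while held out."""
    correct = 0
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        correct += sum(
            knn_predict(train, data[i][0], n_neighbors) == data[i][1] for i in fold
        )
    return correct / len(data)


# Two well-separated classes plus one mislabeled point at 4.5 (made-up data).
data = [(x, 0) for x in range(10)] + [(x, 1) for x in range(20, 30)] + [(4.5, 1)]

scores = {n: cv_accuracy(data, n) for n in (1, 3, 5)}
best = max(scores, key=scores.get)
```

With a single neighbor the mislabeled point misleads its closest neighbor, so a larger neighborhood scores higher on this toy data.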

Evaluation metric - Quadratic Weighted Kappa
Kappa value per essay set and model:

Essay Set  K-NN    Naive Bayes  Decision Tree  SVM     Gradient Boosting  Random Forest  Deep Learning  Ensemble
1          0.4423  0.5262       0.5533         0.2790  0.6321             0.6180         0.6463         0.7121
2          0.2720  0.5062       0.4416         0.4068  0.5540             0.5230         0.5552         0.5809
3          0.1071  0.5093       0.2320         0.3596  0.2868             0.2314         0.3751         0.4144
4          0.3449  0.6003       0.4515         0.4742  0.6495             0.5895         0.5453         0.6626
5          0.5001  0.5997       0.6300         0.6088  0.7046             0.7487         0.7372         0.6996
6          0.3298  0.6121       0.6661         0.6893  0.7510             0.6971         0.7366         0.7458
7          0.1735  0.3296       0.3519         0.3801  0.3904             0.4160         0.4364         0.4208
8          0.2908  0.4255       0.3281         0.4887  0.4728             0.4772         0.4854         0.4549
9          0.5589  0.7312       0.6305         0.7360  0.7432             0.7792         0.7515         0.7508
10         0.5380  0.6187       0.4899         0.6566  0.6575             0.6723         0.6517         0.6652
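
For reference, quadratic weighted kappa can be computed as sketched below. It compares the observed confusion matrix of two raters with the matrix expected by chance, penalizing each disagreement by the squared distance between the two scores. The function name and the default 0-3 rating range are assumptions matching the score range stated earlier; this is not the group's own implementation.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating=0, max_rating=3):
    """Agreement between two lists of integer ratings; 1.0 means perfect agreement."""
    n_ratings = max_rating - min_rating + 1
    n = len(rater_a)
    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * n_ratings for _ in range(n_ratings)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    # Marginal histograms of each rater's scores.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(col) for col in zip(*observed)]
    numerator = denominator = 0.0
    for i in range(n_ratings):
        for j in range(n_ratings):
            # Quadratic penalty, scaled so the largest disagreement weighs 1.
            weight = (i - j) ** 2 / (n_ratings - 1) ** 2
            # Count expected in this cell if the raters were independent.
            expected = hist_a[i] * hist_b[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator if denominator else 1.0


human = [0, 1, 2, 3]
print(quadratic_weighted_kappa(human, [0, 1, 2, 3]))  # 1.0
```

In scikit-learn, `cohen_kappa_score` with `weights="quadratic"` computes the same kind of statistic.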


