Unsupervised System for Automatic
Grading of Bachelor and Master Thesis
University Politehnica of Bucharest
Authors:
Yusra Mosallam
Lukas Toma
Mulu Weldegebreal Adhana
Costin-Gabriel Chiru
Traian Rebedea traian.rebedea@cs.pub.ro
Overview
• Introduction
• Motivation
• Previous work
• System architecture
• Dataset
• Results
• Conclusions
26.02.19 K-Teams @ eLSE 2014 – Bucharest, Romania 2
Introduction
• Using natural language processing (NLP) and
machine learning for automated analysis of
written texts (essays, books, theses) in
e-learning
• Essay grading
• Text complexity
• Assessment of conversations
• Authorship identification
Motivation
• Are features used in essay grading and/or text
complexity assessment suitable for automatic
grading of BSc and MSc diploma theses in
computer science?
• Which is the most accurate classifier for
grading theses?
• What problems are encountered?
Previous work
• Textual complexity features computed on distinct
levels:
– Character measures
– Lexical measures
– Syntactic measures
– Semantic measures
– Coherence measures
• Text complexity measures can help in grading
students' essays
• Assessing the text complexity can also provide a good
indicator for assigning reading passages to students in
different grade levels (predicting the correct grade
level of each reading passage)
System architecture
Features
• Lexical Features – lexical measures based on sentences and words
– sentence length
– word length
– vocabulary richness
– hapax legomena (the number of words that occur exactly once)
– function words
– frequent words, frequent word n-grams, frequent acronyms
– number of constituent paragraphs
• Character Features
– character n-grams
– punctuation marks count
– letter count
– ratio of upper case to lower case characters
– ratio of digits to alphabetical characters
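As a rough illustration of how a few of the lexical and character features above could be computed, here is a minimal Python sketch. The function name, the naive sentence splitter, the punctuation set and the use of the type/token ratio for vocabulary richness are all assumptions made for this example; the slides do not state how the system computes them.

```python
import re
from collections import Counter


def lexical_character_features(text: str) -> dict:
    """Compute a handful of lexical and character features for one text."""
    # Naive sentence split: ., ! or ? followed by whitespace (an assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of letters and apostrophes, lowercased before counting.
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)

    letters = sum(c.isalpha() for c in text)
    digits = sum(c.isdigit() for c in text)
    upper = sum(c.isupper() for c in text)
    lower = sum(c.islower() for c in text)

    return {
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        # Type/token ratio, used here as a stand-in for vocabulary richness.
        "vocabulary_richness": len(counts) / len(words) if words else 0.0,
        # Hapax legomena: word types that occur exactly once.
        "hapax_legomena": sum(1 for n in counts.values() if n == 1),
        "punctuation_count": sum(c in ".,;:!?" for c in text),
        "letter_count": letters,
        "upper_lower_ratio": upper / lower if lower else 0.0,
        "digit_letter_ratio": digits / letters if letters else 0.0,
    }
```

Calling `lexical_character_features(thesis_text)` on the plain text of one thesis would give one row of a feature matrix; n-gram and acronym features are left out to keep the sketch short.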
Features
• WordNet Features:
– depth of proper nouns mentioned in the text
– average length of the hypernym path for nouns, for verbs, and
for nouns and verbs together
• Syntactic Features:
– frequent POS tags, frequent n-grams of POS tags
– named entities
– properties of the syntactic parse tree (average branching factor,
average height) of each sentence
• Cohesion Features:
– noun overlap, argument overlap, stem overlap, content word overlap
– noun phrase density
– personal pronoun incidence scores
– polysemy for words
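Two of the features above can be sketched without any NLP library: content word overlap between adjacent sentences (a cohesion feature) and the height and average branching factor of a parse tree (syntactic features). This is only an illustration under stated assumptions: the tiny stop-word list, the function names and the tuple encoding of parse trees `(label, child, ...)` are hypothetical, and the slides do not say which parser or overlap definition the system uses.

```python
import re

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "it", "on", "for"}


def content_words(sentence: str) -> set:
    """Lowercased words of a sentence with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS}


def adjacent_overlap(sentences: list) -> float:
    """Fraction of adjacent sentence pairs that share at least one content word."""
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return shared / len(pairs)


def tree_height(tree) -> int:
    """Height of a parse tree given as (label, child, ...); leaves are strings."""
    if isinstance(tree, str):
        return 0
    return 1 + max((tree_height(child) for child in tree[1:]), default=0)


def avg_branching_factor(tree) -> float:
    """Average number of children over all internal (non-leaf) nodes."""
    internal, children = 0, 0
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        internal += 1
        children += len(node) - 1
        stack.extend(node[1:])
    return children / internal if internal else 0.0
```

Averaging `tree_height` and `avg_branching_factor` over all sentences of a thesis would give the per-document values; noun, argument and stem overlap follow the same pattern with a different word filter.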
Dataset
• BSc and MSc diploma theses from the Department of
Computer Science within University Politehnica of
Bucharest
• 361 BSc and 202 MSc = 563 theses written in English
during the last 4 years
• After removing duplicates and theses that did not have
a student name (or whose name was not discovered
automatically), our dataset comprised 437 instances
• Student data from each thesis was matched with student
data from the grade database (approximate string matching
using student name + thesis name + year of
graduation)
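The matching step could look roughly like the sketch below, which uses the standard-library `difflib.SequenceMatcher` as the approximate string matcher. The record fields (`name`, `title`, `year`), the function names and the 0.8 similarity threshold are assumptions for illustration; the slides do not name the matching algorithm or any threshold.

```python
from difflib import SequenceMatcher


def match_key(name: str, title: str, year: int) -> str:
    """Build one lowercase string from student name, thesis name and year."""
    return f"{name} {title} {year}".lower().strip()


def best_match(thesis: dict, grade_rows: list, threshold: float = 0.8):
    """Return the grade row most similar to the thesis record, or None.

    The threshold is a hypothetical cut-off: below it the thesis is
    treated as unmatched and would be dropped from the dataset.
    """
    query = match_key(thesis["name"], thesis["title"], thesis["year"])
    best_row, best_score = None, 0.0
    for row in grade_rows:
        candidate = match_key(row["name"], row["title"], row["year"])
        score = SequenceMatcher(None, query, candidate).ratio()
        if score > best_score:
            best_row, best_score = row, score
    return best_row if best_score >= threshold else None
```

A single concatenated key keeps the sketch short; comparing the three fields separately and weighting them would be an equally plausible design.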
Dataset
• Distribution of grades is very unbalanced
• Dataset is also affected by some human errors
/ outliers (the grades below 5)
Results
• Several classifiers have been trained:
– k-NN (with k=10)
– Neural network (NN)
– Support vector machine (SVM)
– Random Forest (RF)
• Used 3-fold cross-validation, keeping 2/3 of
the data for training and 1/3 for the test set
• Performance assessed using:
– Mean squared error (MSE)
– Pearson correlation (with p-values)
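The evaluation protocol above (3-fold cross-validation scored with MSE and Pearson correlation) can be sketched in plain Python as follows. The function names and the interleaved fold assignment are assumptions for this example, and the p-value computation is omitted; the slides do not say which toolkit was used.

```python
import math


def three_fold_indices(n: int, k: int = 3):
    """Yield (train, test) index lists; each fold tests on about 1/k of the data."""
    indices = list(range(n))
    for fold in range(k):
        test = indices[fold::k]  # every k-th item, starting at `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def mse(y_true, y_pred) -> float:
    """Mean squared error between true and predicted grades."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x, y) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0
```

In use, a model would be fitted on each `train` split, its predictions on the matching `test` split collected, and `mse` and `pearson` computed between the true and predicted grades. With k=3, each fold trains on 2/3 of the data and tests on the remaining 1/3.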
Results
Method                 MSE     P-value  Correlation
SVM (classification)   0.447   0.068    0.151
k-NN (classification)  random  0.987    -0.001
NN (regression)        random  0.312    -0.040
RF (classification)    0.368   0        0.388
Results
• Random Forest classifier had the best results
(MSE=0.368, r=0.388, p-value=0)
• SVM had poorer results; k-NN and NN (regression)
did not achieve any useful results
Conclusions
• Linguistic textual complexity features provide
low accuracy for thesis grading on our dataset
• Three main reasons:
– Dataset: most of the grades assigned by the
evaluation committee ranged from 9 to 10
• Usually only the best Romanian students write their
graduation thesis in English
– Task: difficulty of finding the best features for
assessing the scientific content of the thesis
– Grading process: the methodology used by the
evaluation committee when grading the thesis, which
does not always judge only the quality of the thesis,
but also uses information about the student’s GPA
Improvements
• Feature selection and post-processing
• Retrain the classifiers using a subset of features
with the strongest prediction power
• Find other measures that can evaluate the
scientific content of the thesis
• Semantic features that could capture the level of
knowledge
• Should be able to predict the main field of a given
thesis and to evaluate the thesis considering the
context of that specific field
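One simple way to pick "a subset of features with the strongest prediction power" is to rank features by the absolute value of their Pearson correlation with the grade and keep the top k. The sketch below assumes that criterion, along with hypothetical function and feature names; the slides do not specify a selection method, and other criteria (for example Random Forest feature importances) would be just as reasonable.

```python
import math


def abs_correlation(x, y) -> float:
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return abs(cov / (sd_x * sd_y)) if sd_x and sd_y else 0.0


def top_features(features: dict, grades: list, k: int) -> list:
    """Names of the k features most correlated (in absolute value) with the grades.

    `features` maps a feature name to its list of values, one per thesis,
    in the same order as `grades`.
    """
    scores = {name: abs_correlation(values, grades) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

To avoid leaking test information, the ranking would be computed on the training folds only before retraining the classifiers on the selected subset.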
Thank you!
Questions?
Discussion

More Related Content

Similar to Unsupervised system for automatic grading of bachelor and master thesis

Evidence-Centered Approach to Online Assessment of Students’ Digital Competence
Evidence-Centered Approach to Online Assessment of Students’ Digital CompetenceEvidence-Centered Approach to Online Assessment of Students’ Digital Competence
Evidence-Centered Approach to Online Assessment of Students’ Digital CompetenceMart Laanpere
 
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)Nicolas Van Labeke
 
Towards a Syllabus Repository for Computer Science Courses
Towards a Syllabus Repository for Computer Science CoursesTowards a Syllabus Repository for Computer Science Courses
Towards a Syllabus Repository for Computer Science CoursesManas Tungare
 
lucas huebner resume 6.22.15 pdf
lucas huebner resume 6.22.15 pdflucas huebner resume 6.22.15 pdf
lucas huebner resume 6.22.15 pdfLucas Huebner
 
Oral Defense presentation
Oral Defense presentationOral Defense presentation
Oral Defense presentationDwayne Squires
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information RetrievalNik Spirin
 
Digital Skills Gap Peer Learning Activity - Analysis of usage of LMS activiti...
Digital Skills Gap Peer Learning Activity - Analysis of usage of LMS activiti...Digital Skills Gap Peer Learning Activity - Analysis of usage of LMS activiti...
Digital Skills Gap Peer Learning Activity - Analysis of usage of LMS activiti...EDEN Digital Learning Europe
 
Computer Science Education: Tools and Data
Computer Science Education: Tools and DataComputer Science Education: Tools and Data
Computer Science Education: Tools and DataPeter Brusilovsky
 
Teachers' perceptions on the integration of ethnomathematics surendra
Teachers' perceptions on the integration of ethnomathematics surendraTeachers' perceptions on the integration of ethnomathematics surendra
Teachers' perceptions on the integration of ethnomathematics surendraSurendra Kumar Thakur
 
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
 
Linked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and LuzzuLinked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and Luzzujerdeb
 
Preaty tefa 2013
Preaty tefa 2013Preaty tefa 2013
Preaty tefa 2013CoSyLlab
 
Resume(short)
Resume(short)Resume(short)
Resume(short)butest
 
Intro to CCSS - East China 2-
Intro to CCSS - East China 2-Intro to CCSS - East China 2-
Intro to CCSS - East China 2-Laura Chambless
 
The Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for HindiThe Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for Hindisinghg77
 
2016-05-31 Venia Legendi (CEITER): Sergey Sosnovsky
2016-05-31 Venia Legendi (CEITER): Sergey Sosnovsky2016-05-31 Venia Legendi (CEITER): Sergey Sosnovsky
2016-05-31 Venia Legendi (CEITER): Sergey Sosnovskyifi8106tlu
 

Similar to Unsupervised system for automatic grading of bachelor and master thesis (20)

Evidence-Centered Approach to Online Assessment of Students’ Digital Competence
Evidence-Centered Approach to Online Assessment of Students’ Digital CompetenceEvidence-Centered Approach to Online Assessment of Students’ Digital Competence
Evidence-Centered Approach to Online Assessment of Students’ Digital Competence
 
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)
 
Towards a Syllabus Repository for Computer Science Courses
Towards a Syllabus Repository for Computer Science CoursesTowards a Syllabus Repository for Computer Science Courses
Towards a Syllabus Repository for Computer Science Courses
 
lucas huebner resume 6.22.15 pdf
lucas huebner resume 6.22.15 pdflucas huebner resume 6.22.15 pdf
lucas huebner resume 6.22.15 pdf
 
00 syllabus
00 syllabus00 syllabus
00 syllabus
 
Oral Defense presentation
Oral Defense presentationOral Defense presentation
Oral Defense presentation
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
Digital Skills Gap Peer Learning Activity - Analysis of usage of LMS activiti...
Digital Skills Gap Peer Learning Activity - Analysis of usage of LMS activiti...Digital Skills Gap Peer Learning Activity - Analysis of usage of LMS activiti...
Digital Skills Gap Peer Learning Activity - Analysis of usage of LMS activiti...
 
My Resume
My ResumeMy Resume
My Resume
 
Computer Science Education: Tools and Data
Computer Science Education: Tools and DataComputer Science Education: Tools and Data
Computer Science Education: Tools and Data
 
Teachers' perceptions on the integration of ethnomathematics surendra
Teachers' perceptions on the integration of ethnomathematics surendraTeachers' perceptions on the integration of ethnomathematics surendra
Teachers' perceptions on the integration of ethnomathematics surendra
 
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
 
Linked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and LuzzuLinked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and Luzzu
 
Preaty tefa 2013
Preaty tefa 2013Preaty tefa 2013
Preaty tefa 2013
 
The Methods To Support Ranking In Cross-Language Web Search.doc
The Methods To Support Ranking In Cross-Language Web Search.docThe Methods To Support Ranking In Cross-Language Web Search.doc
The Methods To Support Ranking In Cross-Language Web Search.doc
 
Xiaowei Wang CV
Xiaowei Wang CVXiaowei Wang CV
Xiaowei Wang CV
 
Resume(short)
Resume(short)Resume(short)
Resume(short)
 
Intro to CCSS - East China 2-
Intro to CCSS - East China 2-Intro to CCSS - East China 2-
Intro to CCSS - East China 2-
 
The Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for HindiThe Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for Hindi
 
2016-05-31 Venia Legendi (CEITER): Sergey Sosnovsky
2016-05-31 Venia Legendi (CEITER): Sergey Sosnovsky2016-05-31 Venia Legendi (CEITER): Sergey Sosnovsky
2016-05-31 Venia Legendi (CEITER): Sergey Sosnovsky
 

More from University Politehnica Bucharest

PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisPhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisUniversity Politehnica Bucharest
 
Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...University Politehnica Bucharest
 
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...University Politehnica Bucharest
 
Determine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisDetermine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisUniversity Politehnica Bucharest
 
Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...University Politehnica Bucharest
 
Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...University Politehnica Bucharest
 
Movie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileMovie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileUniversity Politehnica Bucharest
 
Tracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaTracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaUniversity Politehnica Bucharest
 
The collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyThe collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyUniversity Politehnica Bucharest
 
Tweets topic modelling across different countries prezentarea
Tweets topic modelling across different countries   prezentareaTweets topic modelling across different countries   prezentarea
Tweets topic modelling across different countries prezentareaUniversity Politehnica Bucharest
 
Nlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsNlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsUniversity Politehnica Bucharest
 
2012 Presidential Elections on Twitter - An Analysis of How the US and French...
2012 Presidential Elections on Twitter - An Analysis of How the US and French...2012 Presidential Elections on Twitter - An Analysis of How the US and French...
2012 Presidential Elections on Twitter - An Analysis of How the US and French...University Politehnica Bucharest
 

More from University Politehnica Bucharest (20)

PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisPhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
 
Time series analysis for sales prediction
Time series analysis for sales predictionTime series analysis for sales prediction
Time series analysis for sales prediction
 
Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...
 
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
 
Identifying cyclic words with the help of google
Identifying cyclic words with the help of googleIdentifying cyclic words with the help of google
Identifying cyclic words with the help of google
 
Expression of Political Opinions in Press
Expression of Political Opinions in PressExpression of Political Opinions in Press
Expression of Political Opinions in Press
 
Determine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisDetermine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysis
 
Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...
 
Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...
 
Movie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileMovie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profile
 
Tracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaTracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corpora
 
The collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyThe collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case study
 
Archaisms and neologisms identification in texts
Archaisms and neologisms identification in textsArchaisms and neologisms identification in texts
Archaisms and neologisms identification in texts
 
Tweets topic modelling across different countries prezentarea
Tweets topic modelling across different countries   prezentareaTweets topic modelling across different countries   prezentarea
Tweets topic modelling across different countries prezentarea
 
Sentiment based text segmentation
Sentiment based text segmentationSentiment based text segmentation
Sentiment based text segmentation
 
Creativity detection in texts
Creativity detection in textsCreativity detection in texts
Creativity detection in texts
 
Nlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsNlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chats
 
Detecting discourse creativity in chat conversations
Detecting discourse creativity in chat conversationsDetecting discourse creativity in chat conversations
Detecting discourse creativity in chat conversations
 
Metaphor detection
Metaphor detectionMetaphor detection
Metaphor detection
 
2012 Presidential Elections on Twitter - An Analysis of How the US and French...
2012 Presidential Elections on Twitter - An Analysis of How the US and French...2012 Presidential Elections on Twitter - An Analysis of How the US and French...
2012 Presidential Elections on Twitter - An Analysis of How the US and French...
 

Recently uploaded

COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 

Unsupervised system for automatic grading of bachelor and master thesis

  • 1. Authors University Politehnica of Bucharest Unsupervised System for Automatic Grading of Bachelor and Master Thesis Yusra Mosallam Lukas Toma Mulu Weldegebreal Adhana Costin-Gabriel Chiru Traian Rebedea traian.rebedea@cs.pub.ro
  • 2. Overview • Introduction • Motivation • Previous work • System architecture • Dataset • Results • Conclusions 26.02.19 K-Teams @ eLSE 2014 – Bucharest, Romania 2
  • 3. Introduction • Using natural language processing (NLP) and machine learning for automated analysis of written texts (essays, books, thesis) in e-learning • Essay grading • Text complexity • Assessment of conversations • Authorship identification 26.02.19 K-Teams @ eLSE 2014 – Bucharest, Romania 3
  • 4. Motivation • Are features used in essay grading and/or text complexity assessment suitable for automatic grading of BSc and MSc diploma theses in computer science? • Which is the most accurate classifier for grading theses? • What problems are encountered?
  • 5. Previous work • Textual complexity features computed on distinct levels: – Character measures – Lexical measures – Syntactic measures – Semantic measures – Coherence measures • Text complexity measures can help in grading students' essays • Assessing the text complexity can also provide a good indicator for assigning reading passages to students in different grade levels (predicting the correct grade level of each reading passage)
  • 6. System architecture
  • 7. Features • Lexical Features – lexical measures based on sentences and words – sentence length – word length – vocabulary richness – hapax legomena (the number of words that occur exactly once) – function words – frequent words, frequent word n-grams, frequent acronyms – number of constituent paragraphs • Character Features – character n-grams – punctuation marks count – letter count – ratio of upper case to lower case characters – ratio of digits to alphabetical characters
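A few of the lexical and character measures listed on this slide can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `lexical_character_features`, the naive sentence splitter, and the use of type-token ratio as the "vocabulary richness" measure are choices made here, not details taken from the slides.

```python
import re
import string
from collections import Counter


def lexical_character_features(text: str) -> dict:
    """Compute a handful of lexical and character features (illustrative only)."""
    # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: a word is a run of ASCII letters or apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(w.lower() for w in words)

    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    lower = sum(1 for c in letters if c.islower())
    digits = sum(1 for c in text if c.isdigit())

    return {
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        # Type-token ratio stands in for "vocabulary richness" (an assumption).
        "type_token_ratio": len(counts) / len(words) if words else 0.0,
        # Hapax legomena: word types that occur exactly once.
        "hapax_count": sum(1 for n in counts.values() if n == 1),
        "punctuation_count": sum(1 for c in text if c in string.punctuation),
        "letter_count": len(letters),
        "upper_lower_ratio": upper / lower if lower else 0.0,
        "digit_alpha_ratio": digits / len(letters) if letters else 0.0,
    }


if __name__ == "__main__":
    print(lexical_character_features("The cat sat. The dog ran 42 times!"))
```

Each thesis would yield one such dictionary, and the values would form part of the feature vector passed to a classifier.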
  • 8. Features • WordNet Features: – depth of proper nouns mentioned in the text – average length of the hypernym path for nouns, for verbs, and for nouns and verbs together • Syntactic Features: – frequent POS tags, frequent n-grams of POS tags – named entities – properties of the syntactic parse tree (average branching factor, average height) of each sentence • Cohesion Features: – noun overlap, argument overlap, stem overlap, content word overlap – noun phrase density – personal pronoun incidence scores – polysemy for words
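Of the cohesion features named above, content word overlap is the simplest to illustrate. The sketch below measures the fraction of adjacent sentence pairs that share at least one content word. The tiny `STOPWORDS` set, the function names, and the "adjacent pairs" definition are assumptions made for this example; the slides do not state how the overlap scores were computed.

```python
import re

# Hypothetical, deliberately tiny stopword list used only for this illustration.
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "are",
    "it", "this", "that", "on", "for", "with", "by", "we",
}


def content_words(sentence: str) -> set:
    """Lowercased alphabetic tokens that are not in the stopword list."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def adjacent_overlap(sentences: list) -> float:
    """Fraction of adjacent sentence pairs sharing at least one content word."""
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return hits / len(pairs)


if __name__ == "__main__":
    example = [
        "The system extracts features.",
        "These features feed a classifier.",
        "Results are reported.",
    ]
    print(adjacent_overlap(example))
```

Noun, argument, and stem overlap would follow the same pattern, but with a POS tagger or stemmer selecting the tokens to compare.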
  • 9. Dataset • BSc and MSc diploma theses from the Department of Computer Science within University Politehnica of Bucharest • 361 BSc and 202 MSc = 563 theses written in English during the last 4 years • After removing duplicates and theses that did not have a student name (or whose name was not discovered automatically), our dataset comprised 437 instances • Matching student data from the thesis with student data from the grade database (approximate string matching using student name + thesis name + year of graduation)
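The matching step described on this slide can be approximated with the standard-library `difflib` module. This is a minimal sketch, not the authors' procedure: the key layout (`name | title | year`), the `0.85` threshold, the function names, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(s.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(thesis_key: str, grade_records: dict, threshold: float = 0.85):
    """Return (key, grade) of the closest record, or None if below threshold.

    The threshold value is an assumption chosen for illustration.
    """
    best_key, best_score = None, 0.0
    for key in grade_records:
        score = similarity(thesis_key, key)
        if score > best_score:
            best_key, best_score = key, score
    if best_key is None or best_score < threshold:
        return None
    return best_key, grade_records[best_key]


if __name__ == "__main__":
    # Hypothetical records: "student name | thesis name | graduation year" -> grade
    records = {
        "Ion Popescu | Automatic Grading of Theses | 2013": 9.5,
        "Ana Ionescu | Text Complexity Metrics | 2012": 10.0,
    }
    print(best_match("ion popescu | automatic grading of thesis | 2013", records))
```

Returning `None` below the threshold mirrors the slide's note that theses without a reliably identified student were dropped from the dataset.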
  • 10. Dataset • Distribution of grades is very unbalanced • Dataset is also affected by some human errors / outliers (the grades below 5)
  • 11. Results • Several classifiers have been trained: – k-NN (with k=10) – Neural network (NN) – Support vector machine (SVM) – Random Forest (RF) • Used 3-fold cross-validation, keeping 2/3 of the data for training and 1/3 for the test set • Performance assessed using: – Mean squared error (MSE) – Pearson correlation (with p-values)
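The evaluation protocol on this slide (3-fold cross-validation scored with MSE and Pearson correlation) can be sketched without external libraries. Several details here are assumptions: folds are built by striding over the indices without shuffling, out-of-fold predictions are pooled before scoring, and p-values are omitted because they need a t-distribution that the standard library does not provide. The `fit` and `predict` callables are placeholders for any of the listed models.

```python
import math


def k_fold_indices(n: int, k: int = 3):
    """Yield (train, test) index lists; each fold holds out about 1/k of the data."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def mse(y_true, y_pred) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x, y) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cross_validate(X, y, fit, predict, k: int = 3):
    """Pool out-of-fold predictions and return (MSE, Pearson r)."""
    preds = [0.0] * len(y)
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = predict(model, X[i])
    return mse(y, preds), pearson(y, preds)


if __name__ == "__main__":
    # Baseline "model": always predict the mean training grade.
    fit_mean = lambda X_train, y_train: sum(y_train) / len(y_train)
    predict_mean = lambda model, x: model
    grades = [9, 10, 9, 10, 8, 10]          # hypothetical grades
    features = [[0.0]] * len(grades)        # features unused by this baseline
    print(cross_validate(features, grades, fit_mean, predict_mean))
```

A mean-grade baseline like the one in the usage example is a useful reference point when the grade distribution is as unbalanced as the dataset slide describes.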
  • 12. Results
      Method                  MSE      P-value   Correlation
      SVM (classification)    0.447    0.068     0.151
      k-NN (classification)   random   0.987     -0.001
      NN (regression)         random   0.312     -0.040
      RF (classification)     0.368    0         0.388
  • 13. Results • The Random Forest classifier had the best results (MSE=0.368, r=0.388, p-value=0) • SVM had poorer results; k-NN and NN (regression) did not achieve any useful results
  • 14. Conclusions • Linguistic textual complexity features provide low accuracy for thesis grading on our dataset • Three main reasons: – Dataset: most of the grades assigned by the evaluation committee ranged from 9 to 10 (usually only the best Romanian students write their graduation thesis in English) – Task: difficulty of finding the best features for assessing the scientific content of the thesis – Grading process: the evaluation committee does not always judge only the quality of the thesis when grading, but also uses information about the student's GPA
  • 15. Improvements • Feature selection and post-processing • Retrain the classifiers using the subset of features with the strongest predictive power • Find other measures that can evaluate the scientific content of the thesis • Add semantic features that could capture the level of knowledge • The system should be able to predict the main field of a given thesis and evaluate the thesis in the context of that specific field
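One simple way to pick "the subset of features with the strongest predictive power" is to rank features by the absolute value of their Pearson correlation with the grade. The slides do not say which selection method the authors had in mind, so the ranking criterion, the function names, and the sample data below are all assumptions made for illustration.

```python
import math


def pearson(x, y) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(feature_columns: dict, grades: list, top_k: int = 2) -> list:
    """Return the names of the top_k features by |correlation| with the grades."""
    scored = [
        (name, abs(pearson(values, grades)))
        for name, values in feature_columns.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical feature columns, one value per thesis.
    columns = {
        "avg_sentence_length": [1, 2, 3, 4],
        "hapax_count": [1, 1, 2, 1],
        "digit_alpha_ratio": [4, 3, 2, 1],
    }
    print(rank_features(columns, grades=[7, 8, 9, 10]))
```

A univariate ranking like this ignores interactions between features; importance scores from a tree ensemble would be a natural alternative given that Random Forest performed best.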
  • 16. Thank you! Questions? Discussion