These slides present an automatic system for evaluating the Bachelor's and Master's theses of Computer Science students. To fulfill this task, we used text complexity measures along with other factors. Text complexity has mainly been used to predict the grade level to which a specific reading passage or text should be assigned; it has also been used to evaluate students' writing in language classes. We decided to apply text complexity measures to the evaluation of students' graduation theses. The main challenges of this task are selecting the features that best reflect a student's performance in a specific domain and identifying the optimal classifier for predicting the student's score. First, we investigated four sets of text complexity measures (lexical, syntactic, semantic, and character measures), cohesion metrics, and several features related to thesis organization and to the references and bibliography. Second, we computed the correlations between the proposed features and excluded the highly inter-correlated ones. We then used several classifiers to predict the students' grades and compared their performance. Finally, we tested our work on a corpus of Bachelor's and Master's theses written in English by students of the Computer Science Department of the University Politehnica of Bucharest (English was chosen because of the wide availability of open-source natural language processing tools). We evaluated the quality of the application using the Pearson correlation between our results and the grades assigned to the theses by the evaluation committee.
2. Overview
• Introduction
• Motivation
• Previous work
• System architecture
• Dataset
• Results
• Conclusions
26.02.19 K-Teams @ eLSE 2014 – Bucharest, Romania 2
3. Introduction
• Using natural language processing (NLP) and
machine learning for automated analysis of
written texts (essays, books, theses) in
e-learning
• Essay grading
• Text complexity
• Assessment of conversations
• Authorship identification
4. Motivation
• Are features used in essay grading and/or text
complexity assessment suitable for automatic
grading of BSc and MSc diploma theses in
computer science?
• Which is the most accurate classifier for
grading theses?
• What problems are encountered?
5. Previous work
• Textual complexity features computed on distinct
levels:
– Character measures
– Lexical measures
– Syntactic measures
– Semantic measures
– Coherence measures
• Text complexity measures can help in grading
students' essays
• Assessing the text complexity can also provide a good
indicator for assigning reading passages to students in
different grade levels (predicting the correct grade
level of each reading passage)
7. Features
• Lexical Features – lexical measures based on sentences and words
– sentence length
– word length
– vocabulary richness
– hapax legomena (the number of words occurring only once)
– functional words
– frequent words, frequent word n-grams, frequent acronyms
– number of constituent paragraphs
• Character Features
– character n-grams
– punctuation marks count
– letter count
– ratio of upper case to lower case characters
– ratio of digits to alphabetical characters
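The lexical measures above can be sketched in a few lines. The following is a minimal illustration assuming simple regex tokenization (the actual system would use a proper NLP toolkit); the function name and the feature definitions chosen here (e.g. type/token ratio for vocabulary richness) are ours:

```python
import re
from collections import Counter

def lexical_features(text: str) -> dict:
    """Toy versions of a few lexical measures from this slide."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)
    return {
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "avg_word_length": sum(map(len, words)) / max(len(words), 1),
        # vocabulary richness as the type/token ratio
        "vocabulary_richness": len(counts) / max(len(words), 1),
        # hapax legomena: words that occur exactly once
        "hapax_count": sum(1 for c in counts.values() if c == 1),
    }

feats = lexical_features("The cat sat. The cat ran away.")
```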
26.02.19 K-Teams @ eLSE 2014 – Bucharest, Romania 7
8. Features
• WordNet Features:
– depth of proper nouns mentioned in the text
– average length of the hypernym path for nouns, for verbs, and
for nouns and verbs together
• Syntactic Features:
– frequent POS tags, frequent n-grams of POS tags
– named entities
– properties of the syntactic parse tree (average branching factor,
average height) of each sentence
• Cohesion Features:
– noun overlap, argument overlap, stem overlap, content word overlap
– noun phrase density
– personal pronoun incidence scores
– polysemy for words
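To make the hypernym-path feature concrete, here is a self-contained sketch over a toy taxonomy. In the actual system WordNet supplies the hypernym links; the mini-taxonomy and the averaging over a word list below are invented purely for illustration:

```python
# Toy hypernym links (child -> parent); WordNet plays this role in practice.
TOY_HYPERNYMS = {
    "dog": "canine", "canine": "mammal", "mammal": "animal",
    "animal": "entity", "run": "move", "move": "act",
}

def hypernym_path_length(word: str) -> int:
    """Number of hypernym links from the word up to the taxonomy root."""
    length = 0
    while word in TOY_HYPERNYMS:
        word = TOY_HYPERNYMS[word]
        length += 1
    return length

def avg_path_length(words) -> float:
    """Average hypernym depth over a list of words (the slide's feature)."""
    return sum(hypernym_path_length(w) for w in words) / len(words)
```

Deeper (more specific) concepts yield longer paths, so texts using more specific vocabulary score higher on this feature.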
9. Dataset
• BSc and MSc diploma theses from the Department of
Computer Science within University Politehnica of
Bucharest
• 361 BSc + 202 MSc = 563 theses written in English
during the last 4 years
• After removing duplicates and theses without a
student name (or whose name was not detected
automatically), our dataset comprised 437 instances
• Matching student data from theses with student data
from the grade database (approximate string matching
using student name + thesis title + year of
graduation)
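The approximate matching step could look roughly like the following, using the standard library's difflib. The similarity threshold, the way name and title are combined, and the illustrative records are our assumptions, not the authors' exact recipe:

```python
from difflib import SequenceMatcher

def match_score(thesis_rec: dict, grade_rec: dict) -> float:
    """Similarity of (student name + thesis title), gated on graduation year."""
    if thesis_rec["year"] != grade_rec["year"]:
        return 0.0
    a = f'{thesis_rec["name"]} {thesis_rec["title"]}'.lower()
    b = f'{grade_rec["name"]} {grade_rec["title"]}'.lower()
    return SequenceMatcher(None, a, b).ratio()

def best_match(thesis_rec, grade_db, threshold=0.8):
    """Return the best-matching grade-database record, or None."""
    best = max(grade_db, key=lambda g: match_score(thesis_rec, g))
    return best if match_score(thesis_rec, best) >= threshold else None

# Illustrative records (invented names and titles):
thesis = {"name": "Ion Popescu", "title": "Automatic Essay Grading", "year": 2013}
grade_db = [
    {"name": "Ion Popescu", "title": "Automatic essay grading", "year": 2013, "grade": 9.5},
    {"name": "Maria Ionescu", "title": "A Compiler Front End", "year": 2013, "grade": 10.0},
]
match = best_match(thesis, grade_db)
```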
10. Dataset
• Distribution of grades is very unbalanced
• Dataset is also affected by some human errors
/ outliers (grades below 5)
11. Results
• Several classifiers have been trained:
– k-NN (with k=10)
– Neural network (NN)
– Support vector machine (SVM)
– Random Forest (RF)
• Used 3-fold cross-validation, keeping 2/3 of
the data for training and 1/3 for testing
• Performance assessed using:
– Mean squared error (MSE)
– Pearson correlation (with p-values)
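The two evaluation metrics can be written out in plain Python (in practice a library such as scikit-learn or SciPy would be used); the predictions and grades below are illustrative numbers only:

```python
import math

def mse(pred, true):
    """Mean squared error between predicted and true grades."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def pearson(pred, true):
    """Pearson correlation coefficient between predictions and grades."""
    n = len(true)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st)

pred, true = [9, 10, 8, 9], [9, 10, 9, 8]
```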
12. Results
Method                  MSE      p-value   Correlation
SVM  (classification)   0.447    0.068      0.151
k-NN (classification)   random   0.987     -0.001
NN   (regression)       random   0.312     -0.040
RF   (classification)   0.368    0          0.388
13. Results
• The Random Forest classifier had the best results
(MSE=0.368, r=0.388, p-value=0)
• SVM had poorer results; k-NN and NN (regression)
did not achieve any useful results
14. Conclusions
• Linguistic textual complexity features provide
low accuracy for thesis grading on our dataset
• Three main reasons:
– Dataset: most of the grades assigned by the
evaluation committee ranged from 9 to 10
• Usually only the best Romanian students write their
graduation theses in English
– Task: the difficulty of finding the best features for
assessing the scientific content of a thesis
– Grading process: the methodology used by the
evaluation committee when grading a thesis, which
does not always judge only the quality of the thesis
but also takes into account the student's GPA
15. Improvements
• Feature selection and post-processing
• Retrain the classifiers using a subset of features
with the strongest prediction power
• Find other measures that can evaluate the
scientific content of the thesis
• Semantic features that could capture the level of
knowledge
• The system should also predict the main field of a
given thesis and evaluate the thesis in the
context of that specific field
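One concrete form of the feature selection mentioned above is the correlation-based filtering described earlier (dropping one of each pair of highly inter-correlated features). The following is a sketch under assumptions of ours: the 0.9 cutoff and the greedy keep-first order are illustrative choices:

```python
import math

def pearson(x, y):
    """Pearson correlation between two feature columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_correlated(features: dict, cutoff: float = 0.9):
    """Greedily keep features whose |r| with every kept feature < cutoff."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < cutoff for k in kept):
            kept.append(name)
    return kept
```

For example, if feature "b" is a rescaled copy of feature "a", the filter keeps "a" and drops "b" while retaining an uncorrelated feature "c".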