
The Essay Scoring Tool (TEST) for Hindi


First prize for Engineering Design at ISTE 2011



  1. The Essay Scoring Tool (TEST) - B.E. Project Presentation
     Submitted by: Abhinav Gupta 201/CO/03, Danish Contractor 233/CO/03, Gaurav Singh 238/CO/03, Himanshu Mehrotra 241/CO/03
     Under the guidance of: Dr. Shampa Chakraverty, COE Dept.
     Date of presentation: 1st June 2007, NSIT, Delhi
  2. PRIOR WORK
  3. Overview of the Software (block diagram of TEST inputs and outputs: student essay, training essays, corpus, facts; spelling & grammatical checks; feedback to the student; score)
  4. Scoring Parameters (scoring engine: quality of content, local coherence, global coherence, factual accuracy)
  5. SINGULAR VALUES (K) RETAINED
  6. Study Undertaken
     - A set of essays was given to human graders.
     - Essays were rated as:
       - Good essays
       - Bad essays
  7. LOCAL COHERENCE - Good Essays (average variance from gold standard: 0.0219)
  8. LOCAL COHERENCE - Other Essays (average variance from gold standard: 0.212)
  9. LOCAL COHERENCE - Combined Essays (Series 1: good essays; Series 2: other essays)
  10. LOCAL COHERENCE - MARKING SCHEME
  11. LOCAL COHERENCE - MARKS
  12. CONTENT - ESSAYS TO BE MARKED
  13. CONTENT - Good Essays
  14. CONTENT - Other Essays
  15. CONTENT - COMBINED (Series 1: good essays; Series 5: other essays)
  16. CONTENT - NORMALIZED MARKS
  17. GLOBAL COHERENCE
      - Essays are classified as having a:
        - Good structure
        - Average structure
        - Bad structure
  18. WELL-STRUCTURED ESSAY
  19. AVERAGELY STRUCTURED ESSAY
  20. BADLY STRUCTURED ESSAY
  21. GLOBAL COHERENCE MARKS
  22. Fact Evaluation Module (block diagram: inputs - topic-specific keywords, list of essays, correct facts list, incorrect facts list; outputs - individual essay reports & scores, and an N x 1 score matrix for internal use by TEST)
  23. Fact Evaluation (example: no. of facts matched: 4; no. of incorrect facts matched: 1; SCORE: 0.8)
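One consistent reading of this example, assuming the score is the fraction of matched facts that are correct (the deck does not state the formula): score = 4 / (4 + 1) = 0.8.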
  24. Breakup of Essay Scores
  25. Human scores vs. TEST scores
  26. Performance of TEST
      - Adjacent agreement with human graders: around 77%
      - Agreement among human graders: around 73%
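A sketch of how an adjacent-agreement figure can be computed, assuming the common definition (two raters agree on an essay when their scores differ by at most one point); the deck does not define the measure explicitly, and the sample scores below are invented.

```python
def adjacent_agreement(scores_a, scores_b):
    """Percentage of essays where the two raters' scores differ by at most one point."""
    agree = sum(abs(a - b) <= 1 for a, b in zip(scores_a, scores_b))
    return 100.0 * agree / len(scores_a)

# Invented scores for five essays.
human_scores = [7, 5, 8, 6, 4]
test_scores  = [6, 5, 9, 4, 4]
print(adjacent_agreement(human_scores, test_scores))  # 80.0 for this sample
```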
  27. TIME COMPLEXITY
      - Pre-processing for global coherence: O(N^3), where N = number of sentences in the corpus
      - O(t*n^2), where t = number of themes and n = number of sentences in the evaluation essay
      - Fact module: O(k^4), where k = number of keywords
  28. COMPARISON OF TEST WITH OTHER AES TOOLS
      - Evaluation parameters: PEG - essay length, sentence complexity and word length; IEA - similarity with gold standard; E-Rater - lexical complexity, vocabulary, essay organization and more; TEST - similarity with gold standard, essay organization, factual accuracy
      - Feedback: PEG - no; IEA - yes; E-Rater - yes; TEST - yes
      - Essay content checking: PEG - no; IEA - yes; E-Rater - yes; TEST - yes
      - Fact checking: PEG - no; IEA - no; E-Rater - yes; TEST - yes
      - Training phase: PEG - time consuming & inexpensive; IEA - time consuming & inexpensive; E-Rater - time consuming & expensive; TEST - time consuming & inexpensive
      - Language of essays: PEG - English; IEA - English; E-Rater - English; TEST - Hindi
      - Performance: PEG - correlation of 0.87 with human raters; IEA - correlation of 0.85; E-Rater - correlation of 0.87; TEST - correlation of 0.7652 with human raters
  29. FUTURE WORK
      - Include OCR (optical character recognition).
      - Increase the size and variety of the corpus.
      - Incorporate modules for spelling and grammar evaluation.
      - Use Random Indexing (RI) techniques to reduce the size of the matrix input to the SVD procedure, and thus reduce time complexity.
  30. LIMITATIONS
      - Absence of grammatical checking
      - Absence of a spell check
      - The tool is unable to check individualistic styles of writing
      - Domain-specific knowledge is required before checking an essay
  31. CONTRIBUTION
      - First AES tool for Hindi
      - Local coherence at the granularity of sentences
      - Good correlation with human raters
      - SVD performed only once for both local and global coherence
  32. References
      1. Landauer, T. K., Foltz, P. W., & Laham, D. An Introduction to Latent Semantic Analysis. Discourse Processes, 1998.
      2. Foltz, P. W., Kintsch, W., & Landauer, T. K. The Measurement of Textual Coherence with Latent Semantic Analysis. Discourse Processes, 1998.
      3. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 1990.
      4. Kintsch, W. On the Notions of Theme and Topic in Psychological Process Models of Text Comprehension. Interdisciplinary Studies, 2002.
      5. Landauer, T. K., Laham, D., Rehder, B., & Schreiner, M. E. How Well Can Passage Meaning Be Derived without Using Word Order? A Comparison of Latent Semantic Analysis and Humans. 1996.
      6. Wong, K. C., Mørch, A. I., Cheung, W. K., Lam, M. H., & Tang, J. P. A Critiquing System to Support English Composition through the Use of Latent Semantic Analysis. 2005 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE'05), pp. 576-581.
      7. Burstein, J., Marcu, D., & Knight, K. Finding the WRITE Stuff: Automatic Identification of Discourse Structure in Student Essays. IEEE Intelligent Systems: Special Issue on Advances in Natural Language Processing, 18(1):32-39, 2003.
  33. WE WOULD LIKE TO THANK
      - Dr. Shampa Chakraverty, without whose constant guidance and support we would have given up long ago.
      - Dr. Niladri Chatterjee, Dept. of Mathematics, IIT Delhi, for sharing his experience in the NLP field.
      - Ms. Yasmin Contractor, Principal, Summerfields School, Gurgaon, for providing us with the student essays.
      - The faculty of the COE Dept. and fellow students.
  34. Q & A?
  35. Automatic Essay Evaluation Software
      - B.E. Final Year Project: Final Evaluation
      - Project guide: Dr. Shampa Chakraverty
      - Team: Abhinav Gupta 201/CO/03, Danish Contractor 233/CO/03, Gaurav Singh 238/CO/03, Himanshu Mehrotra 241/CO/03
  36. Aim of the software
      - To score students' essays on a specific topic.
      - To give feedback to the student on deficiencies in his/her essay.
  37. Need for this software
      - Teachers these days are overburdened with the evaluation of answer scripts.
      - Teachers are unable to give personalized attention to students' needs.
      - Students feel the need to practice writing essays in a non-test environment.
      - Many factors influence the scoring of essays and introduce error.
  38. Overview of the Software
  39. Parameters used for evaluation
      - Similarity with the gold standard
      - Local coherence of the essay
      - Global and theme coherence checking, with feedback generation
      - Fact checking
  40. Latent Semantic Analysis (LSA)
      - Latent semantic analysis is a statistical natural language processing technique for analyzing the relationships between a set of documents and the terms they contain, by producing a set of concepts related to those documents and terms.
      - LSA derives a high-dimensional semantic space; words and passages are represented as vectors in this space.
      - Similarities measured by LSA have been shown to closely mimic human judgments of meaning similarity.
  41. LSA: Steps involved
      - Build a term-document matrix (M) from the training corpus (the gold standard essay plus other articles and essays on the same topic) together with the essay under evaluation.
      - Apply singular value decomposition to M, obtaining three matrices T, S and D (T = term matrix, S = singular-values matrix, D = document matrix).
      - Reduce dimensionality by preserving only the 2 largest singular values in S, giving S-improved.
      - Multiply T, S-improved and D to obtain the new term-by-document matrix.
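A minimal sketch of the steps above, written in Python with NumPy purely for illustration (the deck does not name the project's implementation language). The toy corpus, the whitespace tokenizer and the choice of k = 2 retained singular values are assumptions, not the project's actual pre-processing.

```python
import numpy as np

# Toy corpus: training essays plus the essay under evaluation (placeholders).
documents = [
    "gold standard essay text on the topic",
    "another training essay text on the topic",
    "student essay text under evaluation",
]

# Step 1: term-document matrix M of raw term counts.
vocab = sorted({word for doc in documents for word in doc.split()})
M = np.array([[doc.split().count(word) for doc in documents] for word in vocab],
             dtype=float)

# Step 2: singular value decomposition, M = T * S * D^T.
T, s, Dt = np.linalg.svd(M, full_matrices=False)

# Steps 3-4: keep only the k largest singular values and rebuild the
# term-by-document matrix in the reduced semantic space.
k = 2
M_reduced = T[:, :k] @ np.diag(s[:k]) @ Dt[:k, :]
print(M_reduced.shape)  # same shape as M, but rank k
```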
  42. LSA Example: Titles of Some Technical Memos
      - c1: Human machine interface for ABC computer applications
      - c2: A survey of user opinion of computer system response time
      - c3: The EPS user interface management system
      - c4: System and human system engineering testing of EPS
      - c5: Relation of user perceived response time to error measurement
      - m1: The generation of random, binary, ordered trees
      - m2: The intersection graph of paths in trees
      - m3: Graph minors IV: Widths of trees and well-quasi-ordering
      - m4: Graph minors: A survey
  43. LSA Example: Term-by-document matrix
  44. LSA Example: After SVD
  45. LSA Example: Results (similarity between documents)
      - c1 and c2 = 0.91 (high)
      - c1 and c3 = 1.00 (very high)
      - c1 and c5 = 0.85 (high)
      - c2 and c3 = 0.91 (high)
      - c1 and m1 = -0.85 (low)
      - m1 and m2 = 1.00 (very high)
      - m2 and m3 = 1.00 (very high)
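The similarity figures above are consistent with cosine similarities between document vectors in the reduced space. A small sketch of that computation, using an invented reduced matrix rather than the slide's actual data:

```python
import numpy as np

def doc_similarity(M_reduced: np.ndarray, i: int, j: int) -> float:
    """Cosine similarity between documents i and j (columns of M_reduced)."""
    a, b = M_reduced[:, i], M_reduced[:, j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented reduced term-by-document matrix: 3 terms x 4 documents.
M_reduced = np.array([[0.9, 0.8, 0.1, 0.0],
                      [0.4, 0.5, 0.0, 0.1],
                      [0.0, 0.1, 0.9, 0.8]])

print(doc_similarity(M_reduced, 0, 1))  # high: the two documents share terms
print(doc_similarity(M_reduced, 0, 2))  # low: almost no overlap
```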
  46. Local Coherence Estimation
      - What is coherence? Each sentence in an essay is connected to the previous sentences; the degree of this connection measures the coherence of the sentence pair.
      - Coherence estimation using LSA: by comparing the vectors for two adjoining segments of text in a semantic space, LSA measures the degree of semantic relatedness between the segments.
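A hedged sketch of local coherence estimation as described above: take the LSA vectors of adjoining sentences, compute their cosine similarities, and average them. How TEST actually builds the sentence vectors and maps the average onto marks is not shown; the vectors below are placeholders.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def local_coherence(sentence_vectors) -> float:
    """Average similarity between each sentence and the sentence before it."""
    sims = [cosine(sentence_vectors[i - 1], sentence_vectors[i])
            for i in range(1, len(sentence_vectors))]
    return sum(sims) / len(sims)

# Placeholder sentence vectors in a 2-dimensional LSA space.
essay_sentences = [np.array([0.9, 0.1]), np.array([0.8, 0.2]), np.array([0.2, 0.9])]
print(local_coherence(essay_sentences))
```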
  47. Global and theme coherence checker and feedback generator
      - The global structure of an essay is: introduction, ideas in individual paragraphs, conclusion.
      - Ideas in an essay are presented as: (1) a main idea, (2) a supporting idea, (3) an explanation of (1) and (2).
  48. Global and theme coherence checker and feedback generator
      - A set of possible introductions, conclusions and ideas is extracted from the gold standard and other training essays.
      - The similarity of the student essay's introduction is measured against the set of introductions using LSA; the same is done for the ideas and conclusions.
      - Using these similarity measures, the presence or absence of ideas, introductions and conclusions can be determined.
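A sketch of the presence/absence check described above, under the assumption that a section (introduction, idea or conclusion) counts as present when its best LSA similarity against the corresponding training set clears some threshold; the threshold value and all vectors here are invented.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def section_present(section_vec: np.ndarray, training_vecs,
                    threshold: float = 0.6) -> bool:
    """True if the student's section matches any training introduction/idea/conclusion."""
    return max(cosine(section_vec, t) for t in training_vecs) >= threshold

# Placeholder vectors: the student's introduction against training introductions.
student_intro = np.array([0.7, 0.3])
training_intros = [np.array([0.8, 0.2]), np.array([0.1, 0.9])]
print(section_present(student_intro, training_intros))  # True for these vectors
```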
  49. Fact Evaluation
      - To facilitate this, two sets of facts are maintained per essay topic: correct facts and incorrect facts.
      - The following guidelines are used to evaluate facts:
        - A set of "keywords" is checked at the sentence level in the text.
        - Detection of two or more keywords invokes the checking module.
        - The two databases of facts (correct and incorrect) contain the sets of keywords that form a "fact".
      - Each sentence is assumed to contain at most one fact.
      - Connectives in sentences are treated as "end-of-sentence" markers for fact-evaluation purposes.
  50. Fact Evaluation
      - The detected keywords are paired and matched to form sets of "facts", which are then checked in the database. Three cases may arise:
        - A positive match in both databases
        - A positive match in the correct facts database
        - A positive match in the incorrect facts database
      - The time complexity of factual evaluation is around O(m * (log p)^2), where p = number of keywords and m = average sentence length.
      - This could be a significant overhead, since fact evaluation is only a small aspect of the entire evaluation process.
      - The use of SQL (for reading facts) and other database optimizations should reduce the computation time required.
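A hedged sketch of the keyword-pairing logic above, with in-memory sets standing in for the SQL fact databases; the sample keywords and facts are invented, and the real module's matching rules may differ.

```python
from itertools import combinations

# Invented topic keywords and fact databases (in-memory stand-ins for SQL tables).
KEYWORDS = {"gandhi", "1869", "porbandar", "1947"}
CORRECT_FACTS = {frozenset({"gandhi", "1869"}), frozenset({"gandhi", "porbandar"})}
INCORRECT_FACTS = {frozenset({"gandhi", "1947"})}

def check_sentence(sentence):
    """Classify at most one fact per sentence, per the guidelines above."""
    found = [w for w in sentence.lower().split() if w in KEYWORDS]
    if len(found) < 2:                   # module invoked only when 2+ keywords appear
        return None
    for pair in combinations(found, 2):  # pair detected keywords into candidate facts
        fact = frozenset(pair)
        if fact in CORRECT_FACTS:
            return "correct"
        if fact in INCORRECT_FACTS:
            return "incorrect"
    return None

print(check_sentence("gandhi was born in 1869"))  # correct
print(check_sentence("gandhi was born in 1947"))  # incorrect
```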
  51. References
      1. Landauer, T. K., Foltz, P. W., & Laham, D. An Introduction to Latent Semantic Analysis. Discourse Processes, 1998.
      2. Foltz, P. W., Kintsch, W., & Landauer, T. K. The Measurement of Textual Coherence with Latent Semantic Analysis. Discourse Processes, 1998.
      3. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 1990.
      4. Kintsch, W. On the Notions of Theme and Topic in Psychological Process Models of Text Comprehension. Interdisciplinary Studies, 2002.
      5. Landauer, T. K., Laham, D., Rehder, B., & Schreiner, M. E. How Well Can Passage Meaning Be Derived without Using Word Order? A Comparison of Latent Semantic Analysis and Humans. 1996.
      6. Wong, K. C., Mørch, A. I., Cheung, W. K., Lam, M. H., & Tang, J. P. A Critiquing System to Support English Composition through the Use of Latent Semantic Analysis. 2005 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE'05), pp. 576-581.
      7. Burstein, J., Marcu, D., & Knight, K. Finding the WRITE Stuff: Automatic Identification of Discourse Structure in Student Essays. IEEE Intelligent Systems: Special Issue on Advances in Natural Language Processing, 18(1):32-39, 2003.
  52. Local Coherence Module (block diagram: inputs - the reduced term-document matrix after LSA, and the evaluation essay's column number in the term-document matrix; outputs - a score on local coherence and feedback to the student)
  53. Local Coherence Results
  54. Content Evaluation Module (block diagram: inputs - a set of domain-specific gold standard essays and the set of essays to be evaluated; output - normalized scores on the basis of content)
  55. Content Evaluation Results
  56. Content Evaluation Normalized Results
  57. Global Coherence Module (block diagram: inputs - gold standard essays and the evaluation essay(s); outputs - feedback and a score)
  58. Global Coherence Evaluation: Effect of K
