OpenEssayist: Extractive Summarisation and Formative Assessment (DCLA13)

Slides of my presentation at DCLA13 (1st International Workshop on Discourse-Centric Learning Analytics), Leuven, April 8, 2013.

  1. OpenEssayist: Extractive Summarisation and Formative Assessment of Free-Text Essays
     Nicolas Van Labeke, Denise Whitelock, Debora Field, Stephen Pulman, John Richardson
     Institute of Educational Technology – The Open University
     Department of Computer Science – University of Oxford
  2. SAFeSEA: Research Questions
     • How can an automated system detect passages on which a human marker would usually give some feedback?
     • Can existing methods of information extraction and summarisation be adapted to select content for such feedback?
     • How effectively can these methods deliver feedback?
     • What effect do these techniques have on essay improvement? On the current essay and on future ones? On self-regulation and metacognition?
  3. Context
     • Essays: Open University (UK) postgraduate assignments
       – Distance learning, adult learners
       – 1500+ words, free-text & open-ended questions
     • No “Gold Standard”, wide range of content
       – Perfect test ground for extractive techniques
       – Impact of lack of (or limited) domain knowledge?
     • Bulk of activity (i.e. writing) takes place outside the system
       – Usage of drafts “varies a lot” among students
       – Nature, scope and timing of feedback?
     • Limited possibility for “mock” experiments:
       – testing & evaluation on “live” material
     • Connection with summative (tutor-based) assessment?
  4. Education Postgraduate Course H810
     Accessible online learning: supporting disabled students
     TMA1 (Tutor-Marked Assignment) – 1500 words
       Write a report explaining the main accessibility challenges for disabled learners that you work with or support in your own work context(s).
       Critically evaluate the influence of the context (e.g. country, institution, perceived role of online learning within education) on the: (1) identified challenges; (2) influence of legislation; (3) roles and responsibilities of key individuals; (4) role of assistive technologies in addressing these challenges.
     TMA2 – 3000 words
       Critically evaluate your own learning resource in the following ways:
       1. Briefly describe the resource and its accessibility features.
       2. Evaluate the accessibility of your resource, identifying its strengths and weaknesses.
       3. Reflect on the processes of creating and evaluating accessible resources.
  6. [Architecture diagram] Components shown: openEssayist (PHP, Epiphany [Symfony2]); openEssayist RESTful API (PHP, Epiphany); pyEA RESTful API (Python, Flask); AfterTheDeadline spell/grammar checker (Java); Apache Tika text extractor (Java); Orchestrator; (Open) Learner Model; pyEssayAnalyser (Python, NLTK); Users. Hosts and ports shown: phaeros.open.ac.uk:80, localhost:8065, localhost:8064, localhost:9998.
  7. Extractive Summarisation
     • Hypothesis
       – the quality and position of key phrases and key sentences within an essay (i.e., relative to the position of its structural components) give an idea of how complete and well-structured the essay is
       – they provide a basis for building suitable models of feedback
     • Experimenting with two simpler summarisation strategies
       – key phrase extraction: identifying the individual words or short phrases that are most suggestive of the content of a discourse
       – extractive summarisation: identifying whole key sentences
     • Rapid implementation and testing
  8. Summarisation Processes
     1. NL pre-processing of text
     2. unsupervised recognition of structural elements
     3. unsupervised extraction of key words/phrases
     4. unsupervised extraction of key sentences
  9. Pre-processing
     • Using NLTK (Python-based Natural Language Toolkit)
       – tokenisers,
       – lemmatiser,
       – part-of-speech tagger,
       – list(s) of stop words.
     • Experimenting with different approaches to define suitable stop word list(s)
       – domain-independent list?
       – generated from appropriate reference materials (using TF-IDF, for example)?
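The slide leaves open how a stop word list might be generated from reference materials. A minimal sketch, assuming plain-text reference documents and using only the IDF part of TF-IDF: a word that occurs in almost every reference document gets an IDF near zero and becomes a candidate stop word. The regex tokeniser stands in for NLTK's tokenisers, and the function name and the `max_idf` threshold of 0.2 are hypothetical choices for illustration.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lowercase word tokens (a crude regex stand-in for an NLTK tokeniser)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def idf_stop_words(reference_docs, max_idf=0.2):
    """Return words whose inverse document frequency is at most max_idf.

    A word found in every reference document has IDF = log(1) = 0, so it
    carries no discriminating content and is proposed as a stop word.
    The 0.2 default threshold is an illustrative assumption.
    """
    n_docs = len(reference_docs)
    doc_freq = Counter()
    for doc in reference_docs:
        doc_freq.update(set(tokenise(doc)))  # count each word once per document
    return {
        word
        for word, df in doc_freq.items()
        if math.log(n_docs / df) <= max_idf
    }


references = [
    "The learner uses the screen reader.",
    "The tutor marks the essay.",
    "The module covers the legislation.",
]
print(idf_stop_words(references))  # {'the'}
```

Lowering `max_idf` keeps only words found in virtually every document; raising it admits more domain-general vocabulary, which is where a domain-specific list would start to differ from a generic one.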
  10. Essay Structure
     • Restructure text as paragraphs/sentences
     • Automatic identification of each paragraph’s structural role
       – Summary, introduction, conclusion, body, references, …
       – Regardless of the presence of content-specific headings
       – No clues from formatting markup (plain-text submission)
     • Decision trees developed through manual experimentation
       – on a corpus of 135 student essays submitted in previous years for the same module on which the evaluation will be carried out
     • Still needs formal evaluation, but the output is good enough for the first rounds of OpenEssayist testing, and is continually improving
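The decision trees themselves are not given on the slide. As a toy illustration only, the sketch below restructures a plain-text submission into paragraphs (split on blank lines) and labels each one with a few hand-written rules: a short line without final punctuation is a heading, everything after a "References" heading is a reference, and the remaining prose paragraphs are labelled introduction, body or conclusion by position. The rules, the eight-word threshold and the function names are hypothetical, and the summary role is not handled.

```python
import re


def split_paragraphs(text):
    """Restructure plain text as a list of paragraphs (blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def label_paragraphs(text):
    """Label each paragraph with a structural role using hand-written rules.

    The rules and the 8-word heading threshold are illustrative assumptions,
    not the decision trees used in OpenEssayist.
    """
    paragraphs = split_paragraphs(text)
    labels = []
    in_references = False
    for par in paragraphs:
        if in_references or par.lower() in ("references", "bibliography"):
            in_references = True  # everything after this heading is a reference
            labels.append("references")
        elif len(par.split()) <= 8 and not par.endswith((".", "?", "!")):
            labels.append("heading")  # short line without final punctuation
        else:
            labels.append(None)  # prose: role decided by position below

    prose = [i for i, label in enumerate(labels) if label is None]
    for rank, i in enumerate(prose):
        if rank == 0:
            labels[i] = "introduction"
        elif rank == len(prose) - 1 or paragraphs[i].lower().startswith("in conclusion"):
            labels[i] = "conclusion"
        else:
            labels[i] = "body"
    return list(zip(labels, paragraphs))


essay = """Accessibility challenges

This report examines the main accessibility challenges.

Screen readers struggle with unlabelled images.

Legislation shapes institutional responsibilities.

In conclusion, context matters.

References

Example, A. (2010). Placeholder reference."""

for label, par in label_paragraphs(essay):
    print(f"{label:>12}: {par}")
```

Position alone is a weak cue; a tree built from a real corpus would presumably combine it with features such as paragraph length, cue phrases and heading vocabulary.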
  11. Key words, lemmas and phrases
     • Unsupervised extractive summarisation using graph-based ranking methods (TextRank, Mihalcea & Tarau 2004, 2005)
     • Each unique word is represented by a node in the graph, and co-occurrence relations (specifically, within-sentence word adjacency) are represented by edges.
     • Compute a key-ness value for each word in the essay (key-ness can be understood as significance within the context of the essay)
     • Centrality algorithms are used to calculate the significance of each word
       – betweenness centrality (Freeman 1977) and PageRank (Brin & Page 1998)
       – Roughly speaking, a word with a high centrality score is a word that sits adjacent to many other unique words, which in turn sit adjacent to many other unique words, and so on.
     • The words with the highest centrality scores are the key words.
       – A decision needs to be made as to what proportion of the essay’s words qualify as key words.
     • Sequences of key words in the surface text identify within-sentence key phrases (bigrams, trigrams and quadgrams).
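A minimal sketch of the key word step, assuming a regex tokeniser and a tiny hard-coded stop word list (no lemmatisation or part-of-speech filtering), with stop words treated as breaking adjacency. It builds the word adjacency graph, scores nodes with plain PageRank by power iteration (betweenness centrality is not shown), keeps the top third of words as key words (an arbitrary stand-in for the proportion the slide says must be decided), and reads maximal runs of two to four consecutive key words within a sentence as key phrases. Longer runs are simply discarded here; all function names are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stop word list; a real list would be far longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "are", "with", "on"}


def tokenise_sentences(text):
    """Split text into sentences, then into lowercase word tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.findall(r"[a-z]+", s.lower()) for s in sentences if s]


def build_graph(token_sents):
    """Undirected graph: nodes are unique words, edges join words that are
    adjacent within a sentence. Stop words get no node and break adjacency."""
    graph = defaultdict(set)
    for tokens in token_sents:
        for left, right in zip(tokens, tokens[1:]):
            if left in STOP_WORDS or right in STOP_WORDS or left == right:
                continue
            graph[left].add(right)
            graph[right].add(left)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Unweighted PageRank by power iteration; every node has >= 1 neighbour."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        scores = {
            v: (1 - damping) / n
            + damping * sum(scores[u] / len(graph[u]) for u in graph[v])
            for v in nodes
        }
    return scores


def key_phrases(token_sents, keywords, max_len=4):
    """Maximal runs of 2..max_len consecutive key words within a sentence."""
    phrases = set()
    for tokens in token_sents:
        run = []
        for tok in tokens + [None]:  # trailing None flushes the final run
            if tok in keywords:
                run.append(tok)
                continue
            if 2 <= len(run) <= max_len:
                phrases.add(" ".join(run))
            run = []
    return phrases


def keywords_and_phrases(text, ratio=1 / 3):
    token_sents = tokenise_sentences(text)
    scores = pagerank(build_graph(token_sents))
    ranked = sorted(scores, key=scores.get, reverse=True)
    keywords = ranked[: max(1, round(len(ranked) * ratio))]
    return keywords, key_phrases(token_sents, set(keywords))


text = ("Screen reader users need accessible content. "
        "Accessible content helps screen reader users. "
        "Tutors check accessible content.")
keywords, phrases = keywords_and_phrases(text)
print(keywords, phrases)
```

Words with equal scores keep their insertion order after sorting, so the exact cut-off set can shift with small wording changes; this is one reason the proportion of key words is a real design decision rather than a detail.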
  12. Key words, lemmas and phrases
  13. Key Sentences
     • A similar graph-based ranking approach is used to compute key-ness scores for whole sentences.
     • Instead of word adjacency (as in the key word graph), co-occurrence of words across pairs of sentences is the relation used to construct the graph.
       – similarity measures of every pair of sentences.
     • The similarity scores become edge weights in the graph, while whole sentences become the nodes.
     • The TextRank key sentence algorithm (based on PageRank but with added edge weights) is then applied.
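A minimal sketch of the key sentence step, assuming the sentence similarity from Mihalcea & Tarau (2004), i.e. the number of words two sentences share divided by the sum of the logs of their lengths, and the weighted TextRank update WS(Vi) = (1 - d) + d * Σj (wji / Σk wjk) * WS(Vj). Tokenisation is a plain regex with no stop word removal or lemmatisation, and the function names are hypothetical. The function returns sentence indices ordered from most to least key.

```python
import math
import re


def tokenise(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def similarity(a, b):
    """Shared words normalised by log sentence lengths (TextRank, 2004)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    overlap = len(set(a) & set(b))
    return overlap / denom if denom > 0 else 0.0  # two 1-word sentences: no edge


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Weighted TextRank over the sentence-similarity graph."""
    tokens = [tokenise(s) for s in sentences]
    n = len(sentences)
    # Edge weights: similarity of every pair of distinct sentences.
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(w[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


sentences = [
    "Assistive technology supports disabled learners online.",
    "Disabled learners use assistive technology in online courses.",
    "Legislation defines institutional duties for disabled learners.",
    "The weather was pleasant.",
]
ranking = rank_sentences(sentences)
print([sentences[i] for i in ranking[:2]])  # the two most "key" sentences
```

Here the unrelated last sentence shares no words with the others, so it has no edges, keeps only the baseline score 1 - d and ranks last.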
  14. Extractive Summarisation – Sentences
  15. Extractive Summarisation – Overview
  16. Exploring The Design Space
     ❶ Researcher-centred Design
       – Data-driven
       – Architecture setup, integration & refinement of tools
       – From discourse to summarisation
       – Emerging properties, hypotheses building
  17.
     • Multiple external representations
     • Mash-ups, reports, summaries, …
     • Highlighting co-occurrence of terms (or lack of)
     • Exploration & discovery, hypotheses building, eliciting recommendations & heuristics
  18. Exploring The Design Space
     ① Researcher-centred Design
       – Data-driven
       – Architecture setup, integration & refinement of tools
       – From discourse to summarisation
       – Emerging properties, hypotheses building
     ❷ Learner-centred Design
       – Task-driven
       – Hypotheses testing & validation, refinement
       – From summarisation to formative feedback
       – Live evaluation
  19. Question: What kind of feedback?
  20. Section of essay: purpose of section
     • Title: Write the full question (title) at the top of your assignment. It will contain keywords (known as content and process words). See the Understanding the question webpage for these.
     • Introduction: A paragraph or two to define key terms and themes and indicate how you intend to address the question.
     • Main body: A series of paragraphs written in full sentences that include specific arguments relating to your answer. It’s vital to include evidence and references to support your arguments.
     • Conclusions: A short section to summarise main points and findings. Try to focus on the question but avoid repeating what you wrote in the introduction.
     • References: A list of sources (including module materials) that are mentioned in the essay.
     • Introductions
       – An introduction provides your reader with an overview of what your essay will cover and what you want to say.
       – Essay introductions should
         • set out the aims of the assignment and signpost how your argument will unfold
         • introduce the issue and give any essential background information, including a brief description of the major debates that lie behind the question
         • define the key words and terms
         • be between 5% and 10% of the total word count
       – Some students prefer to write the introduction at an early stage; others save it for when they have almost completed the assignment. If you write it early, don’t allow it to constrain what you want to write. It’s a good idea to check and revise the introduction after the first draft.
     • The body of your essay
       – …
     Open University – Skills for OU Study: http://www.open.ac.uk/skillsforstudy/essays.php
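The guidance on this slide gives one directly checkable target: an introduction of between 5% and 10% of the total word count. A minimal sketch of how such a check might become a piece of formative feedback, assuming the essay has already been split into (label, text) pairs by a structure-recognition step. Counting words by whitespace and excluding the references from the total are assumptions, not part of the OU guidance, and the function name is hypothetical.

```python
def check_introduction_length(sections, low=0.05, high=0.10):
    """Return the introduction's share of the word count and a verdict.

    sections: list of (label, text) pairs. Words are counted by splitting
    on whitespace, and "references" is excluded from the total (an
    assumption, not part of the OU guidance).
    """
    words = {}
    for label, text in sections:
        words[label] = words.get(label, 0) + len(text.split())
    total = sum(n for label, n in words.items() if label != "references")
    share = words.get("introduction", 0) / total if total else 0.0
    band = f"{low:.0%}-{high:.0%}"
    if share < low:
        verdict = f"shorter than the suggested {band}"
    elif share > high:
        verdict = f"longer than the suggested {band}"
    else:
        verdict = f"within the suggested {band}"
    return share, verdict


sections = [
    ("introduction", "word " * 8),
    ("body", "word " * 80),
    ("conclusion", "word " * 12),
    ("references", "ref " * 50),
]
share, verdict = check_introduction_length(sections)
print(f"Introduction: {share:.0%} of the essay, {verdict}")
# Introduction: 8% of the essay, within the suggested 5%-10%
```

The band is passed in as parameters so that module-specific guidance could override the 5-10% default.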
  21. Question: Reflective activities?
     • “Advice for action”
       – Expectation vs. intention
       – Reflection, self-report, validation of advice, …
     • Introducing user interventions in the system
     • Feeding back to the system? To the Essay Analyser?
  22. Question: Drafts, History & Changes
  23. Question: “Quality” of output?
  24. Current and Future Work
     • Three lines of experimentation:
       – improving the different aspects of the essay analyser (e.g. different “key-ness” metrics, introducing domain-specific lists of stop words)
       – analyses of the summarisation output (e.g. factor analysis) to run on the existing corpus of essays
         • 5 years of essays on the H810 course, all marked and annotated by human tutors
         • identifying trends and markers to be used as progress/performance indicators
       – iterative, user-centred design and testing of openEssayist (refine possible usage scenarios, test pedagogical)
     • Currently proceeding with the second design phase
     • First live evaluation, in an authentic context, by a new cohort of students on the H810 module (Sept 2013)
