1st International Workshop on Discourse-Centric Learning Analytics
April 8, 2013, LAK13 Conference, Leuven, Belgium




XIP Dashboard: Visual Analytics
from Automated Rhetorical Parsing
of Scientific Metadiscourse

Duygu Simsek, Simon Buckingham Shum, Anna De Liddo,
Rebecca Ferguson — The Open University, UK

Ágnes Sándor — Xerox Research Centre Europe, FR
Metadiscourse

     Xerox Incremental Parser

Visual analytics v0.1: XIP Dashboard

   User Scenarios & Evaluation

Metadiscourse signals important moves
in educated/scholarly narrative

 (When scholarly culture works well) this
  is what gets your papers accepted by
     reviewers, and quoted by others


               Clear statements regarding the
              problem, the claim, the argument,
               the evidence, the implications…


                                This is what we teach
                                students from school
                                       upwards
Rhetorical functions of metadiscourse identified
by the Xerox Incremental Parser (XIP)

BACKGROUND KNOWLEDGE:
    Recent studies indicate ...
    ... the previously proposed ...
    ... is universally accepted ...

NOVELTY:
    ... new insights provide direct evidence ...
    ... we suggest a new ... approach ...
    ... results define a novel role ...

OPEN QUESTION:
    ... little is known ...
    ... role ... has been elusive
    Current data is insufficient ...

SUMMARIZING:
    The goal of this study ...
    Here, we show ...
    Altogether, our results ... indicate

SIGNIFICANCE:
    studies ... have provided important advances
    Knowledge ... is crucial for ... understanding
    valuable information ... from studies

CONTRASTING IDEAS:
    ... unorthodox view resolves ... paradoxes ...
    In contrast with previous hypotheses ...
    ... inconsistent with past findings ...

GENERALIZING:
    ... emerging as a promising approach
    Our understanding ... has grown exponentially ...
    ... growing recognition of the importance ...

SURPRISE:
    We have recently observed ... surprisingly
    We have identified ... unusual
    The recent discovery ... suggests intriguing roles
Xerox Incremental Parser (XIP)




Sándor, Á. and Vorndran, A. (2010). The detection of salient messages from social science research papers and its application in document search.
Workshop on Natural Language Processing Tools Applied to Discourse Analysis in Psychology, Buenos Aires, Argentina, May 10-14, 2010.
Initial evaluation of XIP is promising,
but methodologically complex
A striking example – but not all were like this (De Liddo et al., 2012)

Extract from annotation comparison (sentences annotated):

             Human analyst    XIP    XIP same as human
Document 1   19               22     11
Document 2   71               59     42
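
One way to read the comparison above is as precision and recall of XIP against the human analyst. This is only a sketch: it assumes the human annotation is treated as the reference and that "same as human annotation" means exact sentence overlap, neither of which the slide states. The function name `agreement` is made up for illustration.

```python
def agreement(human: int, xip: int, shared: int) -> dict:
    """Agreement of XIP with the human analyst for one document.

    Assumption: the human annotation is the reference, so
    precision = shared / XIP-annotated, recall = shared / human-annotated.
    """
    precision = shared / xip
    recall = shared / human
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Counts taken from the slide: (human annotated, XIP annotated, shared).
counts = {
    "Document 1": (19, 22, 11),
    "Document 2": (71, 59, 42),
}

for doc, (human, xip, shared) in counts.items():
    scores = agreement(human, xip, shared)
    print(doc, {name: round(value, 2) for name, value in scores.items()})
```

On these figures XIP recovers a little under 60% of the human-annotated sentences in both documents, while the share of XIP annotations that the human also made differs more (one half for Document 1, about 0.71 for Document 2).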
Xerox Incremental Parser (XIP)


                          XIP’s raw output is fine for NLP
                            machines/researchers, but
                         not learner/educator
                                friendly
Xerox Incremental Parser (XIP)



                      5000 (or even 30) plain text files…
                       we need overviews
                      of XIP analyses from
                             a corpus
Making XIP analytics visible:
1. annotations on the full text using the OU’s
Cohere social sensemaking app (Firefox add-on)
Making XIP analytics visible:
2. XIP annotations visualized in Cohere as a network
around the document
Making XIP analytics visible (2)

2nd phase analysis
of document-concept clouds…

Connecting?
Merging?
Re-tagging?
Summarising?
XIP Dashboard: towards an earlier phase
dashboard for navigating XIP output

Draw attention to patterns of potential significance to
students, educators and experienced researchers alike:

§  the occurrence of domain concepts in different
    metadiscourse contexts – e.g. ‘effective tutoring
    dialogue’ in sentences classified as contrast

§  trends of the above over time, e.g. to show the
    development of an idea

§  trends within and differences between research
    communities as reflected in their publications

§  eventually, the above for one’s own writing
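
The first two patterns in the list above can be sketched as a simple count of concept mentions per year and rhetorical label. This is an illustration only: the record layout (year, label, sentence text), the label names and the sample sentences are all invented, and XIP's real output format is not shown in these slides.

```python
from collections import Counter

# Hypothetical records: (publication year, rhetorical label, sentence text).
# The sentences are made up for illustration; they are not XIP output.
SENTENCES = [
    (2011, "CONTRAST", "In contrast with previous work, effective tutoring dialogue is rare."),
    (2011, "SUMMARY", "Here, we show that tutoring dialogue can be modelled."),
    (2012, "CONTRAST", "This is inconsistent with past findings on tutoring dialogue."),
    (2012, "CONTRAST", "Our results contrast with earlier clustering studies."),
]


def concept_counts(sentences, concept):
    """Count sentences mentioning `concept`, keyed by (year, label).

    Matching is a plain case-insensitive substring test, which is a
    simplifying assumption rather than real concept extraction.
    """
    counts = Counter()
    needle = concept.lower()
    for year, label, text in sentences:
        if needle in text.lower():
            counts[(year, label)] += 1
    return counts


counts = concept_counts(SENTENCES, "tutoring dialogue")
for (year, label), n in sorted(counts.items()):
    print(year, label, n)
```

Reading the counts for one label across years gives the trend over time; reading them for one year across labels gives the metadiscourse contexts in which the concept appears.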
Paper prototype to elicit initial reactions





                        ‘Intro movie’ from researcher

                        Participants point + click with
                                    finger

                         Basic navigation seems fine




               Enthusiasm for a tool that
               could help with literature
                       analysis

               Also for a tool to improve
             one’s own writing by showing
               trends, or inconsistencies
XIP Dashboard
Temporal trends per corpus




                             Similar patterns for LAK &
                                   EDM literatures

                               Summary & Contrast
                               categories relatively
                                higher, and rising

                                 (Not controlled for
                             different corpus sizes in
                                   these graphs)
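
A minimal sketch of what controlling for corpus size could look like: divide each year's count for a category by the total number of sentences analysed that year. The function name `normalise` and all figures below are hypothetical; they are not the LAK or EDM data.

```python
def normalise(category_counts, total_sentences):
    """Turn raw per-year counts into a share of that year's sentences.

    Years with no analysed sentences are skipped to avoid dividing by zero.
    """
    return {
        year: category_counts.get(year, 0) / total
        for year, total in total_sentences.items()
        if total > 0
    }


# Invented figures: raw Contrast counts rise while the corpus triples in size.
contrast_counts = {2011: 40, 2012: 60}
total_sentences = {2011: 400, 2012: 1200}

shares = normalise(contrast_counts, total_sentences)
for year in sorted(shares):
    print(year, f"{shares[year]:.1%}")
```

With these made-up numbers the raw count rises from 40 to 60 while the share falls from 10% to 5%, which is why a rising raw curve on its own cannot be read as a rising tendency.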



XIP Dashboard
Comparing corpora filtered by concept




XIP Dashboard
All papers by year and concept, with
colour = concept density (v2 mockup)




XIP Dashboard
Rhetorical function of the sentences
behind each bubble




XIP Dashboard
Heatmap of all concepts by
rhetorical classification (v2 mockup)




XIP Dashboard User scenarios…
Student / Educator / Researcher
 Familiarization with the
 background material in
      a literature…


                         Comparing different
                       writing patterns between
                           communities, or
                              students…

                                             Focusing on specific
                                            concepts of interest in
                                              combination with
                                               rhetorical context

XIP Dashboard User Evaluations

Signal-noise
   ratio?


                   Deeper or
               shallower reading?



                               New insights, or just
                                 faster insights?



                                               Better writing, or just
                                               gaming the system?
Summary
Early phases of work: a promising language technology
now has visual analytics we can deploy with stakeholders


                  Beyond number / size / frequency
                         of posts; ‘hottest thread’

Image: http://www.glennsasscer.com/wordpress/wp-content/uploads/2011/10/iceberg.jpg

                            An important feature of
                            educated writing is knowing
                            how to signal substantive
                            rhetorical moves. NLP can
                            detect this, and we can now
                            generate rudimentary visual
                            analytics.

                                       To be continued…
