The document describes a tool for discourse analysis and visualization that was developed to analyze different types of discourses. The tool combines cognitive and socio-cultural paradigms using the concept of voice from polyphony theory. It identifies important voices in a text through lexical chains and displays the discourse through different views including word-level representations that show voice frequency and distribution, sentence-level representations that show voice distribution across sentences, and identification of pivotal moments where voices intersect. The tool was evaluated on collaborative learning chats and showed potential to accurately assess discussion quality and compare different discourses.
2. Content
• Introduction
• Theoretical Ideas
• The Application with the different views
• Conclusions
21.09.2012 A Tool for Discourse Analysis and Visualization EIDWT 2012
3. Introduction
• Purpose? Develop a method and a tool for
analyzing and visualizing different types of
discourses.
– Current analysis methods are biased towards one of
the two types of texts: narrations or conversations.
• How? Combining the cognitive and socio-cultural
paradigms using the concept of voice from the
Polyphony Theory and the ideas related to
identifying polyphonic threads.
4. Theoretical Ideas
• Existing theories of discourse analysis follow one of two paradigms:
– Cognitive paradigm (NLP) – knowledge is situated in
individuals’ minds (Hobbs, Grosz); focused on how
utterances build up to create a hierarchically
organized discourse. Problem: fails to capture the
complex interactions between these utterances.
– Socio-cultural paradigm – knowledge is socially
constructed (Bakhtin, Vygotsky, Trausan-Matu); focused
on collaboration aspects – “rather than speaking about
‘acquisition of knowledge,’ many people prefer to view
learning as becoming a participant in a certain discourse”
[Sfard, 2000] – better suited today in the context of Web
2.0 and of the wide use of chats, forums, blogs and wikis.
5. Polyphony & Inter-animation
• Bakhtin (1973) introduced the Polyphony Theory, stating
that in any text there are multiple voices that influence
each other, leading to the inter-animation of the ideas
they present.
• Voice = position taken by one or more of the participants
– Current implementations: a participant or an utterance.
– We consider a voice to be an idea that is rhythmically
repeated.
• Voice identification – based on Tannen’s Theory:
“Repetition is a resource by which conversationalists
together create a discourse, a relationship, a world.” –
we identify the words that can be used to express the
same idea (through lexical chains), which works for both
conversations and narrations.
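The voice-identification step can be illustrated with a minimal Python sketch. It assumes a toy relatedness table (`RELATED`, hypothetical, standing in for a lexical resource such as WordNet) and a greedy chaining rule in which a chain's size is taken as its importance; the tool's actual chaining algorithm is not described here, and the names `lexical_chains`, `candidate_voices` and the `min_occurrences` threshold are illustrative only.

```python
import re

# Hypothetical relatedness table: a stand-in for a lexical resource such as WordNet.
RELATED = {
    "cell": {"cytoplasm"},
    "cytoplasm": {"cell"},
}
# Minimal, illustrative stop-word list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to"}

def tokenize(text):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def related(a, b):
    """True for an exact repetition or a relation listed in RELATED."""
    return a == b or b in RELATED.get(a, set()) or a in RELATED.get(b, set())

def lexical_chains(text):
    """Greedy chaining: each token joins the first chain that already holds a related word."""
    chains = []  # each chain is a list of (token position, word)
    for pos, word in enumerate(tokenize(text)):
        for chain in chains:
            if any(related(word, w) for _, w in chain):
                chain.append((pos, word))
                break
        else:
            chains.append([(pos, word)])
    return chains

def candidate_voices(text, min_occurrences=3):
    """Chains repeated at least `min_occurrences` times, longest (most important) first."""
    chains = [c for c in lexical_chains(text) if len(c) >= min_occurrences]
    return sorted(chains, key=len, reverse=True)

text = ("The cell divides. Each cell has cytoplasm. The genome is copied. "
        "The cell grows and the cytoplasm expands.")
for chain in candidate_voices(text):
    print(chain)
```

On this toy text only the chain joining “cell” and “cytoplasm” is repeated often enough to become a candidate voice; as on the slides, deciding which candidates really are voices would still be left to the user.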
6. The System
• Extract the candidate voices from a discourse (lexical chains) – a chain’s
importance determines whether it becomes a voice, but the final evaluation
is left to the user.
• Extract the paronyms of each concept (a different type of voice) – also
needed to counterbalance spelling errors and to identify rhetorical
constructions (solution-resolution, presentation-representation,
log-blog, etc.).
• Let the user decide what needs to be investigated: chains of exact
repetitions, chains of conceptually related words, paronym chains, or any
combination of them.
• The system is able to analyze both types of text and offers four different
views of the discourse: the view file (the default visualization method), a
word-level representation of the voices, a sentence-level representation,
and a visualization of the most important moments of the discourse.
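The paronym test and the user’s choice of chain types can be sketched as follows. The paronym criterion used here (an edit distance of at most a third of the shorter word’s length) is an assumed heuristic, not the tool’s documented rule, and `ChainType`, `are_paronyms`, `link_type` and `words_linked_to` are hypothetical names.

```python
from enum import Flag, auto

class ChainType(Flag):
    """Kinds of links a user may choose to investigate (combinable with |)."""
    REPETITION = auto()
    RELATED = auto()
    PARONYM = auto()

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def are_paronyms(a, b):
    """Assumed heuristic: distinct words within a small, length-scaled edit distance."""
    a, b = a.lower(), b.lower()
    if a == b:
        return False  # identical words are exact repetitions, not paronyms
    limit = max(1, min(len(a), len(b)) // 3)
    return levenshtein(a, b) <= limit

def link_type(a, b, related_pairs):
    """Classify the link between two words, or return None if they are unrelated."""
    if a == b:
        return ChainType.REPETITION
    if (a, b) in related_pairs or (b, a) in related_pairs:
        return ChainType.RELATED
    if are_paronyms(a, b):
        return ChainType.PARONYM
    return None

def words_linked_to(seed, words, selected, related_pairs):
    """Words linked to `seed` through one of the chain types the user selected."""
    result = []
    for w in words:
        kind = link_type(seed, w, related_pairs)
        if kind is not None and kind in selected:
            result.append(w)
    return result

words = ["solution", "resolution", "solution", "answer", "blog"]
pairs = {("solution", "answer")}  # hypothetical semantic relation
print(words_linked_to("solution", words, ChainType.PARONYM, pairs))
print(words_linked_to("solution", words, ChainType.REPETITION | ChainType.RELATED, pairs))
```

The same edit-distance test that pairs “solution” with “resolution” also tolerates single-letter spelling errors, which matches the double role the slide gives to paronyms; a `Flag` keeps “any combination of them” a single value.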
21.09.2012 A Tool for Discourse Analysis and Visualization EIDWT 2012
7. View File Visualization
[Figure: view file visualization; the legend distinguishes repetitions, semantically related words and paronyms]
8. Word-Level Representation (1)
[Figure: word-level representation of the voices]
• Concepts and their frequency;
• Considered voices – each voice has a different color;
• Allows the user to choose one or more voices and to visualize their flow in the text.
9. Word-Level Representation (2)
• Allows identifying:
– which voice is stronger and in what areas (“cell” is
stronger than all the other voices);
– whether a voice is present in the whole text (e.g. “cell”) or is a
local artifact (e.g. “genome”);
– whether a voice is more or less focused: for example, “rna” and
“cytoplasm”, each with exactly 7 occurrences, can be
differentiated by the higher density (rhythmicity) of “rna”
compared to “cytoplasm”;
– the voices that “work” together, i.e. that can be found in the
same areas of the text (e.g. “cell” and “cytoplasm”), and the
ones that exclude each other, i.e. that are not found in the
same areas of the discourse (e.g. “rna” and “genome”);
– collocations – concepts that work together but are not semantically
related to each other (e.g. “world war”, “cold war”).
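The properties above can be approximated numerically from the token positions of each voice. This is a minimal sketch under stated assumptions: the text is cut into `n_segments` equal parts (10 by default, an arbitrary choice), coverage of segments stands in for global vs. local, the mean gap between occurrences stands in for focus (rhythmicity), and segment overlap stands in for voices working together. The positions in the usage example are invented, not taken from the slides.

```python
from statistics import mean

def segments_hit(positions, text_length, n_segments):
    """Indices of the equal-size text segments that contain at least one occurrence."""
    size = text_length / n_segments
    return {int(p // size) for p in positions}

def voice_profile(positions, text_length, n_segments=10):
    """positions: sorted token indices where the voice (its whole chain) occurs."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return {
        "frequency": len(positions),
        # share of segments reached: high -> global voice, low -> local voice
        "coverage": len(segments_hit(positions, text_length, n_segments)) / n_segments,
        # average distance between consecutive occurrences: lower -> more focused
        "mean_gap": mean(gaps) if gaps else None,
    }

def co_occurrence(pos_a, pos_b, text_length, n_segments=10):
    """Jaccard overlap of segments: near 1 -> voices work together, 0 -> they exclude each other."""
    a = segments_hit(pos_a, text_length, n_segments)
    b = segments_hit(pos_b, text_length, n_segments)
    return len(a & b) / len(a | b) if a | b else 0.0

# Invented positions in a 100-token text: both voices occur 7 times.
rna = [10, 12, 14, 16, 18, 20, 22]
cytoplasm = [0, 15, 30, 45, 60, 75, 90]
print(voice_profile(rna, 100))
print(voice_profile(cytoplasm, 100))
print(co_occurrence(rna, cytoplasm, 100))
```

With equal frequencies, the much smaller mean gap separates the focused voice from the spread-out one, which is the “rna” vs. “cytoplasm” distinction described above.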
10. Word-Level Representation (3)
[Figure: word-level representation illustrating the collocation “cold war”]
“Cold war” is a collocation, and of these two words “war” is the one that
fits the context (“cold” is only a modifier), since “war” appears in areas
where “cold” is not present, while the opposite does not happen.
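The reasoning behind the “cold war” annotation can be sketched as a count, for each word of the collocation, of the occurrences that have no partner nearby; the word that more often stands alone is treated as the head. This is an assumed heuristic: the `window` size, the helper names and the toy sentence are hypothetical.

```python
def independent_occurrences(tokens, word, partner, window=2):
    """Count occurrences of `word` with no `partner` within `window` tokens on either side."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        nearby = tokens[max(0, i - window): i + window + 1]
        if partner not in nearby:
            count += 1
    return count

def collocation_head(tokens, w1, w2, window=2):
    """The word of the pair that appears more often without the other (None on a tie)."""
    a = independent_occurrences(tokens, w1, w2, window)
    b = independent_occurrences(tokens, w2, w1, window)
    if a > b:
        return w1
    if b > a:
        return w2
    return None

tokens = ("cold war began after the second world war and the cold war ended "
          "the war was long cold winter").split()
print(collocation_head(tokens, "cold", "war"))
```

Here “war” stands alone twice and “cold” once, so “war” is reported as the word that fits the context and “cold” as its modifier.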
11. Sentence-Level Representation (1)
• Presents the distribution of the sentences that contain the
voices considered important by the user.
• Influenced by the view offered by Google Books, but
enhanced with the option to:
– make multiple searches (multiple voices) in order to
compare them;
– also observe the semantically related terms or
paronyms.
• Utility: detecting the rhythmicity of a given voice;
identifying the voices that are stronger or more focused than
others and their types (global or local); identifying the voices
that work together and the ones that exclude each other;
identifying the moments of shifting from one topic to another;
identifying topic drifts (off-topic areas).
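A minimal sketch of the sentence-level view, assuming a naive regular-expression sentence splitter and voices given as hypothetical sets of words (exact repetitions plus any related terms or paronyms the user chose). Each voice is rendered as a strip with one character per sentence; the toy text is invented.

```python
import re

def split_sentences(text):
    """Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sentence_distribution(text, voices):
    """For each voice (name -> set of words), the indices of the sentences mentioning it."""
    sentences = split_sentences(text)
    dist = {}
    for name, words in voices.items():
        dist[name] = [i for i, s in enumerate(sentences)
                      if set(re.findall(r"[a-z]+", s.lower())) & words]
    return dist, len(sentences)

def render_strip(indices, n_sentences):
    """One character per sentence: '#' where the voice is present, '.' elsewhere."""
    present = set(indices)
    return "".join("#" if i in present else "." for i in range(n_sentences))

text = ("Christian texts describe creation. Biblical scholars debate it. "
        "Muslim authors mention Allah. Allah is central to Muslim belief. "
        "Christian life continues.")
voices = {"christian": {"christian"}, "allah": {"allah"}, "muslim": {"muslim", "muslims"}}
dist, n = sentence_distribution(text, voices)
for name, idx in dist.items():
    print(f"{name:10} {render_strip(idx, n)}")
```

Identical strips for “allah” and “muslim” indicate voices that work together, while the non-overlapping “christian” strip indicates voices that exclude each other.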
12. Sentence-Level Representation (2)
[Figure: sentence-level representation of several voices, annotated as follows]
1. Stronger voices (“Christian” > “biblical”);
2. More focused voices (“biblical” > “Allah”);
3. Voice type – global (“Christian”, “humans”, “creation”, “life”) or local (“biblical”, “Allah”, “Muslim”, “Satanism”);
4. Voices that work together (“Allah” and “Muslim”) and voices that exclude each other (“Muslim”, “biblical”, “Satanism” and “Buddha”);
5. Moments of shifting from one topic to another (“Satanism”, “Muslim” and “biblical”);
6. Disambiguation.
13. Visualization of the Most Important Moments of the Discourse
• The important moments are a consequence of the interactions that can be
observed between different voices.
• We started from the important voices and investigated the areas
where these voices inter-animate (influence each other or co-participate
in the meaning of an utterance); these areas are the important moments of
the discourse.
• Analyzing the observed types of interactions, we propose a
classification of these moments into 5 classes:
– Pivotal moments (one voice substitutes the other),
– Moments of convergence (all voices die out),
– Singular moments (all voices but one die out),
– Moments of divergence (multiple voices meet and then are present in
different areas), and
– Meeting points (multiple voices are constantly debating throughout
the discourse).
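The five classes can be sketched as rules over consecutive text windows, each described by the set of voices active in it. The operational rules below are one possible reading of the informal definitions, not the tool’s implementation: a window with several voices is a meeting point if the next window still has several, a convergence if the next has none, a divergence if the voices reappear later but never again together, and singular if only one of them continues; a pivotal moment is one single voice followed by a different single voice. Window segmentation, function names and the example data are hypothetical.

```python
def classify_transition(here, nxt, later):
    """Classify the step from window `here` to window `nxt`; `later` = all windows from `nxt` on."""
    if len(here) >= 2:
        if not nxt:
            return "convergence"      # all voices die out
        if len(nxt) >= 2:
            return "meeting point"    # voices keep debating
        if (all(any(v in w for w in later) for v in here)
                and all(len(w & here) <= 1 for w in later)):
            return "divergence"       # voices met, then continue in different areas
        if nxt <= here:
            return "singular"         # all voices but one die out
    elif len(here) == 1 and len(nxt) == 1 and here != nxt:
        return "pivotal"              # one voice substitutes the other
    return None

def classify_moments(windows):
    """windows: list of sets of voices active in consecutive text windows."""
    moments = []
    for t in range(len(windows) - 1):
        kind = classify_transition(windows[t], windows[t + 1], windows[t + 1:])
        if kind:
            moments.append((t, kind))
    return moments

windows = [{"chat"}, {"blog"}, {"chat", "blog"}, {"chat", "wiki"}, {"wiki"},
           {"chat"}, {"blog", "forum"}, {"forum"}, {"forum", "wiki"}, set()]
for t, kind in classify_moments(windows):
    print(t, kind)
```

The order of the checks resolves overlaps between classes (for example, a divergence also leaves a single voice in the next window); other orderings are equally defensible.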
14. System Evaluation
• The results of this system depend strongly on the purpose of its use and
on the voices considered important by the user.
• The system was used to analyze CSCL chats in which 4 participants debated
which is the best tool for collaborative learning (chat, blog, forum, wiki)
(Trausan-Matu & Rebedea, 2010; Chiru et al., 2011).
• The conversations were first assessed by two CSCL experts and then
evaluated automatically with our system, considering the inter-animation
of the voices chat, blog, forum and wiki.
Table: important moments identified by the system and the grades given by the two CSCL experts
Chat    Convergence moments  Singular moments  Meeting points  Reviewer 1  Reviewer 2  Average review
Chat 1  0                    1                 5               7.8         7.7         7.75
Chat 2  1                    0                 15              10          9.3         9.65
Chat 3  0                    0                 8               9           8.9         8.95
Chat 4  1                    0                 7               8.4         8.6         8.5
Chat 5  0                    0                 11              9           9           9
15. Results Interpretation
• The chats that the experts considered good had more important moments
than the others.
• The number of meeting points seems to be a good estimator of
conversation quality: this criterion alone ranks the chats in the same
order as the experts did.
• Differences between the grades given by the experts and the number of
meeting points found in the chats signal that this cannot be the only
criterion: the other important moments that were identified also play a
role in this evaluation.
• Their importance depends on the task of the analyzed discourse:
– if a solution is needed at the end of a discourse, the convergence moments
should have higher importance;
– if the best solution among multiple options is sought, then the singular
moments should receive special treatment.
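The ranking claim can be checked directly against the evaluation table. The sketch below copies the table rows, ranks the chats by meeting points and by the average of the two reviewers’ grades, and computes Spearman’s rank correlation with the tie-free formula $\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$; the helper names are illustrative.

```python
# Rows copied from the evaluation table:
# (convergence moments, singular moments, meeting points, reviewer 1, reviewer 2)
RESULTS = {
    "Chat 1": (0, 1, 5, 7.8, 7.7),
    "Chat 2": (1, 0, 15, 10, 9.3),
    "Chat 3": (0, 0, 8, 9, 8.9),
    "Chat 4": (1, 0, 7, 8.4, 8.6),
    "Chat 5": (0, 0, 11, 9, 9),
}

def average_grade(row):
    """Mean of the two reviewers' grades (the 'Average review' column)."""
    return (row[3] + row[4]) / 2

def ranking(score):
    """Chat names sorted from the highest to the lowest score."""
    return sorted(RESULTS, key=lambda chat: score(RESULTS[chat]), reverse=True)

def spearman_rho(order_a, order_b):
    """Spearman's rank correlation for two tie-free orderings of the same items."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    n = len(order_a)
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))

by_meeting_points = ranking(lambda row: row[2])
by_experts = ranking(average_grade)
print(by_meeting_points)   # ['Chat 2', 'Chat 5', 'Chat 3', 'Chat 4', 'Chat 1']
print(spearman_rho(by_meeting_points, by_experts))   # 1.0
```

Both orderings coincide, so $\rho = 1$. Against Reviewer 1 alone the comparison is less clean, since that reviewer gives Chat 3 and Chat 5 the same grade (9).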
16. Conclusions (1)
• We have built an application for discourse
analysis and visualization that is:
– An adaptation of the Polyphony Theory, since it is
based on the voice concept;
– Flexible, since the user can select what information
is shown;
– Domain independent (examples from fields such
as history, religion, CSCL, or medicine);
– Language independent, as long as there is a means
to extract the voices from the discourse.
17. Conclusions (2)
• This application could be used at both the:
– Inter-document level (for comparing discourses from a
corpus that debate the same topics), and
– Intra-document level, for:
• evaluating the “strength” of different voices from the discourse;
• evaluating how focused these voices are;
• determining their types: local or global;
• finding the voices that can/cannot be used in the same areas of discourse;
• locating the areas where topic drift is present;
• disambiguating polysemous words by considering the context
provided by the voices found in their vicinity;
• identifying the most important moments of a discourse,
which could also give information about the areas where specific
topics are debated and about the collocations, syntagms and idioms
encountered in that discourse.
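Disambiguation by the surrounding voices can be sketched as a simplified Lesk-style overlap: each sense of a polysemous word is described by a set of signal words, and the sense sharing most words with the voices found near the occurrence wins. This is a swapped-in technique used for illustration, not necessarily the tool’s method; the sense inventory `SENSES`, the `window` size and the example sentences are hypothetical.

```python
# Hypothetical sense inventory: each sense is described by words that would signal it
# (a stand-in for dictionary glosses, as in the Lesk algorithm).
SENSES = {
    "cell": {
        "biology": {"cytoplasm", "rna", "genome", "membrane"},
        "prison": {"prisoner", "jail", "guard"},
    }
}

def disambiguate(word, tokens, position, voices, window=10):
    """Pick the sense whose signature overlaps most with the voices near `position`."""
    lo, hi = max(0, position - window), position + window + 1
    context = {t for t in tokens[lo:hi] if t in voices}  # voices in the vicinity
    scores = {sense: len(sig & context) for sense, sig in SENSES[word].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None: no voice gives evidence

bio = "the cell membrane protects the cytoplasm and the rna inside".split()
print(disambiguate("cell", bio, 1, {"cell", "cytoplasm", "rna", "membrane"}))
jail = "the guard locked the prisoner in the cell".split()
print(disambiguate("cell", jail, 7, {"guard", "prisoner", "cell"}))
```

Restricting the context to voices rather than to all nearby words follows the idea in the bullet above: only the ideas that are repeated in the discourse are allowed to vote for a sense.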