Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Learning Communities

EC-TEL 2008

Transcript

  • 1. Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Learning Communities. Traian Rebedea¹, Stefan Trausan-Matu¹,², Costin Chiru¹. ¹“Politehnica” University of Bucharest, Department of Computer Science and Engineering; ²Research Institute for Artificial Intelligence of the Romanian Academy. {traian.rebedea, stefan.trausan, costin.chiru}@cs.pub.ro
  • 2. Overview
    • Introduction
    • Theoretical background
    • Implementation
      • Detecting conversation’s topics
      • Assessing learners’ competencies
      • Discovering implicit voices
      • Conversation graph
    • Conclusions
  • 3. Context
    • Computer-assisted learning
      • Developing tools to support the learning process
      • Evaluation of these tools (and of the learning process)
      • Determining the learners’ performances
    • Computer Supported Collaborative Learning – CSCL
    • Main idea: “rather than speaking about ‘acquisition of knowledge,’ many people prefer to view learning as becoming a participant in a certain discourse” (Sfard, 2000)
    • Focus on studying interactions between the participants in chat conversations in small groups
  • 4. Objectives
    • Automatic extraction of useful social and semantic information from conversations
      • Determining relationships between utterances
      • Utterances that have influenced the further development of the conversation
      • The performance / competency of each participant
    • Designing an interface for the visualisation of a conversation
    • Applicable both to online conversations (chats, discussion forums, etc.) and to face-to-face discussions
  • 5. Experiments
    • Languages: English (advantages: existing NLP tools) and Romanian
    • Computer Science – HCI, NLP and Algorithm Design courses in “Politehnica” University of Bucharest
    • Small groups of 4-5 students – all of the students must be graded (over 100 students / course)
    • The conversations have well-determined subjects
      • Collaborative, team work
      • Competitive
    • Also used chat transcripts from Virtual Math Teams, Drexel University, Philadelphia, US
  • 6. Overview
    • Introduction
    • Theoretical background
    • Implementation
      • Detecting conversation’s topics
      • Assessing learners’ competencies
      • Discovering implicit voices
      • Conversation graph
    • Conclusions
  • 7. Socio-cultural Paradigm
    • the role of socially established artefacts in communication and learning (Vygotsky)
    • Bakhtin focuses on the role of language and discourse, and especially of speech and dialog: “… Any true understanding is dialogic in nature.”
    • Lotman considers text as a “thinking device”
  • 8. Bakhtin’s Dialogism
    • Bakhtin’s ideas
      • Dialogism
      • Polyphony
      • Inter-animation of voices
    • Bakhtin: “The specific totality of ideas, thoughts and words is everywhere passed through several unmerged voices, taking on a different sound in each” – referring to Dostoevsky’s novels
    • Dual nature of voices: community and individuality
  • 9. Voices in Chats
    • Utterances should be the units of analysis
    • An utterance contains at least one voice – the one of the participant that issued it
    • Most of the utterances contain multiple voices
    • The inter-animation of the voices gives rise to the discussion threads of the conversation
  • 10. Discussion Threads
  • 11. Overview
    • Introduction
    • Theoretical background
    • Implementation
      • Detecting conversation’s topics
      • Assessing learners’ competencies
      • Discovering implicit voices
      • Conversation graph
    • Conclusions
  • 12. Foreword
    • Chat transcripts are read from HTML or XML files
    • ConcertChat environment (Fraunhofer)
      • Advantages for collaborative work
      • Enables the use of explicit references to previous utterances or a whiteboard
    • Implementation in C#.NET
  • 13. Techniques
    • Tokenization
    • Stop-words, emoticons and usual abbreviations ( :) , :D , brb, thx, …) are eliminated
    • WordNet for identifying synonyms
    • Misspellings are detected using the Google API
    • The ontology can be extended with words discovered in the chat, specific to the conversation’s domain
    • Pattern analysis
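A minimal sketch of this preprocessing step, assuming plain regular-expression tokenization and illustrative stop-word, emoticon and abbreviation lists (the actual lists, and the WordNet and Google API steps, are not reproduced here):

```python
import re

# Illustrative lists; the lists actually used by the system are not given on the slide.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "i", "you", "we"}
EMOTICONS = {":)", ":d", ":(", ";)", ":p"}
ABBREVIATIONS = {"brb", "thx", "lol", "omg"}

def preprocess(utterance):
    """Tokenize a chat utterance and drop stop-words, emoticons and chat abbreviations."""
    tokens = re.findall(r"[:;][\)\(dDpP]|[a-zA-Z']+", utterance.lower())
    return [t for t in tokens
            if t not in STOP_WORDS and t not in EMOTICONS and t not in ABBREVIATIONS]

print(preprocess("brb :) I think the wiki is the best tool"))
# -> ['think', 'wiki', 'best', 'tool']
```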
  • 14. Detecting the Topics
    • Each word in the chat becomes a candidate concept
      • Synset list
      • Frequency
    • Clustering algorithm for the concepts’ unification
    • If the synsets of two concepts have a common word
      • The two synset lists are merged
      • The frequency of the resulting concept = sum of the frequencies of the unified concepts
    • The resulting concepts – the main topics of the conversation
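A sketch of this unification step, with synsets simplified to plain sets of words (as they could be read off WordNet) rather than the system's internal representation:

```python
def unify_concepts(candidates):
    """Greedily merge candidate concepts whose synsets share a word: the merged
    concept keeps the union of the two synsets and the sum of the two frequencies."""
    concepts = [{"synset": set(c["synset"]), "freq": c["freq"]} for c in candidates]
    merged = True
    while merged:
        merged = False
        for i in range(len(concepts)):
            for j in range(i + 1, len(concepts)):
                if concepts[i]["synset"] & concepts[j]["synset"]:
                    concepts[i]["synset"] |= concepts[j]["synset"]
                    concepts[i]["freq"] += concepts[j]["freq"]
                    del concepts[j]
                    merged = True
                    break
            if merged:
                break
    # the surviving concepts, most frequent first, are the main topics
    return sorted(concepts, key=lambda c: c["freq"], reverse=True)

topics = unify_concepts([
    {"synset": {"wiki", "wikipedia"}, "freq": 5},
    {"synset": {"wikipedia", "encyclopedia"}, "freq": 2},
    {"synset": {"forum", "board"}, "freq": 3},
])
print(topics)  # the two wiki-related candidates collapse into one topic with freq 7
```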
  • 15. Detecting the Topics (2)
  • 16. Overview
    • Introduction
    • Theoretical background
    • Implementation
      • Detecting conversation’s topics
      • Assessing the learners’ competencies
      • Discovering implicit voices
      • Conversation graph
    • Conclusions
  • 17. Assessing the Competencies
    • Graphs that evaluate the competency of each participant, starting from the chat topics (concepts represented as synsets)
    • Other criteria are also used, such as the nature of the utterances: questions, agreements, references, etc. are treated differently
    • Parameters:
      • Factors for references
      • Bonuses for agreements, penalties for disagreements
      • A minimum value that is awarded to any line in the chat
      • Penalties for (dis)agreements, as they show less originality
  • 18. Value of an Utterance
    • The value of each utterance is computed by comparing it to an abstract utterance
    • Abstract utterance – built from the most important concepts identified in the chat; only the concepts whose frequency exceeds a given threshold are considered
    • Every utterance in the chat is scaled to the interval 0 – 100 by comparison with the abstract utterance (see the sketch below)
    • Synsets are used for every word
    • An utterance with a score of 0 contains no concept from the abstract utterance, while an utterance with a score of 100 contains all of its concepts
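A sketch of this scoring, under the assumption that the 0-100 value is simply the percentage of abstract-utterance concepts whose synset contains a word of the utterance (the slide fixes only the 0-100 scaling, not the exact comparison):

```python
def utterance_value(tokens, abstract_concepts):
    """Scale an utterance to 0-100 by the share of abstract-utterance concepts
    whose synset overlaps the utterance's words."""
    if not abstract_concepts:
        return 0.0
    hits = sum(1 for c in abstract_concepts if c["synset"] & set(tokens))
    return 100.0 * hits / len(abstract_concepts)

# Abstract utterance: concepts with frequency above a threshold (here, hand-picked).
abstract = [{"synset": {"wiki", "wikipedia", "encyclopedia"}, "freq": 7},
            {"synset": {"forum", "board"}, "freq": 3}]
print(utterance_value(["i", "prefer", "the", "wiki"], abstract))  # -> 50.0
```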
  • 19. Computing the Competencies
    • At the start of the conversation, each participant has a null competency.
    • For each utterance in the chat, the competency values are modified as follows (a sketch of this update loop is given below):
      • The participant who issued the current utterance receives its score, possibly downgraded if it is a (dis)agreement;
      • All the participants whose names are literally present in the current utterance are rewarded with a percentage of its value;
      • The participant who issued the utterance referred to by the current one is rewarded with a constant value for an agreement and penalized for a disagreement;
      • If the current utterance is not a (dis)agreement, the participant who issued the referred utterance is rewarded with a fraction of the current utterance’s value;
      • If the current utterance has a score of 0, the issuer receives a minimum score (for participation).
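A sketch of this update loop; the parameter values below are placeholders, not the factors actually used by the authors:

```python
AGREEMENT_BONUS = 2.0       # constant reward for an agreement with a referred utterance
DISAGREEMENT_PENALTY = 1.0  # constant penalty for a disagreement
MENTION_FACTOR = 0.2        # reward for being literally named in an utterance
REFERENCE_FACTOR = 0.3      # reward for having one's utterance referred to
DOWNGRADE_FACTOR = 0.5      # (dis)agreements are downgraded as less original
MIN_SCORE = 1.0             # participation score for zero-valued utterances

def compute_competencies(utterances, participants):
    """utterances: list of dicts with 'author', 'value' (0-100),
    'kind' ('agree' / 'disagree' / 'other'), 'mentions' (names literally present)
    and 'ref' (index of the referred utterance, or None)."""
    comp = {p: 0.0 for p in participants}      # every participant starts from zero
    for u in utterances:
        score = u["value"]
        if u["kind"] in ("agree", "disagree"):
            score *= DOWNGRADE_FACTOR
        comp[u["author"]] += score if score > 0 else MIN_SCORE
        for name in u["mentions"]:
            comp[name] += MENTION_FACTOR * u["value"]
        if u["ref"] is not None:
            ref_author = utterances[u["ref"]]["author"]
            if u["kind"] == "agree":
                comp[ref_author] += AGREEMENT_BONUS
            elif u["kind"] == "disagree":
                comp[ref_author] -= DISAGREEMENT_PENALTY
            else:
                comp[ref_author] += REFERENCE_FACTOR * u["value"]
    return comp

chat = [{"author": "A", "value": 60, "kind": "other", "mentions": [], "ref": None},
        {"author": "B", "value": 20, "kind": "agree", "mentions": ["A"], "ref": 0}]
print(compute_competencies(chat, ["A", "B"]))  # -> {'A': 66.0, 'B': 10.0}
```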
  • 20. Competencies’ Graphics
    • y-axis – value of the competency
    • x-axis – utterance number
  • 21. Overview
    • Introduction
    • Theoretical background
    • Implementation
      • Detecting conversation’s topics
      • Assessing the learners’ competencies
      • Discovering implicit voices
      • Conversation graph
    • Conclusions
  • 22. Discovering Implicit Voices
    • We have explicit references
    • We want to discover more references
    • Why? Because of haste and lack of attention, participants often omit explicit references
    • The method
      • List of patterns that consist of a set of words (expressions) and a local subject called the referred word
      • If an utterance matches one of the patterns, we determine which word in the utterance is the referred word (e.g. “I don’t agree with your assessment”)
      • We search for this word in a predetermined number of the most recent previous utterances
      • If we can find this word in one of these utterances, then we have discovered an implicit relationship between the two utterances, the current one referring to the identified one
    • During the identification process, the synsets of the words are used
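A sketch of this pattern-matching step. The patterns, the search window of five utterances and the regular-expression form are illustrative assumptions; the synset lookup is reduced to the word itself by default:

```python
import re

# Hypothetical patterns: each captures the "referred word" as group 1.
PATTERNS = [
    re.compile(r"agree with your (\w+)", re.I),
    re.compile(r"as you said about (?:the )?(\w+)", re.I),
]
WINDOW = 5  # number of most recent previous utterances searched (assumed value)

def find_implicit_reference(chat, index, synonyms=lambda w: {w}):
    """Return the index of the utterance implicitly referred to by chat[index],
    or None. `synonyms` stands in for a WordNet synset lookup."""
    text = chat[index]
    for pattern in PATTERNS:
        match = pattern.search(text)
        if not match:
            continue
        referred = synonyms(match.group(1).lower())
        for prev in range(index - 1, max(index - 1 - WINDOW, -1), -1):
            if referred & set(chat[prev].lower().split()):
                return prev
    return None

chat = ["my assessment is that wikis scale badly",
        "maybe",
        "I don't agree with your assessment"]
print(find_implicit_reference(chat, 2))  # -> 0
```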
  • 23. Discovering Implicit Voices (2)
    • There are a number of empirical methods
    • Examples
      • If B posts a short agreement / disagreement right after A’s utterance, then B refers to A
      • A – I think wikis are the best
      • B – I disagree
      • If B explicitly refers to A with a short (dis)agreement and C explicitly refers to B, then C implicitly refers to A (transitivity)
      • A – I think wikis are the best
      • (…)
      • B – I disagree REF A
      • (…)
      • C – Maybe we should talk about them anyway REF B
  • 24. Overview
    • Introduction
    • Theoretical background
    • Implementation
      • Detecting conversation’s topics
      • Assessing the learners’ competencies
      • Discovering implicit voices
      • Conversation graph
    • Conclusions
  • 25. Conversation Graph
    • The conversation is a graph
      • Vertices = utterances
      • Edges = references between utterances
    • The graph is directed and acyclic – it can be topologically sorted
    • Using the graph:
      • Segmentation of the chat into discussion threads
      • Determining the strength of an utterance
      • Graphical representation of the conversation
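A sketch of the graph side of this, with thread segmentation read as the weakly connected components of the reference graph (an assumption; the slide does not fix the segmentation rule):

```python
from collections import defaultdict

def discussion_threads(num_utterances, references):
    """references: (source, target) pairs, source referring back to target.
    Groups utterances into threads = weakly connected components, via union-find."""
    parent = list(range(num_utterances))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for src, dst in references:
        parent[find(src)] = find(dst)

    threads = defaultdict(list)
    for u in range(num_utterances):
        threads[find(u)].append(u)
    return list(threads.values())

# Utterance 2 refers to 0, 3 refers to 2, 4 refers to 1.
print(discussion_threads(5, [(2, 0), (3, 2), (4, 1)]))  # -> [[0, 2, 3], [1, 4]]
```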
  • 26. Utterances’ Strength
    • The importance of an utterance in a conversation can be computed using:
      • Length
      • The importance of the words
    • Another approach: an utterance is important if it influences the further evolution of the conversation
    • An important utterance – referenced by many further utterances
    • Thus, the importance can be considered as a measure of the strength of the utterance
    • The utterance is strong if it influences the rest of the conversation (like breaking news on TV)
    • Computed recursively:
    • strength(u) = 1 + param1 × (number of utterances referring to u) + param2 × (sum of the strengths of those utterances)
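A sketch of this recursive computation, exploiting the fact that references only point backwards, so the graph is acyclic and strengths can be filled in from the last utterance towards the first. param1 and param2 are placeholder values, not the authors' settings:

```python
def utterance_strength(num_utterances, references, param1=0.5, param2=0.3):
    """strength(u) = 1 + param1 * (#utterances referring to u)
                       + param2 * (sum of the strengths of those utterances).
    references: (source, target) pairs with source > target."""
    referred_by = {u: [] for u in range(num_utterances)}
    for src, dst in references:
        referred_by[dst].append(src)

    strength = [0.0] * num_utterances
    for u in range(num_utterances - 1, -1, -1):   # later utterances first
        strength[u] = (1
                       + param1 * len(referred_by[u])
                       + param2 * sum(strength[v] for v in referred_by[u]))
    return strength

# Utterances 1 and 2 refer to 0; 3 refers to 2.
print(utterance_strength(4, [(1, 0), (2, 0), (3, 2)]))
```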
  • 27. Visual Representation
  • 28. Conclusions
    • Socio-semantic data extracted from conversations:
      • Discovery and visualisation of the discourse
      • Determining important utterances
      • Assessing the competencies
      • Searching for references between utterances
    • Successfully integrated ideas and techniques from:
      • Socio-cultural and dialogic paradigm
      • Classical cognitive paradigm – ontologies and knowledge-based processing
      • Natural language processing
  • 29. Conclusions (2)
    • Machine learning for the automatic discovery of the rules that define implicit references
      • A chat annotation tool has been built
      • Started creating an annotated chat corpus to be used as a gold standard
    • Improving the method used to compute the competencies – integrating SNA techniques
    • Use domain ontologies and/or pLSA
    • Current and future work is part of the LTfLL FP7 project
  • 30. Thank You!