Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Learning Communities

EC-TEL 2008
Transcript of "Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Learning Communities"

1. Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Learning Communities
Traian Rebedea (1), Stefan Trausan-Matu (1,2), Costin Chiru (1)
(1) "Politehnica" University of Bucharest, Department of Computer Science and Engineering
(2) Research Institute for Artificial Intelligence of the Romanian Academy
{traian.rebedea, stefan.trausan, costin.chiru} @ cs.pub.ro
2. Overview
- Introduction
- Theoretical background
- Implementation
  - Detecting the conversation's topics
  - Assessing the learners' competencies
  - Discovering implicit voices
  - Conversation graph
- Conclusions
3. Context
- Computer-assisted learning
  - Developing tools to support the learning process
  - Evaluation of these tools (and of the learning process)
  - Determining the learners' performance
- Computer Supported Collaborative Learning (CSCL)
- Main idea: "rather than speaking about 'acquisition of knowledge,' many people prefer to view learning as becoming a participant in a certain discourse" (Sfard, 2000)
- Focus on studying the interactions between participants in small-group chat conversations
4. Objectives
- Automatic extraction of useful social and semantic information from conversations
  - Determining relationships between utterances
  - Identifying utterances that have influenced the further development of the conversation
  - Assessing the performance / competency of each participant
- Designing an interface for the visualisation of a conversation
- Applicable to chats, discussion forums, etc., as well as to face-to-face discussions
5. Experiments
- Languages: English (advantage: existing NLP tools) and Romanian
- Computer Science: HCI, NLP and Algorithm Design courses at the "Politehnica" University of Bucharest
- Small groups of 4-5 students; all of the students must be graded (over 100 students per course)
- The conversations have well-determined subjects
  - Collaborative, team work
  - Competitive
- Also used chat transcripts from Virtual Math Teams, Drexel University, Philadelphia, US
6. Overview
- Introduction
- Theoretical background
- Implementation
  - Detecting the conversation's topics
  - Assessing the learners' competencies
  - Discovering implicit voices
  - Conversation graph
- Conclusions
7. Socio-cultural Paradigm
- The role of socially established artefacts in communication and learning (Vygotsky)
- Bakhtin focuses on the role of language and discourse, and especially of speech and dialog: "... Any true understanding is dialogic in nature."
- Lotman considers text a "thinking device"
8. Bakhtin's Dialogism
- Bakhtin's ideas
  - Dialogism
  - Polyphony
  - Inter-animation of voices
- Bakhtin: "The specific totality of ideas, thoughts and words is everywhere passed through several unmerged voices, taking on a different sound in each" – referring to Dostoevsky's novels
- Dual nature of voices: community and individuality
9. Voices in Chats
- Utterances should be the units of analysis
- An utterance contains at least one voice – that of the participant who issued it
- Most utterances contain multiple voices
- The inter-animation of the voices forms the discussion threads of the conversation
10. Discussion Threads
11. Overview
- Introduction
- Theoretical background
- Implementation
  - Detecting the conversation's topics
  - Assessing the learners' competencies
  - Discovering implicit voices
  - Conversation graph
- Conclusions
12. Foreword
- Chat transcripts are read from HTML or XML files
- ConcertChat environment (Fraunhofer)
  - Advantages for collaborative work
  - Enables the use of explicit references to previous utterances or to the whiteboard
- Implementation in C#.NET
13. Techniques
- Tokenization
- Stop-words, emoticons and usual abbreviations (:), :D, brb, thx, ...) are eliminated
- WordNet is used to identify synonyms (see the preprocessing sketch below)
- Misspellings are detected using the Google API
- The ontology can be extended with words discovered in the chat, specific to the conversation's domain
- Pattern analysis
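The preprocessing steps listed on this slide can be illustrated with a short sketch. The actual tool is implemented in C#.NET; what follows is a minimal, hypothetical Python/NLTK equivalent, and the abbreviation list is illustrative rather than the authors' own.

```python
# Minimal preprocessing sketch in Python/NLTK (the paper's implementation is in C#.NET);
# the abbreviation list below is illustrative, not the authors' actual one.
from nltk.corpus import stopwords, wordnet
from nltk.tokenize import word_tokenize

CHAT_NOISE = {":)", ":d", "brb", "thx"}          # emoticons and usual abbreviations
STOP_WORDS = set(stopwords.words("english"))

def preprocess(utterance: str) -> list[str]:
    """Tokenize an utterance and drop stop-words, emoticons and abbreviations."""
    tokens = word_tokenize(utterance.lower())
    return [t for t in tokens if t not in STOP_WORDS and t not in CHAT_NOISE and t.isalnum()]

def synonyms(word: str) -> set[str]:
    """All WordNet lemma names of a word; used to decide whether two words
    name the same concept when the topics are detected."""
    return {lemma.name().lower() for syn in wordnet.synsets(word) for lemma in syn.lemmas()}
```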
14. Detecting the Topics
- Each word in the chat becomes a candidate concept
  - Synset list
  - Frequency
- A clustering algorithm unifies the concepts (see the sketch below)
- If the synsets of two concepts have a common word:
  - The two synset lists are merged
  - The frequency of the resulting concept = the sum of the frequencies of the unified concepts
- The resulting concepts are the main topics of the conversation
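A rough sketch of this unification step, assuming candidate concepts are word-to-synset mappings (illustrative code, not the authors' implementation): concepts whose synset lists overlap are merged greedily and their frequencies summed.

```python
# Sketch of the concept-unification (clustering) step; `candidates` maps each chat word
# to its synset word set (e.g. from the synonyms() helper above) and `freq` to its
# frequency in the chat. Names and the greedy strategy are assumptions.
from collections import Counter

def unify_concepts(candidates: dict[str, set[str]], freq: Counter) -> list[tuple[set[str], int]]:
    concepts: list[tuple[set[str], int]] = []
    for word, synset in candidates.items():
        for i, (merged_synset, merged_freq) in enumerate(concepts):
            if merged_synset & synset:               # a common synset word: same concept
                concepts[i] = (merged_synset | synset, merged_freq + freq[word])
                break
        else:
            concepts.append((synset | {word}, freq[word]))
    # the most frequent unified concepts are the main topics of the conversation
    return sorted(concepts, key=lambda c: c[1], reverse=True)
```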
15. Detecting the Topics (2)
16. Overview
- Introduction
- Theoretical background
- Implementation
  - Detecting the conversation's topics
  - Assessing the learners' competencies
  - Discovering implicit voices
  - Conversation graph
- Conclusions
17. Assessing the Competencies
- Graphics that evaluate the competency of each participant starting from the chat topics (concepts represented as synsets)
- Uses other criteria, such as the nature of the utterances: questions, agreements, references, etc. are treated differently
- Parameters:
  - Factors for references
  - Bonuses for agreements, penalties for disagreements
  - A minimum value that is awarded to any line in the chat
  - Penalties for (dis-)agreements, as they present less originality
18. Value of an Utterance
- The value of each utterance is computed by comparing it to an abstract utterance
- Abstract utterance: built from the most important concepts identified in the chat; only the concepts with a frequency greater than a given threshold are considered
- Every utterance in the chat is scored on a scale of 0-100 by comparison to the abstract utterance (see the sketch below)
- Synsets are used for every word
- An utterance with a score of 0 contains no concept from the abstract utterance; an utterance with a score of 100 contains all of them
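A minimal sketch of this score, assuming both the utterance and the abstract utterance are represented as sets of concepts after synset expansion; the exact comparison in the paper may differ.

```python
# Illustrative 0-100 utterance value: the fraction of the abstract utterance's
# concepts that also appear in the current utterance, scaled to 100.
def utterance_value(utterance_concepts: set[str], abstract_concepts: set[str]) -> float:
    if not abstract_concepts:
        return 0.0
    return 100.0 * len(utterance_concepts & abstract_concepts) / len(abstract_concepts)

# The abstract utterance itself: concepts whose chat frequency exceeds a threshold.
def abstract_utterance(concept_freq: dict[str, int], threshold: int) -> set[str]:
    return {c for c, f in concept_freq.items() if f > threshold}
```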
19. Computing the Competencies
- At the start of the conversation, each participant has a null competency.
- For each utterance in the chat, the values of the competencies are updated as follows (see the sketch below):
  - The participant that issued the current utterance receives its score, possibly downgraded if it is an (dis-)agreement;
  - All the participants whose names appear literally in the current utterance are rewarded with a percentage of its value;
  - If the current utterance is an agreement or a disagreement, the participant that issued the referred utterance is rewarded or penalized, respectively, with a constant value;
  - If the current utterance is not an (dis-)agreement, the participant that issued the referred utterance is rewarded with a fraction of the value of the current utterance;
  - If the current utterance has a score of 0, the issuer receives a minimum score (for participation).
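A hedged sketch of this update loop is given below; the parameter names and values (MENTION_SHARE, REF_SHARE, AGREE_BONUS, DOWNGRADE, MIN_SCORE) and the utterance fields are illustrative, not the ones used in the paper.

```python
# Sketch of the competency update described on the slide above (Python, illustrative).
from collections import defaultdict

MENTION_SHARE = 0.10   # share given to participants literally named in the utterance
REF_SHARE     = 0.25   # fraction passed back to the author of the referred utterance
AGREE_BONUS   = 2.0    # constant reward (agreement) / penalty (disagreement)
DOWNGRADE     = 0.5    # (dis-)agreements are less original, so their own score is reduced
MIN_SCORE     = 1.0    # participation score for utterances valued 0

def compute_competencies(utterances):
    """utterances: iterable of objects with fields author, value (0-100), mentions
    (participant names present in the text), refers_to (referred utterance or None)
    and agreement (+1 agreement, -1 disagreement, 0 otherwise)."""
    competency = defaultdict(float)
    for u in utterances:
        own = u.value * (DOWNGRADE if u.agreement else 1.0)
        competency[u.author] += own if own > 0 else MIN_SCORE
        for name in u.mentions:
            competency[name] += MENTION_SHARE * u.value
        if u.refers_to is not None:
            if u.agreement:                              # constant bonus or penalty
                competency[u.refers_to.author] += AGREE_BONUS * u.agreement
            else:                                        # pass back a fraction of the value
                competency[u.refers_to.author] += REF_SHARE * u.value
    return competency
```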
20. Competencies' Graphics
- Oy axis: value of the competency
- Ox axis: utterance number
21. Overview
- Introduction
- Theoretical background
- Implementation
  - Detecting the conversation's topics
  - Assessing the learners' competencies
  - Discovering implicit voices
  - Conversation graph
- Conclusions
22. Discovering Implicit Voices
- We have explicit references; we want to discover more references
- Why? Out of haste or lack of attention, participants often omit explicit references
- The method (see the sketch below):
  - A list of patterns, each consisting of a set of words (expressions) and a local subject called the referred word
  - If an utterance matches one of the patterns, we determine which word in the utterance is the referred word (e.g. "I don't agree with your assessment")
  - We search for this word in a predetermined number of the most recent previous utterances
  - If we find this word in one of these utterances, we have discovered an implicit relationship between the two utterances, the current one referring to the identified one
- During the identification process, the synsets of the words are used
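An illustrative sketch of this heuristic is shown below; the pattern list, the back-search window and the helper names are assumptions made for the example, not the paper's actual values.

```python
# Sketch of the implicit-reference heuristic: match a pattern, extract the referred word,
# then look for it (or a synonym) in the most recent previous utterances.
import re

PATTERNS = [re.compile(r"don'?t agree with your (\w+)"),
            re.compile(r"agree with your (\w+)"),
            re.compile(r"what about the (\w+)")]
WINDOW = 5   # how many of the most recent previous utterances are searched

def find_implicit_reference(utterance: str, previous: list[str],
                            synonyms=lambda w: {w}):
    """Return the index in `previous` (ordered oldest to newest) of the utterance
    the current one implicitly refers to, or None."""
    for pattern in PATTERNS:
        match = pattern.search(utterance.lower())
        if not match:
            continue
        referred = synonyms(match.group(1))              # synsets widen the search
        start = len(previous) - 1
        stop = max(-1, len(previous) - 1 - WINDOW)
        for i in range(start, stop, -1):
            if referred & set(previous[i].lower().split()):
                return i
    return None
```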
23. Discovering Implicit Voices (2)
- There are a number of empirical methods
- Examples:
  - If B posts a short agreement / disagreement right after A, then B refers to A:
    A – I think wikis are the best
    B – I disagree
  - If C explicitly refers to B (REF B), B explicitly refers to A (REF A) and B is a short (dis-)agreement, then C implicitly refers to A (transitivity):
    A – I think wikis are the best
    (...)
    B – I disagree REF A
    (...)
    C – Maybe we should talk about them anyway REF B
24. Overview
- Introduction
- Theoretical background
- Implementation
  - Detecting the conversation's topics
  - Assessing the learners' competencies
  - Discovering implicit voices
  - Conversation graph
- Conclusions
25. Conversation Graph
- The conversation is a graph
  - Vertices = utterances
  - Edges = references between utterances
- The graph is directed and acyclic, so it can be topologically sorted
- Uses of the graph (see the sketch below):
  - Segmentation of the chat into discussion threads
  - Determining the strength of an utterance
  - Graphical representation of the conversation
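The sketch below shows one possible representation of the graph and a thread segmentation based on connected components of the reference graph; both the representation and the segmentation rule are assumptions made for illustration, not necessarily what the tool does.

```python
# Illustrative conversation graph. references[i] lists the indices of the earlier
# utterances that utterance i refers to (explicitly or implicitly); chronological order
# is already a topological order, since references only point backwards in time.
from collections import defaultdict

def build_graph(references: dict[int, list[int]]) -> dict[int, list[int]]:
    """Map each utterance to the later utterances that refer to it."""
    referred_by = defaultdict(list)
    for i, refs in references.items():
        for j in refs:
            referred_by[j].append(i)
    return referred_by

def discussion_threads(references: dict[int, list[int]], n_utterances: int) -> list[set[int]]:
    """Segment the chat into threads = connected components of the undirected reference graph."""
    neighbours = defaultdict(set)
    for i, refs in references.items():
        for j in refs:
            neighbours[i].add(j)
            neighbours[j].add(i)
    threads, seen = [], set()
    for start in range(n_utterances):
        if start in seen:
            continue
        stack, thread = [start], set()
        while stack:
            node = stack.pop()
            if node not in thread:
                thread.add(node)
                stack.extend(neighbours[node] - thread)
        seen |= thread
        threads.append(thread)
    return threads
```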
26. Utterances' Strength
- The importance of an utterance in a conversation can be computed using:
  - Its length
  - The importance of its words
- Another approach: an utterance is important if it influences the further evolution of the conversation
- An important utterance is referenced by many later utterances
- Thus, the importance can be considered a measure of the strength of the utterance
- An utterance is strong if it influences the rest of the conversation (like breaking news on TV)
- Computed recursively (see the sketch below):
  strength(u) = 1 + param1 * (number of utterances referring to u) + param2 * (sum of the strengths of those utterances)
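A short sketch of this recursive computation, reusing the build_graph representation above; param1 and param2 are tuning constants whose values here are purely illustrative.

```python
# Strength of every utterance, following the formula on the slide above (illustrative).
PARAM1, PARAM2 = 0.5, 0.3

def utterance_strengths(referred_by: dict[int, list[int]], n_utterances: int) -> list[float]:
    """referred_by[j] lists the later utterances that reference utterance j.
    Iterating from the last utterance to the first guarantees that every referring
    utterance's strength is already known when it is needed."""
    strength = [1.0] * n_utterances
    for j in range(n_utterances - 1, -1, -1):
        referring = referred_by.get(j, [])
        strength[j] = 1.0 + PARAM1 * len(referring) + PARAM2 * sum(strength[i] for i in referring)
    return strength
```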
27. Visual Representation
28. Conclusions
- Socio-semantic data extracted from conversations:
  - Discovery and visualisation of the discourse
  - Determining important utterances
  - Assessing the competencies
  - Searching for references between utterances
- Successfully integrated ideas and techniques from:
  - The socio-cultural and dialogic paradigm
  - The classical cognitive paradigm – ontologies and knowledge-based processing
  - Natural language processing
29. Conclusions (2)
- Machine learning for the automatic discovery of the rules that define implicit references
  - A chat annotation tool has been built
  - Started creating an annotated chat corpus to be used as a gold standard
- Improving the method used to compute the competencies – integrating SNA techniques
- Use domain ontologies and/or pLSA
- Current and further work is part of the LTfLL FP7 project
30. Thank You!