Automatic Assessment of Collaborative Chat Conversations with PolyCAFe
Traian Rebedea (1), Mihai Dascalu (1), Stefan Trausan-Matu (1), Gillian Armitt (2) & Costin Chiru (1)
(1) Politehnica University of Bucharest
(2) University of Manchester
Overview
- Context
- Polyphonic Framework
- System Architecture
- Utterance Assessment
- Validation Experiment
- Verification Results
- Transferability & Conclusions
Context
- CSCL
  - Online textual interactions using web communication technologies
  - Education through dialogue
- Focus on chat conversations in small groups
- There is usually no feedback from tutors (difficult, time-consuming, etc.)
Context – Analysis Tools
- The need for automatic feedback and assessment
- There is a wide variety of research on analyzing chat conversations:
  - topic detection and extraction
  - concept formation in a group discussion
  - summarization
  - argumentation and transactivity acts in each posting
  - utterance classification based on concept coverage
- However, there is no complete analysis tool for students' online interactions (chats, forums)
Problems
- P1. Need to integrate feedback on different levels: conversation, utterance, participant
- S1. Mixture of techniques: Natural Language Processing, Social Network Analysis, Information Retrieval
- P2. In addition, focus on measuring collaborative discourse as a key characteristic of successful online textual interaction
- S2. New theory of discourse for chat conversations with multiple participants
Polyphonic Framework
- Classical NLP discourse: monologues, dialogues
- Coherence is essential in the evolution of discourse (segmentation, topic change, etc.)
- Dialogues are also based on a two-interlocutor model: speech acts or dialogue acts, adjacency pairs, transacts
- However, chat conversations with multiple participants are different!
Polyphonic Principles
- The floor may be shared by different participants at the same time
  - This allows parallel "discussions" to evolve
  - In the "serial" text obtained by concatenating the utterances in posting order, the discourse may look incoherent
- Start from Bakhtin's dialogic theory for discourse analysis
- Discussion threads <=> voices that are more or less powerful than others throughout the conversation
- Influence between different voices
Words, Voices and Discussion Threads
- The topic of the discussion defines a list of important concepts (words/stems/…) => initial voices
- Other voices appear naturally during the conversation; some may be related to the initial ones
- Voices can influence each other through explicit or implicit links between utterances
- One or more similar voices can define a discussion thread
- Discussion thread = a set of utterances that are linked implicitly or explicitly and correspond to a specific voice or to several similar voices
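One way to picture the thread definition above is as connected components over the links between utterances. This is only a sketch under that assumption: the function name `discussion_threads` and the plain index pairs used for links are hypothetical, and the sketch ignores the requirement that a thread also corresponds to a specific voice.

```python
def discussion_threads(num_utterances, links):
    """Group utterance indices into threads (connected components over links).

    `links` is a list of (later_utterance, earlier_utterance) index pairs,
    covering both explicit and implicit links. Illustrative only.
    """
    parent = list(range(num_utterances))

    def find(x):
        # Follow parent pointers to the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the earliest utterance as the root of the merged thread.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    threads = {}
    for u in range(num_utterances):
        threads.setdefault(find(u), []).append(u)
    return sorted(threads.values())


# Six utterances: 1 and 3 build on 0, 4 builds on 2, 5 is unlinked.
print(discussion_threads(6, [(1, 0), (3, 1), (4, 2)]))
```

Interleaved threads such as `[0, 1, 3]` and `[2, 4]` are the parallel "discussions" that make the serialized chat text look incoherent.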
Implicit Links
- For the analysis to be effective, it is important to discover implicit links between utterances
- Very difficult task: precision-recall trade-off
- Repetitions (expanded using ontologies)
- Semantic similarity (including lexical chains)
- Adjacency pairs (using speech acts)
- Cue phrases ("that's a good idea", "as <x> said", …)
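Two of the techniques listed above, cue phrases and plain repetitions, can be sketched in a few lines. This is a toy illustration, not the PolyCAFe implementation: the pattern list, the stopword set, the `window` parameter and the function names are all assumptions, and ontology expansion, semantic similarity and speech acts are left out.

```python
import re

# Hypothetical cue-phrase patterns, based on the two examples on the slide.
CUE_PATTERNS = [
    re.compile(r"\bthat['’]?s a good idea\b", re.IGNORECASE),
    re.compile(r"\bas (\w+) said\b", re.IGNORECASE),
]

# Tiny illustrative stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "i", "it",
             "for", "as", "that", "s"}


def content_words(text):
    """Lowercased words of the utterance with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def find_implicit_links(utterances, window=5):
    """Return (source, target, reason) triples; source refers back to target.

    `utterances` is a list of (speaker, text) pairs in posting order.
    """
    links = []
    for i, (_, text) in enumerate(utterances):
        start = max(0, i - window)
        # Cue phrases: link to the most recent utterance in the window,
        # or to the most recent one by the named speaker ("as <x> said").
        for pattern in CUE_PATTERNS:
            match = pattern.search(text)
            if not match:
                continue
            named = match.group(1).lower() if match.groups() else None
            for j in range(i - 1, start - 1, -1):
                if named is None or utterances[j][0].lower() == named:
                    links.append((i, j, "cue"))
                    break
        # Repetitions: any shared content word with an earlier utterance.
        words = content_words(text)
        for j in range(start, i):
            if words & content_words(utterances[j][1]):
                links.append((i, j, "repetition"))
    return links


chat = [
    ("Ana", "Wikis keep a full edit history"),
    ("Dan", "Chat is faster for quick questions"),
    ("Eva", "As Ana said, history matters"),
]
print(find_implicit_links(chat))
```

Widening `window` or loosening the word match raises recall at the cost of precision, which is the trade-off the slide points to.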
Conversation Graph
- The conversation can be modeled as a graph:
  - Utterances = nodes
  - Links = edges
  - Each edge has a trust factor
- The conversation graph is fundamental to the analysis:
  - Can be used as a social network
  - Can be used to compute conversation threads, coherence, collaborative discourse, etc.
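A minimal sketch of such a graph, assuming edges point from a later utterance to the earlier one it refers to and that the trust factor is a weight in [0, 1]. The class name, method names and the trust-weighted degree definitions are illustrative assumptions, not the system's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ConversationGraph:
    # edges[source][target] = trust factor of the link source -> target
    edges: dict = field(default_factory=lambda: defaultdict(dict))
    nodes: set = field(default_factory=set)

    def add_link(self, source, target, trust):
        """Link a later utterance (source) to an earlier one (target)."""
        self.nodes.update((source, target))
        self.edges[source][target] = trust

    def out_degree(self, node):
        """Trust-weighted count of earlier utterances this one refers to."""
        return sum(self.edges.get(node, {}).values())

    def in_degree(self, node):
        """Trust-weighted count of later utterances referring to this one."""
        return sum(targets.get(node, 0.0) for targets in self.edges.values())


graph = ConversationGraph()
graph.add_link(2, 1, trust=0.9)   # e.g. an explicit reply
graph.add_link(3, 1, trust=0.5)   # e.g. a weaker implicit link
graph.add_link(3, 2, trust=1.0)
print(graph.in_degree(1), graph.out_degree(3))
```

Collapsing utterance nodes by their author would turn the same structure into a participant-level social network.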
Proposed Solution - PolyCAFe
- Chat and Forum Analysis and Feedback System
- PolyCAFe = Polyphony-based Collaboration Analysis and Feedback generation
- Provides feedback to learners, tutors and teachers related to the interaction of students in online discussions
- Takes into account both:
  - Content of the conversation (related to a domain or topic)
  - Collaboration (related to the conversation, participation, etc.)
Technical architecture
Utterance evaluation
Utterance Evaluation & Threads Utterance Past Future Thread Coherence Future Impact Content In-Degree Out-Degree Social Current Relevance Completeness Centrality
Conversation Visualization
Utterance Feedback
Validation Experiment
- 35 students enrolled in the HCI course
  - Experimental group: 25
  - Control group: 10
- Divided into teams of 5
- Had two distinct assignments that were correlated
- 6 tutors provided manual feedback to the students, both with and without PolyCAFe
Assignment Example
A debate about the best collaboration tool for the web: chat, blog, wiki, forums and Google Wave. Each student shall choose one of the 5 tools and shall present its advantages and the disadvantages of the other tools. Thus, you will act as a "sales person" for your tool and try to convince the others that you have the best offer. You must also defend your product whenever possible and criticize the other products if needed.
Tutor Efficiency
- VT1: Tutors/facilitators spend less time preparing feedback for learners compared with traditional means
- Likert questionnaire: everyone agreed that "they find the information needed to write the feedback for the learners more quickly using PolyCAFe than without it" (m=4.7, sd=0.52, agree=100%)
- Average time needed to prepare feedback for a conversation:
  - without PolyCAFe: 84 minutes
  - with PolyCAFe: 55 minutes
  - Improvement: 35%
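The 35% figure follows from the two average times as a relative reduction. A short check, where the function name is just an illustrative choice:

```python
def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


# 84 minutes without PolyCAFe, 55 minutes with it (values from the slide).
print(f"{relative_reduction(84, 55):.1f}%")  # 34.5%, reported as 35%
```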
Quality and Consistency of Feedback
- VT2: Learners perceive that the feedback received from the system contributes to informing their study activities
- Logging: 285 visits to PolyCAFe and 1447 page views, which amounts to more than 40 page views per student on average
Quality and Consistency of Feedback
Validation statement | Mean | Standard deviation | % Agree / Strongly agree
The information the system provides me is accurate enough for helping me perform my learning tasks. | 3.7 | 0.52 | 60%
PolyCAFe's feedback is sufficiently accurate to inform my study activities. | 3.8 | 0.88 | 64%
PolyCAFe provides feedback that is useful to my study activities. | 3.8 | 0.85 | 72%
PolyCAFe provides feedback that is relevant to my study activities. | 3.9 | 0.91 | 72%
I trust PolyCAFe to provide helpful feedback. | 4.0 | 0.87 | 80%
Quality of Educational Output
- VT3: Learner performance in online discussions is improved in the areas of content coverage and collaboration when using PolyCAFe
- Measurements computed for the second chat assignment, by comparing the experimental and control groups
Quality of Educational Output
Measure | Experimental group | Control group | Improvement over control group
Average score for a chat conversation (collaboration + content) | 6.80 | 6.37 | 6.8%
Average importance of the most important 20 concepts | 0.194 | 0.192 | 1.2%
Average number of utterances | 351 | 338 | 3.8%
Average distribution of (implicit and explicit) links between utterances | 1.12 | 0.87 | 29%
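The improvement column can be recomputed from the two group averages as a relative difference over the control group. The sketch below does that; the dictionary keys are shortened labels chosen for illustration. Recomputing from the rounded values shown on the slide may differ slightly from the reported percentages (notably for the concept-importance row), presumably because the slide used unrounded inputs.

```python
def improvement_over_control(experimental, control):
    """Relative improvement of the experimental group, in percent."""
    return (experimental - control) / control * 100.0


# (experimental, control) averages as shown on the slide.
measures = {
    "chat score": (6.80, 6.37),
    "top-20 concept importance": (0.194, 0.192),
    "number of utterances": (351, 338),
    "links between utterances": (1.12, 0.87),
}

for name, (experimental, control) in measures.items():
    print(f"{name}: {improvement_over_control(experimental, control):.1f}%")
```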
Verification Experiments
- Utterance scoring
- Participant ranking
- Speech acts classification
- Evaluation of collaboration score
- & more
Utterance Scoring
- Chat 1 (331 utterances)
- Scores: 1 (not important) to 4 (very important)
- Tutor 1 – Tutor 2 (inter-rater) correlation: 61%
- Tutor 1 – PolyCAFe correlation: 60%
- Tutor 2 – PolyCAFe correlation: 51%
- Tutor average – PolyCAFe correlation: 57%
Participant Ranking
Ranked by       | Student 1 | Student 2 | Student 3 | Student 4 | Student 5
Student 1       | -         | 2         | 3         | 1         | 4
Student 2       | 2         | -         | 3         | 1         | 4
Student 3       | 2         | 3         | -         | 1         | 4
Student 4       | 1         | 2         | 3         | -         | 4
Student 5       | 1         | 2         | 4         | 3         | -
Student average | 2         | 3         | 4         | 1         | 5
Tutor 1         | 4         | 1         | 5         | 2         | 3
Tutor 2         | 4         | 2         | 5         | 1         | 3
Tutor average   | 4         | 1-2       | 5         | 1-2       | 3
PolyCAFe        | 4         | 2         | 5         | 1         | 3
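To make the comparison of rankings concrete, the rows above can be compared with a rank correlation. The slides do not say which correlation measure was used, so Spearman's coefficient is an assumption here, and the formula below is only valid for rankings without ties (so the "Tutor average" row with its 1-2 tie is left out). The values for this single conversation need not match the aggregate figures reported in the deck.

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Ranks assigned to Students 1..5, copied from the table.
student_average = [2, 3, 4, 1, 5]
tutor_1 = [4, 1, 5, 2, 3]
tutor_2 = [4, 2, 5, 1, 3]
polycafe = [4, 2, 5, 1, 3]

for name, ranking in [("Tutor 1", tutor_1), ("Tutor 2", tutor_2),
                      ("Student average", student_average)]:
    print(f"{name} vs PolyCAFe: {spearman(ranking, polycafe):.2f}")
```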
Participant Ranking
Rankings compared | Correlation | Precision | Average distance
Tutors – System   | 94%         | 77%       | 0.23
Students – System | 84%         | 66%       | 0.43
Tutors – Students | 84%         | 71%       | 0.40
Transferability Issues
- Domain
  - The topic of the conversation should be one that can be addressed through discussion alone, without graphics or formulas
- Language
  - Need for the components of the NLP pipeline
  - Corpus for training the LSA model
  - Possibly a domain ontology
- Activity
  - Collaborative activity
  - Teams of 4-15 students (in the current design)
Conclusions
- Learners use web communication technologies
- Need tools to harvest this data
- Want to replace tutors? No!
  - Just provide preliminary feedback to learners
  - Support tutors in providing the final feedback
  - Enhance the participants' use of web conversations
- The feedback that is provided still needs to be improved
  - Corpora with good and bad conversations manually annotated by tutors
Test it!
- http://ltfll-lin.code.ro/ltfll/wp5/ (follow the link to test PolyCAFe)
- Screencasts for all LTfLL services: http://augur.wu.ac.at/screencasts/v1/
THANK YOU! Questions & Feedback
Special thanks to FP7 REGPOT ERRIC

Automatic assessment of collaborative chat conversations with PolyCAFe - EC-TEL2011
