Learning analytics to identify exploratory dialogue in online discussions


Social learning analytics are concerned with the process of knowledge construction as learners build knowledge together in their social and cultural environments. One of the most important tools employed during this process is language. In this presentation we take exploratory dialogue, a joint form of co-reasoning, to be an external indicator that learning is taking place. Using techniques developed within the field of computational linguistics, we build on previous work using cue phrases to identify exploratory dialogue within online discussion.

  • Who we are, where we are from, and what this presentation is about.
  • Why is learning dialogue important? From a sociocultural perspective, language and dialogue are crucial tools in the development of knowledge. The work of Neil Mercer and his colleagues has shown that effective dialogue can be taught, and can significantly improve results.
  • The three modes of dialogue in learning situations. Brief summary of disputational dialogue and cumulative dialogue. Exploratory dialogue – and phrases that might signal its presence.
  • Example of coding from the pilot study reported at LAK 2011. The analysis pointed to differences between conference participants, and picked out people who seemed likely to be actively engaged in knowledge building. The analysis also discriminated between conference sessions. This work gave us some key phrases, but it required a lot of manual cleansing of data and the use of a Find/Replace program based on cue phrases.
  • So we joined up with computational linguists – our co-authors, who cannot be here today: Yulan He, now at Aston University in the UK, and Zhongyu Wei, who joined us as an intern from the Chinese University of Hong Kong.
  • They identified three challenges from a computational linguistics perspective. First, computational linguists would normally make use of a much more extensive annotated dataset. Second, when carrying out text classification, they would normally be looking for subject-focused words and phrases; here, however, we are looking for a way of talking that can be used in many contexts. Third, we nevertheless need to take topical features into account. If students are supposed to be talking about learning analytics, we are not interested if they veer off to engage in exploratory dialogue about the microphones they are using to talk to each other, or about a novel they have just read.
  • Our limited annotated dataset meant it would be necessary to develop some sort of self-training system that could learn from the annotated dataset and begin to do its own annotations. A straightforward way of doing this would be to use labelled instances and to tell the classifier: here are examples that we have labelled as ‘exploratory’; go and find ones like them, and label those as exploratory too. The problem is that if the classifier makes mistakes, they begin to multiply, because the instances that are given pseudo labels by the classifier are added to the dataset and used to help classify other turns in the dialogue.
  • So, instead of focusing on labelled instances, we focused on labelled features. Each turn in the dialogue – every contribution to the Elluminate discussion – was broken down into unigrams, bigrams and trigrams. If enough of these were associated with exploratory dialogue, that turn in the dialogue would be given an ‘exploratory’ pseudo label (a pseudo label is a temporary label, assigned by the classifier). On the other hand, if most features were associated with non-exploratory dialogue, then it would be given a ‘non-exploratory’ pseudo label. This was a more detailed series of checks than labelling by instance. We wanted to go further, and check against context. We did that by taking ‘nearest neighbours’ into account.
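The feature-based pseudo-labelling described above can be sketched roughly as follows. This is a toy illustration, not the authors' code: the lexicon of feature weights is invented for the example, and a real system would learn word-association probabilities from data.

```python
# Toy sketch of feature-based pseudo-labelling (not the authors' code):
# break a dialogue turn into unigrams, bigrams and trigrams, then vote
# on a pseudo label using a lexicon of feature -> score mappings.

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(turn):
    """All unigrams, bigrams and trigrams of one dialogue turn."""
    tokens = turn.lower().split()
    return [g for n in (1, 2, 3) for g in ngrams(tokens, n)]

def pseudo_label(turn, feature_weights):
    """Label a turn 'exploratory' if its features, on balance, are
    associated with exploratory dialogue. feature_weights maps an
    n-gram to a score (positive = exploratory, negative = not)."""
    scores = [feature_weights.get(f, 0.0) for f in extract_features(turn)]
    total = sum(scores)
    label = "exploratory" if total > 0 else "non-exploratory"
    confidence = abs(total) / max(len(scores), 1)
    return label, confidence

# Invented mini-lexicon, for illustration only
weights = {("i", "agree"): 1.0, ("good", "point"): 1.0, ("hello",): -1.0}
print(pseudo_label("I agree that is a good point", weights))  # labelled exploratory
```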
  • Checking against nearest neighbours allowed us to take context into account. In this case, ‘nearest neighbour’ doesn’t mean nearest in time, it means the turns in the dialogue that were the most similar in terms of features – those sharing lots of common words. So the classifier compared the pseudo label of a turn in the dialogue with that of its three nearest neighbours, and it took into account how confident it had been about assigning those labels.
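‘Nearest’ here means nearest in feature space, not in time. One common way to measure that kind of similarity is cosine similarity over word counts; this is an assumption for illustration, not necessarily the exact measure the authors used.

```python
# Sketch of 'nearest neighbours' as turns sharing the most vocabulary.
# Cosine similarity over word counts is one standard choice (an
# illustrative assumption, not necessarily the authors' measure).
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between the word-count vectors of two turns."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[w] * wb[w] for w in wa)
    na = sqrt(sum(v * v for v in wa.values()))
    nb = sqrt(sum(v * v for v in wb.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbours(turn, others, k=3):
    """The k turns most similar to `turn` by shared vocabulary."""
    return sorted(others, key=lambda o: cosine(turn, o), reverse=True)[:k]

turns = ["I agree with that point", "hello everyone", "good point about widgets"]
print(nearest_neighbours("that is a good point", turns, k=1))
# -> ['good point about widgets']
```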
  • The classifier used this information to calculate a support value for the pseudo label – based on its decisions about k nearest neighbours. It did this using this formula – first checking whether labels were the same, then checking confidence values, and then averaging them out. Let’s look at an example with some straightforward numbers plugged into it.
  • So we are checking the label on the left. This turn in the dialogue has tentatively been labelled ‘non-exploratory’ on the basis of its features. The next stage is to check it against some of its nearest neighbours. The classifier checks it against k nearest neighbours, and in this case I’m taking k to be 3. The classifier uses information about those 3 nearest neighbours to calculate a support value (s) for that label. It then compares the support value with a cut-off level we have chosen. The cut-off value is R and, in this case, we have set R to be 0.5. So if the support value is greater than 0.5, the classifier can label this turn in the dialogue non-exploratory. It looks good to start with: two out of three nearest neighbours have the same label. Let’s work through in more detail. Nearest neighbour 1 has the same label as the turn we are checking, so we take its confidence level. Nearest neighbour 2 has a different label, so that’s a 0. Nearest neighbour 3 has the same label as the turn we are checking, so we take its confidence level. We add up those figures, and then divide by the number of nearest neighbours that we have taken into account – in this case, three. The result is 0.333. That is below our cut-off figure of 0.5, so the classifier cannot be confident that the pseudo-label is accurate.
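The support calculation in this worked example can be written as a small function. This is a sketch reconstructed from the numbers above, not the authors' code.

```python
# Support value for a pseudo label, reconstructed from the worked
# example: a neighbour contributes its confidence if its label matches,
# 0 otherwise, and the contributions are averaged over the k neighbours.

def support_value(label, neighbours):
    """neighbours: list of (label, confidence) pairs for the k nearest
    neighbours of the turn being checked."""
    if not neighbours:
        return 0.0
    total = sum(conf for (lab, conf) in neighbours if lab == label)
    return total / len(neighbours)

# The example from the slides: k = 3, cut-off R = 0.5
neighbours = [("non-exploratory", 0.333),
              ("exploratory", 0.999),       # different label -> contributes 0
              ("non-exploratory", 0.666)]
s = support_value("non-exploratory", neighbours)
print(round(s, 3))   # 0.333
print(s > 0.5)       # False: the pseudo label is not kept
```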
  • In order to improve our classifier further, we returned to the manually coded data from our pilot study. The 94 cue phrases identified there had proved to be precise – if they were present, that turn in the dialogue was almost certainly exploratory. However, they had low recall – they were not present in a lot of exploratory turns in the dialogue. We incorporated these phrases into our classifier to improve accuracy.
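Cue-phrase matching itself is simple to sketch. The phrase list here is a small sample drawn from the slides, not the full set of 94.

```python
# Minimal sketch of cue-phrase matching. The list below is a small
# sample of the phrases shown on the slides, not the full 94.

CUE_PHRASES = ["agree", "good point", "good example", "but if",
               "that's why", "have you read", "i'm not sure"]

def contains_cue_phrase(turn):
    """True if any cue phrase occurs in the turn (case-insensitive).
    Because the phrases are precise, a match strongly suggests the turn
    is exploratory; absence of a match proves nothing (low recall)."""
    text = turn.lower()
    return any(phrase in text for phrase in CUE_PHRASES)

print(contains_cue_phrase("Good point, but have you read the paper?"))  # True
print(contains_cue_phrase("Hello everyone!"))                           # False
```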
  • Having developed our method, we needed a dataset to test it on. We were able to use the annotated dataset from our pilot study. This was the Elluminate text chat from a two-day online conference. We extended the dataset by adding text chat from three MOOCs. Each contribution was taken as a turn in the dialogue – so a turn might be a few words, or just an emoticon, or it might be several sentences. The use of non-standard spelling, grammar and punctuation in chat like this makes it difficult to classify using methods developed for formally structured text.
  • We asked two postgraduate researchers to classify a portion of the unannotated data – picking out any turns that they considered to be exploratory. This is a cut-down version of the coding instructions we gave them. At this stage, we were hoping to be able to do a more fine-grained coding – breaking exploratory dialogue down into challenge, evaluation, extension and reasoning. Those sub-categories did not prove to be reliable. There were several reasons for this. It was partly because they overlap – after all, if you explain something, you also extend it. It was partly because the nature of synchronous chat means it is not always clear who is replying to whom. And it was partly because turns in the dialogue changed their state as the dialogue developed: what was originally a throwaway remark might be picked up by someone else and incorporated within their line of reasoning. So the sub-categories did not prove useful. But there was sufficient agreement on what was definitely exploratory and what was definitely non-exploratory for us to use these to extend our annotated dataset. To increase reliability and validity, we only built our annotated dataset from those turns in the dialogue on which both coders had agreed.
  • We then combined methods in order to classify the unannotated data. We experimented with different approaches, ranging from the simple cue-phrase labelling of our pilot to a combination of all the methods, as detailed above. In each run of the experiment, one of the four sections of the annotated dataset OUC2010 was used as a test set, and all or part of the remaining annotated dataset was used to train the classifier. The un-annotated dataset was used for self-training. In order to evaluate performance, all possible training/testing combinations were tested, and the results of these runs were averaged.
  • The full results are included in our paper. For the sake of clarity I am just reporting two approaches here – the original, pilot approach that used cue phrases, and our proposed method using features, cue phrases and nearest neighbours. The cue phrases identified in the pilot proved, as we expected, to be the most precise. However, we knew this was not enough in itself, because that method missed lots of exploratory dialogue and had very low recall. Overall, our proposed framework was the best, and it performed significantly better than the cue-phrase-only method used in our pilot.
  • We experimented with different values for k, deciding how many nearest neighbours it was best to take into account. Our results showed that it was best to take three nearest neighbours into account.
  • So, once we have a method of detecting exploratory dialogue in synchronous text chat, what can we do with it? One possibility is to use it to provide a way of navigating long sections of dialogue. For example, an online conference session could last for hours – this would provide a way of focusing on particularly interesting sections. It could also be used to highlight patterns of dialogue to learners or to educators. This slide shows a visualisation of a conference session that lasts about 150 minutes. Rather than represent every turn in the dialogue, we have given an average for each block of 10 contributions. You’ll see that people are very chatty at the end of the session – 20 turns in the dialogue in a couple of minutes – but they’re not exploratory. The same is true at the beginning, when they are all saying ‘Hello’. However, between 11.20am and noon, they engage in an extended period of exploratory dialogue that builds to a peak at about ten to twelve, and then dies away as the session comes to an end.
  • Another approach is to look in more detail at what individuals are doing. Each diamond here represents an individual, and the graph plots total number of turns against number of exploratory turns. In this case we have added a line that highlights those whose dialogue is most consistently exploratory. Individuals on that line are engaging in a lot of exploratory dialogue. This could be used to provide focused support to users. However, this would need to be done carefully. The individual who has contributed 15 times without being exploratory might need support in developing their learning dialogue – or they may be complaining persistently that the sound is down and they can’t hear anything, or they may have taken on the role of welcoming people to the conference.
  • This takes us on to some associated issues. The diagrams we have shown here take some explaining – they probably weren’t clear straight away to anyone in the room. So we have to think about ways of presenting information clearly. We have developed these analytics to support learners, but there are issues of presentation here. Our aim is not to make the person ringed in green feel smug, and the person ringed in red feel discouraged. So we need to develop use cases and consider how these analytics can best be used. And we see the potential for participatory design: involving educators and learners in considering the data and what should be presented could help them with their learning and teaching. So there is plenty of scope for working with new groups of people to develop this work.
  • In the meantime, a brief reflection on working in the middle space. We were linking several very different traditions here – right at the point where learning meets analytics. It has brought us together across disciplines, and it has brought Zhongyu / Joey halfway round the world to work with us. It has only been through presenting and explaining our ideas to each other that we have come to be clear about what we are doing, how and why. I wanted to share here a sense of what this presentation looked like only a few days ago – full of questions and responses. I’ve found it encouraging to see what we can achieve by being cross-disciplinary. It isn’t just about doing the research; it’s also about finding and developing a shared language for doing the research, and about understanding that things that seem basic and obvious in one discipline seem complex and difficult in another. We also need to be able to present to an audience from different disciplines.
  • In conclusion – this is a summary of what we have done.
  • The four of us worked together on this. If you would like to see a presentation on this work with more of an emphasis on computational linguistics, on the equations and the methods, I encourage you to watch the webinar that Zhongyu recorded as part of the SoLAR Storm doctoral series. Simon and I are both here and happy to talk about this work – if you have technical questions that we can’t answer, then Zhongyu and Yulan are the people to contact.

    1. An Evaluation of Learning Analytics To Identify Exploratory Dialogue in Online Discussions. Rebecca Ferguson, The Open University, UK; Zhongyu Wei, The Chinese University of Hong Kong; Yulan He, Aston University, UK; Simon Buckingham Shum, The Open University, UK
    2. Discourse analytics. The ways in which learners engage in dialogue indicate how they engage with the ideas of others, how they relate those ideas to their understanding, and how they explain their own point of view. • Disputational dialogue • Cumulative dialogue • Exploratory dialogue
    3. Exploratory dialogue. Categories and indicator phrases:
       Challenge: but if, have to respond, my view
       Critique: however, I’m not sure, maybe
       Discussion of resources: have you read, more links
       Evaluation: good example, good point
       Explanation: means that, our goals
       Explicit reasoning: next step, relates to, that’s why
       Justification: I mean, we learned, we observed
       Reflections of perspectives of others: agree, here is another, makes the point, take your point, your view
    4. Pilot study: LAK 2011. Example contributions:
       2:42 PM  I hate talking. :-P My question was whether "gadgets" were just basically widgets and we could embed them in various web sites, like Netvibes, Google Desktop, etc.
       2:42 PM  Thanks, thats great! I am sure I understood everything, but looks inspiring!
       2:43 PM  Yes why OU tools not generic tools?
       2:43 PM  Issues of interoperability
       2:43 PM  The "new" SocialLearn site looks a lot like a corkboard where you can add various widgets, similar to those existing web start pages.
       2:43 PM  What if we end up with as many apps/gadgets as we have social networks and then we need a recommender for the apps!
       2:43 PM  My question was on the definition of the crowd in the wisdom of crowds we acsess in the service model?
       2:43 PM  there are various different flavours of widget e.g. Google gadgets, W3C widgets etc. SocialLearn has gone for Google gadgets
    5. Computational linguistics: an interdisciplinary field that deals with statistical and rule-based modelling of natural language from a computational perspective. (Zhongyu Wei, Yulan He)
    6. Three challenges: 1) the annotated dataset is limited; 2) text classification problems are typically topic driven – this is not; 3) nevertheless, both dialogue features and topical features need to be taken into account.
    7. Self-training from labelled instances – a problem. Turns pseudo-labelled ‘exploratory’ are added to the training data whether the label is right (✓) or wrong (✗); including a wrongly labelled instance would degrade the classifier.
    8. Self-training from labelled features. For each turn in the dialogue, consider each unigram (word), bigram (2 words) and trigram (3 words): exploratory or non-exploratory? Take into account word-association probabilities averaged over many pseudo-labelled examples. Focusing on features gives a more reliable classification. To improve labelling further, take into account the classification of a number (k) of the nearest neighbours of that turn in the dialogue.
    9. Taking context into account. An unlabelled turn in the dialogue, p1, is given a pseudo label l1 (here ‘non-exploratory’) with a confidence value c1 (here 0.27272727). Let k = 3 (look at 3 nearest neighbours): each nearest neighbour pni1, pni2, pni3 has its own pseudo label (lni1, lni2, lni3) and confidence level (cni1, cni2, cni3).
    10. Checking against context. A pseudo-label based on features is considered correct if the support value (s), based on context, is high enough. The support value is calculated by taking into account the pseudo labels and confidence values of the k nearest neighbours.
    11. Checking the pseudo-labels. The turn being checked has the pseudo-label ‘non-exploratory’. Nearest neighbour 1: pseudo-label non-exploratory, confidence level 0.333. Nearest neighbour 2: pseudo-label exploratory, confidence level 0.999. Nearest neighbour 3: pseudo-label non-exploratory, confidence level 0.666. So s = (0.333 + 0 + 0.666) / 3 = 0.333. If the support value for this pseudo label is greater than R (let R = 0.5), this turn in the dialogue can be labelled ‘non-exploratory’. Because s < R, this turn should not be labelled ‘non-exploratory’.
    12. Cue phrases from pilot. 94 cue phrases: precise but low recall; used to improve accuracy when classifying the unannotated dataset. Examples: agree, also, although, alternative, any research, are we, because, but if, challenge, claim, debate, definitely, depends, difficult, discussion, do we have, do you, does that mean, does this suggest, draft, evidence, example, except, good example, good point, good thing about, have we, have you looked at, have you read, here is another, how are, how can, misunderstanding, [...], why, your view
    13. Dataset. Annotated: Elluminate text chat from a two-day conference; 2,636 dialogue turns; mean word tokens per turn 10.14. Unannotated: Elluminate text chat from three MOOCs; 10,568 dialogue turns; mean word tokens per turn 9.24. (The slide shows example turns from the conference chat.)
    14. Manual coding of data subset.
       Challenge: identifies something that may be wrong and in need of correction. Examples include calling into question, contradicting, proposing revision.
       Evaluation: has a descriptive quality. Examples include appraising, assessing, judging.
       Extension: builds on, or provides resources that support, discussion. Examples include applying an idea to a new area, increasing the range of an idea, providing related resources.
       Reasoning: the process of thinking an idea through. Examples include explaining, justifying your position, reaching a conclusion.
    15. Combining methods.
       • Train initial classifier on annotated dataset
       • Apply trained classifier to un-annotated data
       • Use self-learned features to find exploratory dialogue
       • Use cue-phrase matching to improve accuracy
       • Take context into account using k-nearest neighbours
       • Add selected instances to the training dataset
       • Repeat for five iterations or until less than 0.5% of labels are changed
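As a rough illustration of this control flow, here is a deliberately simplified, self-contained self-training loop. The toy word-count "classifier" stands in for the real one, and the cue-phrase and k-nearest-neighbour checks are omitted for brevity; none of this is the authors' implementation.

```python
# Simplified, self-contained sketch of the self-training loop:
# train, pseudo-label the unlabelled turns, grow the training set with
# the new labels, and repeat until almost nothing changes.
from collections import Counter

def train(labelled):
    """Toy 'classifier': word/label co-occurrence counts."""
    counts = {"exploratory": Counter(), "non-exploratory": Counter()}
    for text, label in labelled:
        counts[label].update(text.lower().split())
    return counts

def classify(model, text):
    """Score a turn by how many of its words favour each label."""
    words = text.lower().split()
    score = sum(model["exploratory"][w] - model["non-exploratory"][w]
                for w in words)
    label = "exploratory" if score > 0 else "non-exploratory"
    confidence = min(abs(score) / max(len(words), 1), 1.0)
    return label, confidence

def self_train(labelled, unlabelled, max_iters=5, min_change=0.005):
    """Iterate until max_iters or until < 0.5% of labels change."""
    labels = {}
    for _ in range(max_iters):
        model = train(labelled)
        new = {t: classify(model, t) for t in unlabelled}
        changed = sum(1 for t in new if labels.get(t) != new[t][0])
        labels = {t: lab for t, (lab, _) in new.items()}
        if changed / max(len(new), 1) < min_change:
            break  # fewer than 0.5% of labels changed
        labelled = labelled + list(labels.items())
    return labels

seed = [("good point i agree", "exploratory"),
        ("hello everyone", "non-exploratory")]
print(self_train(seed, ["that is a good point", "hello all"]))
```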
    16. Evaluation criteria. On a scale of 0 to 1:
       Accuracy – how many decisions were correct? Pilot 0.5389; SF+CP+KNN 0.7924
       Precision – how many ‘exploratory’ turns were actually exploratory? Pilot 0.9523; SF+CP+KNN 0.8083
       Recall – how many exploratory turns were classified as exploratory? Pilot 0.4241; SF+CP+KNN 0.8688
       F1 – weighted average of precision and recall. Pilot 0.5865; SF+CP+KNN 0.8331
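These four measures can be computed from binary predictions (1 = exploratory) as follows; the example data is invented for illustration.

```python
# Standard definitions of the four evaluation measures used above,
# computed from binary gold labels and predictions (1 = exploratory).

def evaluate(gold, predicted):
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    accuracy = (tp + tn) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Invented example: 6 turns, 3 truly exploratory
gold      = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1]
print(evaluate(gold, predicted))  # each of the four measures is 2/3 here
```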
    17. Varying the value of k.
       k = 1: Accuracy 0.7868, Precision 0.8007, Recall 0.8666, F1 0.8282
       k = 3: Accuracy 0.7924, Precision 0.8083, Recall 0.8688, F1 0.8331
       k = 5: Accuracy 0.7881, Precision 0.8005, Recall 0.8685, F1 0.8292
       k = 7: Accuracy 0.7586, Precision 0.7505, Recall 0.8640, F1 0.8001
       Looking at three nearest neighbours gives best results.
    18. Making use of the classifier. Each colour block represents 10 turns in the dialogue. Red blocks are primarily exploratory, blue blocks primarily non-exploratory.
    19. Making use of the classifier. Exploratory turns in the dialogue are plotted against total turns in the dialogue. The line here is set to highlight anyone who had more than 5/6 of their turns classified as exploratory. Analytics like these could be used to provide focused support to learners.
    20. Issues. Visual literacy: how can we share the maximum amount of information while making these analytics easy to use? Assessment for learning: how can we use these analytics to motivate and guide, rather than to discourage? Participatory design: how can we involve learners and teachers in learning discussions around these analytics?
    21. Working in the middle space
    22. Conclusion.
       • We proposed and tested a self-training framework
       • Found it out-performs alternative methods of detecting exploratory dialogue
       • Developed an annotated corpus for the development of automatic exploratory dialogue detection
       • Identified areas for future research
       • Identified ways of applying this work to support learners and educators
    23. SoLAR Storm webinar: bit.ly/YSEVHG. Yulan He, Senior Lecturer at the School of Engineering and Applied Science, Aston University, UK