Social learning analytics are concerned with the process of knowledge construction as learners build knowledge together in their social and cultural environments. One of the most important tools employed during this process is language. In this presentation we take exploratory dialogue, a joint form of co-reasoning, to be an external indicator that learning is taking place. Using techniques developed within the field of computational linguistics, we build on previous work using cue phrases to identify exploratory dialogue within online discussion.
Capitol Tech U Doctoral Presentation - April 2024.pptx
Learning analytics to identify exploratory dialogue in online discussions
1. An Evaluation of Learning Analytics To Identify Exploratory Dialogue in Online Discussions
Rebecca Ferguson, The Open University, UK
Zhongyu Wei, The Chinese University of Hong Kong
Yulan He, Aston University, UK
Simon Buckingham Shum, The Open University, UK
2. Discourse analytics
The ways in which learners engage in dialogue indicate how they engage with the ideas of others, how they relate those ideas to their understanding, and how they explain their own point of view.
• Disputational dialogue
• Cumulative dialogue
• Exploratory dialogue
3. Exploratory dialogue
Category                               Indicator
Challenge                              But if, have to respond, my view
Critique                               However, I'm not sure, maybe
Discussion of resources                Have you read, more links
Evaluation                             Good example, good point
Explanation                            Means that, our goals
Explicit reasoning                     Next step, relates to, that's why
Justification                          I mean, we learned, we observed
Reflections on perspectives of others  Agree, here is another, makes the point, take your point, your view
4. Pilot study: LAK 2011
Time Contribution
2:42 PM  I hate talking. :-P My question was whether "gadgets" were just basically widgets and we could embed them in various web sites, like Netvibes, Google Desktop, etc.
2:42 PM  Thanks, that's great! I am sure I understood everything, but looks inspiring!
2:43 PM  Yes why OU tools not generic tools?
2:43 PM  Issues of interoperability
2:43 PM  The "new" SocialLearn site looks a lot like a corkboard where you can add various widgets, similar to those existing web start pages.
2:43 PM  What if we end up with as many apps/gadgets as we have social networks and then we need a recommender for the apps!
2:43 PM  My question was on the definition of the crowd in the wisdom of crowds we acsess in the service model?
2:43 PM  there are various different flavours of widget e.g. Google gadgets, W3C widgets etc. SocialLearn has gone for Google gadgets
5. Computational linguistics
Interdisciplinary field that deals with statistical and rule-based modelling of natural language from a computational perspective
[Photos: Zhongyu Wei, Yulan He]
6. Three challenges
1. The annotated dataset is limited
2. Text classification problems are typically topic driven – this is not
3. Nevertheless, both dialogue features and topical features need to be taken into account
7. Self-training from labelled instances – a problem
[Diagram: a chain of instances given the pseudo-label 'exploratory'. Most are labelled correctly (✓), but one is mislabelled (✗) – including this instance would degrade the classifier]
8. Self-training from labelled features
• For each turn in the dialogue, consider each unigram
(word), bigram (2 words) and trigram (3 words)
• Exploratory or non-exploratory?
• Take into account word-association probabilities
averaged over many pseudo-labelled examples
[Diagram: a turn's bigrams lead to the pseudo-label 'non-exploratory' ✓]
Focusing on features gives a more reliable classification. To improve labelling, take into account the classification of a number (k) of the nearest neighbours of that turn in the dialogue.
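The feature extraction described above can be sketched as follows. This is a minimal illustration: the tokenisation (lowercased whitespace splitting) is an assumption, not the paper's documented preprocessing.

```python
def ngram_features(turn, max_n=3):
    """Break one dialogue turn into unigrams, bigrams and trigrams."""
    tokens = turn.lower().split()  # assumed tokenisation
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(tuple(tokens[i:i + n]))
    return feats

# A five-word turn yields 5 unigrams, 4 bigrams and 3 trigrams
feats = ngram_features("have you read the paper")
```

Each turn is then pseudo-labelled according to how strongly its features are associated with exploratory versus non-exploratory dialogue.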
9. Taking context into account
[Diagram, with k = 3 (look at 3 nearest neighbours): an unlabelled turn in the dialogue, p1, receives a pseudo-label l1 ('non-exploratory') with confidence value c1 = 0.272727271. Its nearest neighbours pni1, pni2 and pni3 each carry their own pseudo-labels lni1, lni2, lni3 and confidence levels cni1, cni2, cni3]
10. Checking against context
A pseudo-label based on features is considered correct if the support value (s) based on context is high enough.
The support value is calculated by taking into account the pseudo-labels and confidence values of the k nearest neighbours.
[Diagram: a turn pseudo-labelled 'non-exploratory', marked '?']
11. Checking the pseudo-labels
Worked example, with k = 3 and R = 0.5. The turn being checked carries the pseudo-label 'non-exploratory'.
• Nearest neighbour 1: pseudo-label 'non-exploratory', confidence level 0.333
• Nearest neighbour 2: pseudo-label 'exploratory', confidence level 0.999
• Nearest neighbour 3: pseudo-label 'non-exploratory', confidence level 0.666
s = (0.333 + 0 + 0.666) / 3 = 0.333
If the support value for this pseudo-label is greater than R, this turn in the dialogue can be labelled 'non-exploratory'. Because s < R, this turn in the dialogue should not be labelled 'non-exploratory'.
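The support-value check can be written as a short sketch (the function and variable names are illustrative; the rule itself – average the confidences of matching-label neighbours, count non-matching neighbours as 0 – follows the slides):

```python
def support_value(pseudo_label, neighbours):
    """Support s for a turn's pseudo-label, from its k nearest neighbours.

    Each neighbour is a (label, confidence) pair. A neighbour contributes
    its confidence if its pseudo-label matches, and 0 otherwise; the sum
    is averaged over k.
    """
    k = len(neighbours)
    total = sum(conf for label, conf in neighbours if label == pseudo_label)
    return total / k

# Worked example from the slide: k = 3, R = 0.5
neighbours = [("non-exploratory", 0.333),
              ("exploratory", 0.999),
              ("non-exploratory", 0.666)]
s = support_value("non-exploratory", neighbours)  # (0.333 + 0 + 0.666) / 3
R = 0.5
keep_label = s > R  # False: the pseudo-label is not sufficiently supported
```

With these numbers s = 0.333, below the cut-off R = 0.5, so the turn is not labelled 'non-exploratory'.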
12. Cue phrases from pilot
94 cue phrases, including: Agree, Also, Although, Alternative, Any research, Are we, Because, But if, Challenge, Claim, Debate, Definitely, Depends, Difficult, Discussion, Do we have, Do you, Does that mean, Does this suggest, Draft, Evidence, Example, Except, Good example, Good point, Good thing about, Have we, Have you looked at, Have you read, Here is another, How are, How can, Misunderstanding, [...], Why, Your view
• Precise but low recall
• Used to improve accuracy when classifying unannotated dataset
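A cue-phrase matcher of this kind is simple to sketch. The phrase list below is only a small sample of the 94 phrases, and the matching strategy (case-insensitive substring search) is an assumption for illustration:

```python
# Sample of the pilot study's cue phrases (not the full list of 94)
CUE_PHRASES = ["but if", "good example", "good point", "have you read",
               "here is another", "your view", "i'm not sure"]

def contains_cue_phrase(turn):
    """High precision: a match almost certainly signals exploratory dialogue.
    Low recall: many exploratory turns contain no cue phrase at all."""
    text = turn.lower()
    return any(phrase in text for phrase in CUE_PHRASES)

contains_cue_phrase("Have you read the LAK 2011 paper?")  # True
contains_cue_phrase("Hello everyone")                     # False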
13. Dataset
Annotated
• Elluminate text chat
• Two-day conference
• 2,636 dialogue turns
• Mean word tokens per turn: 10.14

Unannotated
• Elluminate text chat
• Three MOOCs
• 10,568 dialogue turns
• Mean word tokens per turn: 9.24

Example turns:
Time     Contribution
2:43 PM  Issues of interoperability
2:43 PM  The "new" SocialLearn site looks a lot like a corkboard where you can add various widgets, similar to those existing web start pages.
2:43 PM  What if we end up with as many apps/gadgets as we have social networks and then we need a recommender for the apps!
2:43 PM  My question was on the definition of the crowd in the wisdom of crowds we acsess in the service model?
14. Manual coding of data subset
Challenge: A challenge identifies something that may be wrong and in need of correction. Examples include calling into question, contradicting, proposing revision.
Evaluation: An evaluation has a descriptive quality. Examples include appraising, assessing, judging.
Extension: An extension builds on, or provides resources that support, discussion. Examples include applying an idea to a new area, increasing the range of an idea, providing related resources.
Reasoning: Reasoning is the process of thinking an idea through. Examples include explaining, justifying your position, reaching a conclusion.
15. Combining methods
• Train initial classifier on annotated dataset
• Apply trained classifier to un-annotated data
• Use self-learned features to find exploratory dialogue
• Use cue-phrase matching to improve accuracy
• Take context into account using k-nearest neighbours
• Add selected instances to the training dataset
• Repeat for five iterations, or until less than 0.5% of labels are changed
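The steps above can be sketched as a loop. The classifier, cue-phrase check and context check are passed in as plain functions here purely for illustration; none of these names come from the paper, and the stand-ins in the usage example are trivial stubs:

```python
def self_train(annotated, unannotated, predict, cue_match, context_ok,
               max_iters=5, stop_frac=0.005):
    """Grow the training set with pseudo-labelled turns over several iterations."""
    prev = {}
    for _ in range(max_iters):
        labels = {}
        for turn in unannotated:
            label = predict(turn, annotated)   # classify via self-learned features
            if cue_match(turn):                # cue phrases: highly precise signal
                label = "exploratory"
            if context_ok(turn, label):        # k-nearest-neighbour support check
                labels[turn] = label
        changed = sum(1 for t in labels if prev.get(t) != labels[t])
        annotated = annotated + list(labels.items())  # add selected instances
        if prev and changed <= stop_frac * max(len(labels), 1):
            break                              # fewer than 0.5% of labels changed
        prev = labels
    return annotated

# Trivial stand-in components, purely to show the call shape
result = self_train(
    annotated=[("seed turn", "exploratory")],
    unannotated=["Good point about widgets", "hi everyone"],
    predict=lambda turn, ann: "non-exploratory",
    cue_match=lambda turn: "good point" in turn.lower(),
    context_ok=lambda turn, label: True,
)
```

The stopping rule mirrors the slide: at most five iterations, ending early once the pseudo-labels have stabilised.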
16. Evaluation criteria
On a scale of 0 to 1…
Accuracy
How many decisions were correct?
Pilot 0.5389 SF+CP+KNN = 0.7924
Precision
How many ‘exploratory’ turns were actually exploratory?
Pilot 0.9523 SF+CP+KNN = 0.8083
Recall
How many exploratory turns were classified as exploratory?
Pilot 0.4241 SF+CP+KNN = 0.8688
F1
Harmonic mean of precision and recall
Pilot 0.5865 SF+CP+KNN = 0.8331
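All four measures follow from the counts of true/false positives and negatives. A minimal sketch (F1 computed as the harmonic mean of precision and recall; small differences from the reported figures can arise from averaging over experimental runs):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary decision counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)      # 'exploratory' calls that were right
    recall = tp / (tp + fn)         # exploratory turns that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# The pilot's F1 follows (approximately) from its precision and recall:
pilot_f1 = 2 * 0.9523 * 0.4241 / (0.9523 + 0.4241)  # ~0.587
```

This makes the pilot's profile easy to read: very high precision but low recall drags the harmonic mean down, which is why the cue-phrase-only approach scores a low F1 despite rarely mislabelling a turn as exploratory.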
17. Varying the value of k
k Accuracy Precision Recall F1
1 0.7868 0.8007 0.8666 0.8282
3 0.7924 0.8083 0.8688 0.8331
5 0.7881 0.8005 0.8685 0.8292
7 0.7586 0.7505 0.8640 0.8001
Looking at three nearest neighbours gives best results
18. Making use of the classifier
Each colour block represents 10 turns in the dialogue
Red blocks are primarily exploratory, blue blocks primarily non-exploratory
19. Making use of the classifier
[Scatter plot: exploratory turns in the dialogue (y-axis) against total turns in the dialogue (x-axis)]
The line here is set to highlight anyone who had more than 5/6 of their turns classified as exploratory. Analytics like these could be used to provide focused support to learners.
20. Issues
Visual literacy: How can we share the maximum amount of information while making these analytics easy to use?
Assessment for learning: How can we use these analytics to motivate and guide, rather than to discourage?
Participatory design: How can we involve learners and teachers in learning discussions around these analytics?
22. Conclusion
• We proposed and tested a self-training framework
• Found it out-performs alternative methods of detecting exploratory dialogue
• Developed an annotated corpus for the development of automatic exploratory dialogue detection
• Identified areas for future research
• Identified ways of applying this work to support learners and educators
23. SoLAR Storm webinar
bit.ly/YSEVHG
Yulan He
Senior Lecturer at the School of Engineering
and Applied Science, Aston University, UK
Editor's Notes
Who we are. Where we are from. What this presentation is about.
Why is learning dialogue important? From a sociocultural perspective, language and dialogue are crucial tools in the development of knowledge. The work of Neil Mercer and his colleagues has shown that effective dialogue can be taught, and can significantly improve results.
The three modes of dialogue in learning situations: a brief summary of disputational dialogue and cumulative dialogue, then exploratory dialogue – and phrases that might signal its presence.
Example of coding from the pilot study reported at LAK 2011. The analysis pointed to differences between conference participants, and picked out people who seemed likely to be actively engaged in knowledge building. The analysis also discriminated between conference sessions. This work gave us some key phrases, but it required a lot of manual cleansing of data, and use of a Find/Replace program based on cue phrases.
So we joined up with computational linguists – our co-authors, who cannot be here today: Yulan He, now at Aston University in the UK, and Zhongyu Wei, who joined us as an intern from the Chinese University of Hong Kong.
They identified three challenges from a computational linguistics perspective. First, computational linguists would normally make use of a much more extensive annotated dataset. Second, when carrying out text classification, they would normally be looking for subject-focused words and phrases; here, though, we are looking for a way of talking that can be used in many contexts. Third, we nevertheless need to take topical features into account. If students are supposed to be talking about learning analytics, we are not interested if they veer off to engage in exploratory dialogue about the microphones they are using to talk to each other, or about a novel they have just read.
Our limited annotated dataset meant it would be necessary to develop some sort of self-training system that could learn from the annotated dataset and begin to do its own annotations. A straightforward way of doing this would be to use labelled instances and to tell the classifier: here are examples that we have labelled as 'exploratory dialogue'; go and find ones like them, and label those as exploratory. The problem is that if the classifier makes mistakes, they begin to multiply, because the instances that are given pseudo-labels by the classifier are added to the dataset and used to help to classify other turns in the dialogue.
So, instead of focusing on labelled instances, we focused on labelled features. Each turn in the dialogue – every contribution to the Elluminate discussion – was broken down into unigrams, bigrams and trigrams. If enough of these were associated with exploratory dialogue, that turn in the dialogue would be given an 'exploratory' pseudo-label (a pseudo-label is a temporary label, assigned by the classifier). On the other hand, if most features were associated with non-exploratory dialogue, then it would be given a 'non-exploratory' pseudo-label. This was a more detailed series of checks than labelling by instance. We wanted to go further, and check against context. We did that by taking 'nearest neighbours' into account.
Checking against nearest neighbours allowed us to take context into account. In this case, 'nearest neighbour' doesn't mean nearest in time; it means the turns in the dialogue that were most similar in terms of features – those sharing lots of common words. So the classifier compared the pseudo-label of a turn in the dialogue with that of its three nearest neighbours, and it took into account how confident it had been about assigning those labels.
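The "most similar in terms of features" idea could be sketched with a simple overlap measure. Jaccard similarity over feature sets is an assumption for illustration; the paper's actual similarity measure may differ:

```python
def jaccard(features_a, features_b):
    """Overlap between two turns' feature sets (shared vs total features)."""
    a, b = set(features_a), set(features_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def k_nearest(turn_feats, others, k=3):
    """Indices of the k turns whose features overlap most with the given turn."""
    sims = [(jaccard(turn_feats, f), i) for i, f in enumerate(others)]
    return [i for _, i in sorted(sims, reverse=True)[:k]]

others = [["a", "b"], ["x", "y"], ["a", "b", "c"]]
nearest = k_nearest(["a", "b"], others, k=2)  # turns sharing the most features
```

Turns sharing many common words score close to 1; turns with no shared features score 0.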
The classifier used this information to calculate a support value for the pseudo label – based on its decisions about k nearest neighbours. It did this using this formula – first checking whether labels were the same, then checking confidence values, and then averaging them out. Let’s look at an example with some straightforward numbers plugged into it.
So we are checking the label on the left. This turn in the dialogue has tentatively been labelled 'non-exploratory' on the basis of its features. The next stage is to check it against some of its nearest neighbours. The classifier checks it against k nearest neighbours, and in this case I'm taking k to be 3. The classifier uses information about those 3 nearest neighbours to calculate a support value (s) for that label. It then compares the support value with a cut-off level we have chosen. The cut-off value is R and, in this case, we have set R to be 0.5. So if the support value is greater than 0.5, the classifier can label this turn in the dialogue non-exploratory. It looks good to start with: two out of three nearest neighbours have the same label. Let's work through in more detail. Nearest neighbour 1 has the same label as the turn we are checking, so we take its confidence level. Nearest neighbour 2 has a different label, so that's a 0. Nearest neighbour 3 has the same label as the turn we are checking, so we take its confidence level. We add up those figures, and then divide by the number of nearest neighbours that we have taken into account – in this case, three. The result is 0.333. That is below our cut-off figure of 0.5, so the classifier cannot be confident that the pseudo-label is accurate.
In order to improve our classifier further, we returned to the manually coded data from our pilot study. The 94 cue phrases identified there had proved to be precise – if they were present, that turn in the dialogue was almost certainly exploratory. However, they had low recall – they were not present in a lot of exploratory turns in the dialogue. We incorporated these phrases into our classifier to improve accuracy.
Having developed our method, we needed a dataset to test it on. We were able to use the annotated dataset from our pilot study. This was the Elluminate text chat from a two-day online conference. We extended the dataset by adding text chat from three MOOCs. Each contribution was taken as a turn in the dialogue – so a turn might be a few words, or just an emoticon, or it might be several sentences. The use of non-standard spelling, grammar and punctuation in chat like this makes it difficult to classify using methods developed for formally structured text.
We asked two postgraduate researchers to classify a portion of the unannotated data – picking out any turns that they considered to be exploratory. This is a cut-down version of the coding instructions we gave them. At this stage, we were hoping to be able to do a more fine-grained coding – breaking exploratory dialogue down into challenge, evaluation, extension and reasoning. Those sub-categories did not prove to be reliable. There were several reasons for this. It was partly because they overlap – after all, if you explain something, you also extend it. It was partly because the nature of synchronous chat means it is not always clear who is replying to whom. And it was partly because turns in the dialogue changed their state as the dialogue developed. What was originally a throwaway remark might be picked up by someone else and incorporated within their line of reasoning. So the sub-categories did not prove useful. But there was sufficient agreement on what was definitely exploratory and what was definitely non-exploratory for us to use these to extend our annotated dataset. To increase reliability and validity, we only built our annotated dataset from those turns in the dialogue on which both coders had agreed.
We then combined methods in order to classify the unannotated data. We experimented with different approaches – ranging from the simple cue-phrase labelling of our pilot, to a combination of all the methods, as detailed above. In each run of the experiment, one of the four sections of the annotated dataset OUC2010 was used as a test set. All or part of the remaining annotated dataset was used to train the classifier. The un-annotated dataset was used for self-training. In order to evaluate performance, all possible training/testing combinations were tested, and the results of these runs were averaged.
The full results are included in our paper. For the sake of clarity, I am just reporting two approaches here – the original, pilot approach that used cue phrases, and our proposed method using features, cue phrases and nearest neighbours. The cue phrases identified in the pilot, as we expected, proved to be the most precise. However, we knew this was not enough in itself, because that method missed lots of exploratory dialogue and had very low recall. Overall, our proposed framework was the best, and it performed significantly better than the cue-phrase-only method used in our pilot.
We experimented with different values for k, deciding how many nearest neighbours it was best to take into account. Our results showed that it was best to take three nearest neighbours into account.
So, once we have a method of detecting exploratory dialogue in synchronous text chat, what can we do with it? One possibility is to use it to provide a way of navigating long sections of dialogue. For example, an online conference session could last for hours – this would provide a way of focusing on particularly interesting sections. It could also be used to highlight patterns of dialogue to learners or to educators. This slide shows a visualisation of a conference session that lasts about 150 minutes. Rather than represent every turn in the dialogue, we have given an average for each block of 10 contributions. You'll see that people are very chatty at the end of the session – 20 turns in the dialogue in a couple of minutes – but they're not exploratory. Same at the beginning, when they are all saying 'Hello'. However, between 11.20am and noon, they engage in an extended period of exploratory dialogue that builds to a peak at about 10 to 12, and then dies away as the session comes to an end.
Another approach is to look in more detail at what individuals are doing. Each diamond here represents an individual, and the graph plots total number of turns against number of exploratory turns. In this case we have added a line that highlights those whose dialogue is most consistently exploratory. Individuals on that line are engaging in a lot of exploratory dialogue. This could be used to provide focused support to users. However, this would need to be done carefully. The individual who has contributed 15 times without being exploratory might need support in developing their learning dialogue – or they may be complaining persistently that the sound is down and they can't hear anything, or they may have taken on the role of welcoming people to the conference.
This takes us on to some associated issues. The diagrams we have shown here take some explaining – they probably weren't clear straight away to anyone in the room. So we have to think about ways of presenting information clearly. We have developed these analytics to support learners – but there are issues of presentation here. Our aim is not to make the person ringed in green feel smug, and the person ringed in red feel discouraged. So we need to develop use cases and consider how these analytics can best be used. And we see the potential for participatory design. Involving educators and learners in considering the data and what should be presented could help them with their learning and teaching. So there is plenty of scope for working with new groups of people to develop this work.
In the meantime, a brief reflection on working in the middle space. We were linking several very different traditions here – right at the point where learning meets analytics. It has brought us together across disciplines, and it has brought Zhongyu / Joey halfway round the world to work with us. It has only been through presenting and explaining our ideas to each other that we have come to be clear about what we are doing, how and why. I wanted to share here a sense of what this presentation looked like only a few days ago – full of questions and responses. I've found it encouraging to find what we can achieve by being cross-disciplinary, and I think it isn't just about doing the research; it's also about finding and developing a shared language for doing the research, and it's about understanding that things that seem basic and obvious in one discipline seem complex and difficult in another. We also need to be able to present to an audience from different disciplines.
In conclusion – this is a summary of what we have done.
The four of us worked together on this. If you would like to see a presentation on this work with more of an emphasis on computational linguistics, on the equations and the methods, I encourage you to watch the webinar that Zhongyu recorded as part of the SoLAR Storm doctoral series. Simon and I are both here and happy to talk about this work – if you have technical questions that we can't answer, then Zhongyu and Yulan are the people to contact.