From Data to Design of Dynamic Support

Carolyn Penstein Rosé
cprose@cs.cmu.edu
http://www.cs.cmu.edu/~cprose
http://dance.cs.cmu.edu
Language Technologies Institute
and Human-Computer Interaction Institute
Carnegie Mellon University
From data to design
of Dynamic Support for Collaborative Learning

What do we want to enable?
• Project and Problem based learning
• Collaborative reflection and Collaborative Problem Solving
• Community building and social support
– Reducing attrition in MOOCsGateways to enduring communities of practice
– Bridging learning and practice
2

Effective
Collaborative Learning
is rare
without support

Conversational Agent Based Support in
Computer Supported Collaborative Learning
Students learn 1.24 s.d. more when working with a partner and automated support than
students working alone (Kumar et al., 2007)

Technology Support for Collaborative
Learning
Automatic
Analysis
Of
Conversation
Conversational
Interventions
Positive
Learning
Outcomes

Outline
• Process Overview
• Theoretical Framing for Discourse Analysis
• Motivating design through corpus studies
• Zooming in on Text Analytic Process and
Technology Support
• Tools and Resources

Analysis Design
Facilitating
Effect
On
Teachers
Automated
Support
For
Students

Analysis Design
Automated
Support
For
Students

A Common Representation of
Discourse

Data Analytics Pipeline
• Browse data in DiscourseDB
• Import/Export data
• View, manipulate, create Annotations
• http://brat.nlplab.org
• Use annotations on DiscourseDB data to
train models.
• Use models to annotate DiscourseDB data
• http://ankara.lti.cs.cmu.edu/side
12

http://dance.cs.cmu.edu
Check it out!

Automated Essay Scoring as an Example
• Earliest approaches to automated essay scoring (Page,
1966)
• Work towards greater validity (Shermis & Burstein, 2003)
• Work towards triggering feedback from assessment
(Wade-Stein & Kintch, 2004)
• Competition across industry and academia (Shermis &
Hammer, 2012)
– Free off-the-shelf approach just as good as industry options
– LightSIDE (Mayfield & Rosé, 2013)
15

What did we learn?
• Simple features can make accurate predictions
– Word features, sentence length, rare word count, etc.
– They function as proxies for more meaningful features
• E.g., longer sentences may have more elaboration, so
sentence length might correlate with amount of elaboraton
• Models trained with simple features are
problematic
– They don’t transfer well between contexts
– They are not useful for instructions

Theory
InterpretationResearch Questions
PatternsData
Methodology

Theoretical Framework
Psychology
• Basic concepts of power and social
distance explain social processes
operating in interactions

Psychology
Sociolinguistics
• Social processes are reflected
through patterns of language
variation

Psychology
Sociolinguistics
Language
Technologies
variation
• If we understand this connection,
we can model language more
effectively

Language
Technologies
variation
• If we understand this connection,
we can model language more
effectively
• Models that embody these
structures will be able to predict
social processes from interaction
data
Psychology
Sociolinguistics

Authority
Authority
Engagement Engagement
Souflé Framework
(Howley et al., 2013)
22
Transactive
Knowledge Integration
Person Person

Souflé Framework
24
Person Person

Souflé Framework
25
Transactive
Person Person

Power, Relationships, and Transactivity
Cognitive
Factors
Social
Factor
s

Power, Relationships, and Transactivity
1963 1983 1986 1993
Piaget Berkowitz
& Gibbs
Kruger &
Tomasello
Azmitia
& Montgomery
Power,
Cognitive Conflict,
And Learning
Socio-
Cognitive
Conflict and
Transactivity
Power Balance
And
Transactivity
Friendship,
Transactivity,
And Learning

From the Social to the Cognitive
Findings from
Sociolinguistics
DBM
Model
Findings from
Developmental
Psychology
Model connecting
speech style
accommodation and
Transactivity
Jain et al., 2012 Gweon et al., 2012
Reflecting
Perspectives
And
Relationships
Reflecting
Evidence of
Consensus Building

Computational Modeling the Connection
between Cognitive and Social Factors
A speaker’s style depends both
on mutual accommodation and
the partner’s style in the
previous turn.
Accommodation states signal
that if accommodation is
happening, it is likely to persist
29
Measured accommodation through DBM predicts Prevalence of Transactivity.
Gweon, G., Jain, M., Mc Donough, J., Raj, B., Rosé, C. P. (2013). Measuring Prevalence of Other-Oriented Transactive Contributions Using an
Automated Measure of Speech Style Accommodation, International Journal of Computer Supported Collaborative Learning 8(2), pp 245-265.

R = 0.4
#OtherOriented
Transacts
Computational Modeling the Connection
between Cognitive and Social Factors

• Findings
– Moderating effect on learning (Joshi & Rosé, 2007; Russell, 2005; Kruger &
Tomasello, 1986; Teasley, 1995)
– Moderating effect on knowledge sharing in working groups (Gweon et al., 2011)
• Computational Work
– Can be automatically detected in:
• Threaded group discussions (Kappa .69) (Rosé et al., 2008)
• Transcribed classroom discussions (Kappa .69) (Ai et al., 2010)
• Speech from dyadic discussions (R = .37) (Gweon et al., 2012)
– Predictable from a measure of speech style accommodation computed by an
unsupervised Dynamic Bayesian Network (Jain et al., 2012)
Transactivity (Berkowitz & Gibbs, 1983)

Souflé Framework
32
Transactive
Person Person

• Findings
– Correlational analysis: Strong correlation between displayed
openness of group members and articulation of reasoning (R =
.72) (Dyke et al., in press)
– Intervention study: Causal effect on propensity to articulate ideas
in group chats (effect size .6 standard deviations) (Kumar et al.,
2011)
• Mediating effect of idea contribution on learning in scientific
inquiry (Wang et al., 2011)
Engagement (Martin & White, 2005)

Authority
Authority
Souflé Framework
34
Transactive
Person Person

Analysis of Authoritativess
Water pipe analogy:
Water = Knowledge or Action
Source = Authoritative speaker
Sink = Non-authoritative Speaker
Authoritativeness Ratio = Source Actions
Actions

The Negotiation Framework
(Martin & Rose, 2003)
Source
orSink?
Prim
ary
Secondary
Type
ofContent?
Know
ledge
Action
K2
requesting knowledge,
information, opinions, or facts
K1
giving knowledge, information,
opinions, or facts
A2
Instructing, suggesting, or
requesting non-verbal action
A1
Narrating or performing your
own non-verbal actionAdditionally…
ch (direct challenge to previous utterance)
o (all other moves, backchannels, etc.)
K1 + A1
K1 + K2 + A1 + A2
Authoritativeness:

Where did the Negotiation framework
come from?
Systemic Functional Linguistics: The study of how people talk to
each other (Martin, 2003)
Functional, rather than generative, approach to describing
language.
Negotiation framework has been used in sociolinguistic literature
since the 1980s
(e.g. Berry, 1981; Veel, 1999; Martin, 2008)

Source
orSink?
Prim
ary
Secondary
Type
ofContent?
Know
ledge
Action
K2
K1
opinions, or facts
A2
A1
own non-verbal action
Additionally…
ch (direct challenge to previous utterance)
o (all other moves, backchannels, etc.)

Source
orSink?
Prim
ary
Secondary
Type
ofContent?
Know
ledge
Action
K2
K1
opinions, or facts
A2
A1

• Findings
– Authoritativeness measures display how students respond to aggressive
behavior in groups (Howley et al., in press)
– Authoritativeness predicts learning (R = .64) and self-efficacy (R = .35) (Howley
et al., 2011)
– Authoritativeness predicts trust in doctor-patient interactions (R values
between .25 and .35) (Mayfield et al., under review)
• Computational Work
– Detectable in collaborative learning chat logs (R = .86)
– Detectable in transcribed dyadic discussions in a knowledge sharing task (R =
.95) (Mayfield & Rosé, 2011)
– Detectable in transcribed doctor-patient interactions (R = .96) (Mayfield et al.,
under review)
Authoritativeness (Martin & Rose, 2003)

Example: MathTalk
• Personalized Agent
condition vs
Control condition
• 30 6th
graders
– Randomly assigned to
pairs, conditions
• Procedure
46
Social Dialogue Agent Study (Kumar et al, 2007)
Day 1 Day 2
Lab session Lab session
Pretest Quiz Posttest Quiz,
Questionnaire

Main Results:
Advantage for Social Condition
• Significant increase in perception of amount of
help given and received
– Significant increase in amount of help given per problem
(Gweon et al., 2007)
– Students marginally more likely to complete a step on their own
after receiving help (Cui et al., 2009)
• Marginally higher learning gains
• But why?
48
[Kumar et al, 2007]

Understanding the Effect of Social Climate
on Positioning and Risk Taking
• Coded chat logs for instances of aggressive
behavior
– Pushy behavior
– Insults
• Coded for Negotiation (especially K1 and K2)
– Based on counts of K1 and K2, computed an authoritativeness
score for each student per lab day
• K1/[K1 + K2]
– Computed a Shift score per student
• Residual from linear regression predicting Day 2
authoritativeness from Day 1 authoritativeness
– Binary Shift variable (within pair, which student shifted up to a
more authoritative stance versus shifted down)

Aggressive Behavior
• Significantly more aggressive behavior
in Control condition
– F(1,56) = 8.93, p < .005 **, effect size .63σ
• Significantly more aggressive behavior
on Day 2
– F(1,56) = 15.61, p < .0005 **, effect size .87σ
– Significant interaction with Condition
• F(1,56) = 6.06, p < .05 **
• Only significant increase in aggressive
behavior on Day 2 in the Control
condition
• In each pair, identified student with
higher amount of aggressive behavior
on Day 2 as “the bully” for further
analysis

Authoritativeness and Shift
• Significant difference in
Authoritativeness of Bullies
and Non-Bullies in Control
condition on Day 2
– F(1,23) = 5.92, p < .05**
• Visible Shift only in Control
Condition
– F(1,23) = 5.28, p < .05**, effect size .15σ
– Bullies in Control condition shifted to
more authoritative stance
– Non-bullies in Control condition shifted
to less authoritative stance
51
AuthoritativenessShift

Learning
• No significant main effect of Aggressive
behavior on learning
– Bullied students in Control condition learned
significantly less than Social Condition students
– Recall that students respond differently to help in
Control condition
• Significant interaction between Shift and
Condition on Learning: F(1,20) = 7.91, p =
.01**
– Opposite trend in Social Condition
– Significant correlation between amount of shift and
learning only within Control condition
• Shifting down was associated with less
learning
BullyingShift

54
What factors lead to dropout along the way?

55
Measuring Cognitive Engagement
• Displays effort in interpreting,
reflecting on, and reasoning
about course material
• Used a publically available
Abstractness dictionary
(Turney et al., 2011)

56
Measuring Motivation
• 514 Posts labeled by Mechanical
Turk, on a 1-7 likert scale
(Extremely Unmotivated …
Extremely Motivated)
• Each example rated by 6 Mturkers
(Intraclass correlation = .74)
• Classifier binary classifier (median
split) trained using LibLinear with
L2 regularization (72.3% accurate)

57
Association Between Cognitive Engagement,
Motivation and Commitment

Confusion Detection
Goal - Build models to
automatically identify the level of
confusion expressed in students’
posts
1. Create the Human-Coded Dataset:
MTurk
2. Feature Space Design: click
behavior, domain-specific content
words, linguistic features
3. Train classifier over annotated data
4. Apply the classifier to all posts in a
course

Human-Coded Dataset: MTurk
• Amazon Mechanical Turk Labeling
– For each post ($0.06), judge the level of confusion
contained in the message in a 1-4 Likert scale
– Ranging from ‘No Confusion’, ‘Slightly Confused’,
‘Moderately Confused’ and ‘Seriously Confused’
• Intra-Class Correlation
– 0.745 (Algebra), 0.801 (Microeconomics)
• Agreement between MTurkers’ average ratings
and expert’s labeling
– 0.86 (Algebra), 0.80 (Micro)

Feature Space Design
• Click patterns reveal sequences of activities associated with confusion
– Patterns consist of taking quizzes (quiz), watching lectures (lecture),participating in
forums (forum), and viewing other course materials (course)
• Language Features from Forum posts
• Linguistic Indicators
– Pronouns : “I, we, you, she/he”;
– Sentiment : negative affect
– Grammar: negation, disfluencies, adverbs
• Question Indicators
– Question Markers ‘?’
– Whether sentences begin with confusion related expression
– Whether sentences start with a modal verb/question word

Mitigating Effect of Social Support

Expressed vs Exposed Confusion

66
▪ Chat consistently reduces attrition
Bazaar Collaborative Chat
Tested Successfully in Multiple MOOCs

● Reflection activities offered as
optional supplements at the
end of each unit
● If students click to enter the
activity, they may be required
to do it individually or
collaboratively
○ Out of 14,000 clicks to
enter the reflection
activities, 25% included an
additional student
Survival Analysis from Medicinal Chemistry MOOC

Technology Support for Collaborative
Learning
Automatic
Analysis
Of
Conversation

LightSIDE Tool Bench
http://ankara.lti.cs.cmu.edu/side/download.html

Machine Learning is Like the Fiber Arts

77
Machine Learning is not just Algorithms!
Data Collection Anonymization Sampling Cleaning
ReformattingFeature Space
Representation
Feature SelectionModeling

Data Representation is the Key

Nguyen, D., Dogruöz, A. S., Rosé, C. P., de Jong, F.
(2016). Computational Sociolinguistics: An Emerging
Area for Language Technologies, Computational
Linguistics, Vol. 42, No. 3: 537–593.
80

Resources
DANCE Discussion Forum is
compatible with Open edX
Includes hooks for
interventions like Social
Recommendation and
Discussion Scaffolding

Resources
LightSIDE
Text mining tool bench
Over 10,000 users have downloaded LightSIDE
Automated collaborative process analysis
Automated writing assessment/feedback
generation
Social Recommendation
deployed so far in one MOOC to support help
exchange

Conclusion
• Join us: Open Source Resources
– Let me know if you would like to collaborate

Leveraging Entailment as
a Pretraining Task
Entailment
Transactivit
y

Entailment: Wikipedia Definition
• In semantics, entailments depend on the
"dictionary definition" of the words in question.
• To judge whether an entailment is true, one
can ask, "Could it ever be the case that B isn't
true while A is true?"

Entailment: Example
• Example from M. Lynne
Murphy's Lexical Meaning
• "If it is a shoe, then it is
made to be worn on a foot."

Entailment Dataset
• Stanford Natural Language Inference Corpus, Bowman et al. 2015.
• Collection of 570,000 English sentence pairs labeled for balanced
classification of entailment, contradiction, and neutral.
• Examples were generated by humans in response to sentences
describing pictures from Flickr
• Example:
– Sentence1: “A soccer game with multiple males playing.”
– Sentence2: “Some men are playing a sport.”

Deep Learning for Transactivity Detection (Parikh
et al. 2016)

Step 1: Attend
• For each pair of words in the two posts, determine some
attention score via 2 layer dense feed-forward neural
network.
• For each word in each post, average all the attention scores
with relation to the other post.
• What you get:
– Information that indicates how important each word in a given
post is with respect to the other post.

Step 2: Compare
• Using the representation from the attention step along
with the corresponding vectorized input post, run
though a 2 layer dense feedforward neural network.
• What you get:
– Two sets of vectors for that contain information comparing
the posts with respect to each other.

Step 3: Aggregate
• Sum each set of comparison vectors into two
one dimensional vectors.
• Each of these vectors is a representation of a
given post in relation to the other.

Step 4: Classify
• With the resulting vectors from the aggregation
step, we concatenate them and run them
through another 2 layer dense feedforward
neural network with cross-entropy loss to
classify the data.

Experiment 1
1. Train Entailment task first
2. Use trained weights as initialization for
Transactivity task
3. Train on Transactivity task

Transactivity Dataset
• Discussion data from online forum where students offered feedback to one another
on their proposals for city power plans
• 476 human annotated posts.
• Example:
– Sentence 1:
“But if the energy is saving them some money it could go towards the batteries. Whats
frustrating is that it doesn't really give us information regarding the costs of generating
electricity currently.”
– Sentence 2:
“But those batteries add even more cost, and for a city concerned with cost, that would be a
problem. Plus, without the batteries, it's not very reliable, and that's also a problem for a
touristry driven economy.”

Results, Part 1
Model Accuracy Cohen’s Kappa
Logistic Regression with unigrams 0.795 0.510
Logistic Regression with
embeddings
0.626 0.182
Our model 0.848 0.542

Experiment 2
• Transactivity prediction with in domain data vs. out of
domain data
• Train the model as in experiment 1, however on each cross
validation fold, evaluate the model on out of domain
annotated transactivity data.
• Note that there is no point in which the model is trained on
the out of domain data.

Out of Domain Transactivity Dataset
• 57 human annotated transacts from an
Massive Open Online Course (MOOC) in which
students were asked to design their own
superheroes and provide feedback on other
students’ designs.

Results, Part 2
Model Accuracy (in |
out)
Cohen’s Kappa (in
| out)
unigrams
0.795 | 0.667 0.510 | 0.376
embeddings
0.626 | 0.635 0.182 | 0.195
Our model 0.848 | 0.824 0.542 | 0.586

Machine Learning for Negotiation
Results given are from 20-fold
leave-one-conversation-out cross validation
All improvements between models are
significant (p < .01)
Tools used:
•LightSIDE (Mayfield and Rosé, 2010) for feature extraction
•SVMlight
(Joachims, 1999) for machine learning
•Learning-Based Java (Rizzolo and Roth, 2010) for ILP inference

SFL Researchers have found that,
in general… (Martin and Rose, 2003)
1. You don’t request information or action after
it’s been given.
2. Knowledge and action don’t mix.
3. You don’t respond to the same request
twice.
4. You don’t respond to your own requests.

Integer Linear Programming allows
us to require our classifier to fit to
these patterns (quickly)!
When a prediction would break a
rule, force it to start a new sequence
or back off to a less likely label.

Baseline with ILP Constraints Added
0.62
0.49
0.66

Improvements to our feature space
• Lexical and part-of-speech bigrams
• Cosine similarity to previous line
• Predicted label of previous line
• Separate segmentation models for short
(<4 words) and long lines
• Segmentation features for speaker shift

Added Features with ILP Constraints
0.58
0.68
0.95

From Data to Design of Dynamic Support

Recommended

Recommended

More Related Content

What's hot

What's hot (6)

Viewers also liked

Viewers also liked (18)

Similar to From Data to Design of Dynamic Support

Similar to From Data to Design of Dynamic Support (20)

Recently uploaded

Recently uploaded (20)

From Data to Design of Dynamic Support