Enhancing Psychotherapy Treatment by Analyzing Alliance Ruptures through Gaze Detection: A Statistical Head Pose Analysis Approach
1. Enhancing Psychotherapy Treatment by Analyzing Alliance Ruptures through Gaze Detection: A Statistical Head Pose Analysis Approach
By Muhammad Zbeedat
Master’s student – University of Haifa
Supervised by: Prof. Ilan Shimshoni and Prof. Hagit Hel-Or
2. Meet our team
Ruptures Detection
Muhammad Zbeedat – Master’s student and member of the Computational Human Behavior Lab.
Ilan Shimshoni – Professor in the department of Information Systems; formerly served as chair of the department.
Hagit Hel-Or – Faculty member in the department of Computer Science and head of the Computational Human Behavior Lab.
Sigal Zilcha-Mano – Licensed clinical psychologist and full professor of clinical psychology in the department of psychology; manager of the psychotherapy lab.
Tohar Dolev-Amit – Doctoral graduate in clinical psychology; researcher and lab manager in Prof. Sigal Zilcha-Mano’s psychotherapy lab.
Tal BenDavid-Sela – Doctoral candidate in clinical psychology; researcher in Prof. Sigal Zilcha-Mano’s psychotherapy lab.
3. Introduction
• In psychotherapy, an alliance rupture is a deterioration in the alliance, manifested by a lack of collaboration between patient and therapist on tasks or goals, or a strain in the emotional bond.
• Ruptures have been identified in 91%–100% of sessions. Ruptures have the potential to either undermine the treatment or enhance it.
4. Introduction – Contd.
• Ruptures may be categorized into two main subtypes:
• In Withdrawal ruptures, patients either move away from the therapist and the treatment, or move toward the therapist in a way that denies the patient’s own experience.
• In Confrontational ruptures, patients move against the therapist or the work of therapy. Confrontational ruptures may include complaints about the therapist or the treatment.
5. Research Question
How can ruptures be detected in a recorded psychotherapy session using Human Action Recognition techniques?
Identifying ruptures is a critical stage in reaching resolution; a resolution process enables the patient and therapist to renew or strengthen their emotional bond, and to begin or resume collaborating on the tasks and goals of therapy.
6. Introduction – Contd.
• The primary objective of this study is to monitor and identify such ruptures throughout recorded therapy sessions using three different cameras.
• To achieve this goal:
• Human Action Recognition techniques were employed, with a specific emphasis on Gaze Detection through statistical head pose analysis.
• Additionally, we utilized some Facial Action Units features. Facial Action Units (AUs) are a way to describe human facial expressions.
7. Rupture Resolution Process
Detection – detect Withdrawal or Confrontational ruptures.
Analysis – reevaluate the therapy session, focusing on the rupture sections (segments).
Resolution – a resolution process enables the patient and therapist to renew or strengthen their emotional bond.
Strategy – activate resolution strategies, such as changing the task or disclosing the therapist’s internal experience of the rupture.
Success – the goal is a successful treatment at the end, minimizing failures.
8. The Rupture Resolution Rating System (3RS)
The Rupture Resolution Rating System (3RS) is the gold-standard observer manual for detecting ruptures. It was applied to our recorded sessions.
• An observational system for coding rupture markers and resolution.
• The coders received six months of training with an experienced coder.
• Each session was coded by a pair of coders, drawn from a pool of 8 undergraduate students in psychology.
• To examine rupture occurrence, ruptures were coded in 5-minute segments.
• Identified ruptures were coded as a Confrontation (CF) or Withdrawal (WD).
• The coded 5-minute segments were then aggregated into one overall rupture score per patient per session.
9. 3RS - Rupture markers
Withdrawal rupture markers:
Denial – denial
MinResponse – minimal response
AbstrComm – abstract communication
AvStoryShiftT – avoidant storytelling or shifting topic
Deferential – deferential or appeasing
ContAffectSplit – content-affect split
Selfcrithopeless – self-criticism or hopelessness
Confrontational rupture markers:
ComplTherapist – complaints about the therapist
Rejectform – rejecting the formulation
ComplActivity – complaints about activities
ComplParameter – complaints about parameters
ComplProgress – complaints about progress
Ptdefendsself – patient defends self
Controlpressure – control or pressure
Coders analyzed each segment, looked for these markers, and gave each marker a value between 0 and 3 (0 – no sign, 1 – low intensity, 2 – high intensity, 3 – very high intensity).
10. Ground Truth Manual Coding Labels
• WD/CF – the overall mean of the withdrawal/confrontational rupture markers.
• WD2/CF2 – the count of withdrawal/confrontational high-intensity rupture markers (2 and above).
• WD_binaryhigh/CF_binaryhigh – coded as:
• 0 – no rupture or low intensity (WD2/CF2 = 0; all markers scored 0 or 1)
• 1 – high or very high rupture intensity (WD2/CF2 > 0; at least one marker scored 2 or 3)
• WD_binarylow/CF_binarylow – coded as:
• 0 – no rupture (WD/CF ~ 0)
• 1 – low, high, or very high rupture intensity (WD/CF > 0; at least one marker scored 1 or above)
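The label derivation above can be sketched in a few lines. The function and variable names below are illustrative (not from the study code); the input is the list of 0–3 intensity scores a coder assigned to the markers of one 5-minute segment.

```python
# Sketch of the ground-truth label derivation for one segment.
# `markers` holds the 0-3 intensity scores for the withdrawal (or
# confrontational) markers of that segment.

def derive_labels(markers):
    wd = sum(markers) / len(markers)          # WD/CF: overall mean intensity
    wd2 = sum(1 for m in markers if m >= 2)   # WD2/CF2: count of high-intensity markers
    binary_high = 1 if wd2 > 0 else 0         # at least one marker scored 2 or 3
    binary_low = 1 if wd > 0 else 0           # at least one marker scored 1 or above
    return wd, wd2, binary_high, binary_low

# Example: seven withdrawal markers coded for one segment
print(derive_labels([0, 1, 0, 2, 0, 0, 3]))
```

The same derivation applies symmetrically to the confrontational markers.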
11. Challenges
Sensitive data – the analysis of recorded sessions was conducted within the confines of the psychotherapy labs of the University of Haifa.
Unbalanced data – most of the segments didn’t include ruptures. Handled by creating the new binarylow labels and by using SMOTE for oversampling.
Experiment setup – how accurate are the features for Action Units, Gaze, and Head Pose, given the camera locations in the clinics?
Lack of dominant features – we extracted features only from images/frames; other features, such as voice from the video or the text, could help reach better performance.
Noisy data – extracted features are noisy by nature. Handled by averaging features across smaller units in each segment, so that they are not flattened.
12. Method - Experiment setup
Three cameras were utilized to record sessions: one focused on the therapist, another on the patient, and a third captured both individuals.
Cameras were positioned at a distance from the faces of the patient and therapist, and they were not directly facing them. This setup posed some challenges.
Note: the therapist and patient in the scene are psychotherapy lab actors.
13. Method - Participants
96 patients between the ages of 18–60, with major depressive disorder, from the pilot and main trial phases of a Randomized Controlled Trial (RCT), participated in this study.
The whole therapy series for each patient (about 16 sessions) was videotaped, but only three sessions (2, 4, and 8) were coded manually for ruptures using the 3RS.
14. Method - Features extraction
For the extraction of features from the recorded therapy sessions, the study employed a Computer Vision open-source tool named OpenFace (but any other tool can be used). This tool offers an array of capabilities:
• Head position assessment
• Facial Action Units detection
• Eye tracking and facial landmark detection, among others.
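As a minimal sketch of consuming OpenFace output: the FeatureExtraction tool writes one CSV row per frame, with head rotation in radians (`pose_Rx`, `pose_Ry`) and AU intensities (`AUxx_r`) among its documented columns. The tiny inline CSV below is fabricated example data, not output from our sessions.

```python
import csv
import io
import math

# Read an OpenFace-style per-frame CSV and convert head rotations to
# yaw/pitch in degrees. Some OpenFace versions pad headers with spaces,
# hence the strip() on column names.
csv_text = """frame, pose_Rx, pose_Ry, AU06_r
1, 0.10, -0.35, 1.2
2, 0.12, -0.30, 0.8
"""

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    row = {k.strip(): float(v) for k, v in row.items()}
    row["yaw_deg"] = math.degrees(row["pose_Ry"])    # rotation about Y: left/right
    row["pitch_deg"] = math.degrees(row["pose_Rx"])  # rotation about X: up/down
    rows.append(row)

print([round(r["yaw_deg"], 1) for r in rows])
```

Per-frame yaw/pitch values like these feed the statistical head-pose analysis described in the following slides.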
15. Method - Calibrated Mode
3D coordinates of the eyes and face were used for computing the gaze vector originating from the patient’s eyes toward the therapist’s face. The objective was to identify a direct gaze if the vector alignment was sufficiently close. A lack of such alignment could indicate a potential rupture.
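The alignment check above can be sketched as an angle test between the measured gaze vector and the eye-to-face direction. The 3D points, the 15-degree threshold, and the helper names are illustrative assumptions, not the study’s actual parameters.

```python
import math

# A gaze counts as "direct" when the patient's gaze vector points
# sufficiently close to the direction from the patient's eyes to the
# therapist's face (angle below a chosen threshold).

def angle_between(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (nu * nv)))

def is_direct_gaze(eye_pos, gaze_vec, therapist_face, max_angle_deg=15.0):
    to_face = [t - e for t, e in zip(therapist_face, eye_pos)]
    return angle_between(gaze_vec, to_face) <= max_angle_deg

# Patient at the origin looking along +Z; therapist slightly off-axis
print(is_direct_gaze([0, 0, 0], [0, 0, 1], [0.1, 0.0, 1.0]))  # nearly aligned
print(is_direct_gaze([0, 0, 0], [0, 0, 1], [1.0, 0.0, 0.5]))  # looking away
```

A sustained lack of such alignment over a stretch of frames is what would flag a potential rupture.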
Uncalibrated Mode
Statistical analysis of head pose was adopted to determine direct gaze, replacing the previous geometric calculations.
16. Method
Facial Action Units features
Facial Action Units (AUs) are a way to describe human facial expressions. OpenFace can detect the intensity (on a scale from 0 to 5) of 17 AUs.
Head pose features
By detecting head pose we can detect whether the patient is looking straight at the therapist or not. In our approach, Yaw and Pitch values were used for statistical analysis of the head rotation.
Facial Action Coding System (FACS) - Guide
17. Method - How to set the Direct Gaze range based on Head Pose?
• Yaw/Pitch values were used to represent the head pose of the patient and to set a range of straight gaze towards the therapist, based on the Yaw/Pitch range that corresponds to the highest probabilities. (KDE – Kernel Density Estimation – was used to smooth the probability density estimation.)
• A max-probability cut threshold was used in determining the ranges of Yaw and Pitch. The chosen cut threshold is 20%.
• This range, however, exhibited variations across patients and sessions due to the diverse profiles of individuals.
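The range-setting step can be sketched as follows: smooth one session’s yaw (or pitch) values with a KDE, then keep the interval where the density is at least 20% of its peak. The synthetic yaw sample below stands in for a session’s per-frame values; it is not real data.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Estimate the direct-gaze yaw range for one session using KDE and a
# 20% max-probability cut threshold, as described above.
rng = np.random.default_rng(0)
yaw = rng.normal(loc=-20.0, scale=8.0, size=2000)  # fabricated session yaw values

kde = gaussian_kde(yaw)
grid = np.linspace(yaw.min(), yaw.max(), 500)
density = kde(grid)

cut = 0.20 * density.max()            # 20% of the peak density
in_range = grid[density >= cut]
yaw_lo, yaw_hi = in_range.min(), in_range.max()
print(f"direct-gaze yaw range: [{yaw_lo:.1f}, {yaw_hi:.1f}] degrees")
```

Frames whose yaw and pitch both fall inside their respective ranges are then counted as straight gaze; the range is recomputed per session, matching the observed per-patient variation.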
18. Method - Novel approach
• Sub-division of segments into smaller units (subsegments), spanning 30–60 seconds.
• This approach facilitated the identification of precise rupture instances within segments, while also preventing the feature flattening that could occur when calculating and averaging derived feature values over an entire segment.
19. Method - Subsegment size 60 sec
For example, when analyzing 60-second subsegments we identified a problematic one: that subsegment was marked with a low direct gaze. The red points are the frames inside that session’s yaw/pitch ranges within that subsegment.
20. Method - Subsegment size 30 sec
But when working with 30-second subsegments and breaking it into two, we identified that the first one was abnormal (only 3%), but the second had a direct gaze (81%).
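The per-subsegment percentage used above can be sketched directly: given per-frame yaw/pitch values and the session’s direct-gaze ranges, count the share of frames falling inside both ranges. The five-frame subsegment below is fabricated; the ranges reuse the example session ranges mentioned later in the notes (yaw -39 to -1.5, pitch 3 to 32).

```python
# Fraction of a subsegment's frames whose yaw AND pitch fall inside the
# session's direct-gaze ranges.

def straight_gaze_pct(yaw_frames, pitch_frames, yaw_range, pitch_range):
    inside = sum(
        1
        for y, p in zip(yaw_frames, pitch_frames)
        if yaw_range[0] <= y <= yaw_range[1] and pitch_range[0] <= p <= pitch_range[1]
    )
    return 100.0 * inside / len(yaw_frames)

yaw = [-20, -18, -50, -22, -60]   # fabricated 5-frame subsegment
pitch = [10, 12, 40, 11, -5]
print(straight_gaze_pct(yaw, pitch, yaw_range=(-39, -1.5), pitch_range=(3, 32)))
```

Subsegments whose percentage falls below a threshold (e.g., 60%) are the candidates flagged for rupture inspection.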
21. Method - Algorithm
(Diagram: the session is split into segments S1–S10, and each segment into units U1–U5.)
1. Over the course of the whole session, calculate the mean/std/sem of the Yaw/Pitch values.
2. Split each segment into smaller subsegments of 30 or 60 seconds and for each subsegment calculate:
• Mean/Std/Sem of the AU and Yaw/Pitch values.
• Distances between the yaw/pitch means of that subsegment and the whole session.
• Ratios between the yaw/pitch std/sem of that subsegment and the entire session.
• Distribution similarity (Z-test) between that subsegment and the entire session.
• Straight-gaze percentage of the patient looking towards the therapist.
3. For each segment, go through all its subsegments and compute the following features:
• Action Units
- maximum & minimum of the subsegment mean values inside the segment.
- mean of the STDs/SEMs of all subsegments inside the segment.
• Yaw/Pitch
- max & min of the Yaw/Pitch mean distances of all subsegments.
- mean of the Yaw/Pitch std/sem ratios of all subsegments.
- max & min of the Yaw/Pitch Z-tests of all subsegments.
• Straight-gaze percentage
- number of straight-gaze subsegments above a certain percentage (60%, 70%, 80%).
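The step-3 aggregation can be sketched for one Action Unit: reduce each subsegment to its mean and std, then take the max/min of the subsegment means and the mean of the subsegment stds as segment-level features. Data and names are illustrative.

```python
import statistics

# Aggregate one AU's per-frame intensities from subsegments to one segment.
# Taking the max/min of subsegment means preserves a short, intense burst
# that a whole-segment average would flatten.

def segment_au_features(subsegments):
    means = [statistics.mean(s) for s in subsegments]
    stds = [statistics.pstdev(s) for s in subsegments]
    return {
        "au_mean_max": max(means),
        "au_mean_min": min(means),
        "au_std_mean": statistics.mean(stds),
    }

# Three 30-second subsegments of AU06 intensities (fabricated)
subs = [[0.1, 0.1, 0.2], [2.4, 2.6, 2.0], [0.3, 0.2, 0.1]]
print(segment_au_features(subs))
```

Here the middle subsegment’s burst survives as `au_mean_max`, whereas averaging all nine frames together would dilute it.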
22. Method - Features Analysis
In segment 3, a substantial decrease in the number of units with straight gaze was observed for all three threshold levels. Notably, this same segment was coded with a confrontational rupture of 57% intensity and a 30% withdrawal rupture intensity.
In segment 3, there was a significant increase in the Yaw Z-test measures (min/max), which indicates a large difference between this segment’s behavior and the entire session, related to the patient moving his head left/right.
In segment 3, there was also a significant increase in the Yaw means-distance measures (min/max), which likewise indicates a large difference between this segment’s behavior and the entire session.
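One common form of the Z-test mentioned above compares a subsegment’s mean yaw to the whole session’s: z = (mean_sub − mean_sess) / (sess_std / sqrt(n)). Whether this matches the study’s exact formula is an assumption; the data below is fabricated.

```python
import math
import statistics

# Z-score of a subsegment's mean yaw against the whole session's
# distribution; large |z| flags a subsegment whose head pose departs
# strongly from the session baseline (e.g., looking aside).

def yaw_z_score(sub_yaw, session_yaw):
    mean_sub = statistics.mean(sub_yaw)
    mean_sess = statistics.mean(session_yaw)
    sess_std = statistics.pstdev(session_yaw)
    return (mean_sub - mean_sess) / (sess_std / math.sqrt(len(sub_yaw)))

session = [-20 + (i % 7 - 3) for i in range(700)]  # yaw hovering around -20
looking_aside = [-55, -52, -58, -54, -56]          # subsegment turned away
print(round(yaw_z_score(looking_aside, session), 1))
```

A strongly negative z here mirrors the segment-3 behavior: the patient’s head turned far from the session’s typical yaw.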
23. Machine Learning model
• Classification machine learning models were trained and tested using the WD_binaryhigh/CF_binaryhigh and WD_binarylow/CF_binarylow ground truth labels.
• To ensure the reliability of our results, we took great care to ensure that sessions involving the same patient were exclusively included in either the training or the testing phase.
• Grid Search with Cross Validation was implemented to determine the optimal hyperparameters for each RandomForestClassifier associated with every ground truth label.
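The setup above can be sketched with scikit-learn: GroupKFold keyed on patient id keeps every patient’s sessions inside a single fold, and GridSearchCV tunes the forest. SMOTE oversampling (from imbalanced-learn) would be fit on the training folds only; it is omitted here to keep the sketch dependency-free. The data and the parameter grid are synthetic stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, GroupKFold

# Synthetic per-segment features and a stand-in binary rupture label
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 6))
y = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)
patients = np.repeat(np.arange(30), 4)  # 30 patients, 4 segments each

# GroupKFold guarantees no patient appears in both train and test folds
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=GroupKFold(n_splits=5),
    scoring="accuracy",
)
search.fit(X, y, groups=patients)
print(search.best_params_, round(search.best_score_, 2))
```

With SMOTE added, the oversampler belongs inside an imblearn `Pipeline` so that synthetic minority samples are generated only from each training fold, never from held-out patients.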
24. Withdrawal ML model - Features Importance
AU06 AU07 AU12
25. Confrontation ML model - Features Importance
AU01 AU04 AU14
Facial Action Coding System (FACS) - Guide
26. Results - WD_binaryhigh
The RandomForestClassifier for WD_binaryhigh with SMOTE got a lower Test score, but it is more balanced (Test True Positive/Negative).
28. Results - CF_binaryhigh
Since we have a very low count of confrontation ruptures in our dataset, the results without oversampling are misleading!
29. Results - CF_binarylow
The Test score without oversampling is higher, but the True Positive rate is very low. When using SMOTE the Test score was reduced a bit, but the True Positive/Negative rates are more balanced.
30. Results - Summary
• For Withdrawal ruptures, we recommend using the ML model of the WD_binaryhigh ground truth label with SMOTE. ML accuracy reached 65% with balanced true positive/negative predictions.
• For Confrontational ruptures, we can’t use the CF_binaryhigh ground truth label: the data is extremely unbalanced for this label, Confrontational ruptures were rare, and the ML model always classified segments as non-rupture. Instead, we recommend using the ML model for the CF_binarylow ground truth label with SMOTE. The accuracy of this model reaches approximately 60%, and the true positive/negative predictions are balanced.
31. Future Work
This study primarily revolves around images (frames) extracted from recorded therapy sessions, but other features can be explored:
• Voice analysis. The main challenge is distinguishing whether the speaker was the therapist or the patient.
• Speech-to-Text tools. The main challenge is the concern over data privacy for tools that convert speech to text (especially the online ones).
• Voice Emotion Detection techniques. Emotion Detection (ED) is a subset of sentiment analysis, focused on extracting and analyzing emotions from text.
32. Summary and Conclusions
In the psychotherapy domain, the machine learning model’s accuracy was deemed acceptable. By integrating additional features like voice analysis and text mining of speech-generated text, the accuracy could be further enhanced.
Ruptures may be categorized into two main subtypes: withdrawal and confrontational ruptures:
In Withdrawal ruptures, patients either move away from the therapist and the treatment in a submissive manner or move toward the therapist in a way that denies the patient’s own experience.
In Confrontational ruptures, patients move against the therapist or the work of therapy. Confrontational ruptures may include complaints about the therapist or the treatment.
The primary objective of this study is to monitor such ruptures throughout recorded therapy sessions using three different cameras: one focused on the patient, another on the therapist, and a third capturing both from the side perspective. Our analysis primarily utilizes the camera focused on the patient.
Following a comprehensive review of the treatment session, the system aims to identify these ruptures. To achieve this goal:
Human Action Recognition techniques were employed, with a specific emphasis on Gaze Detection through statistical head pose analysis. This approach determines whether the patient maintains eye contact with the therapist, which might point to a rupture.
Additionally, we utilized some Facial Action Units features. Facial Action Units (AUs) are a way to describe human facial expressions.
96 patients between the ages of 18–60, with major depressive disorder, participated in this study. All therapy sessions were videotaped. Sessions 2, 4 and 8 were coded manually by coders using the 3RS.
Identified ruptures were coded as a Confrontation (CF) or Withdrawal (WD), and the clarity of the rupture was rated as:
- check minus (a weak or somewhat unclear example of the marker)
- a check (a solid example of the marker)
- a check plus (a very clear, "textbook" example of the marker).
WD/CF –
These labels have continuous quantitative values (not binary), so they can’t be used directly for rupture-existence classification. Nevertheless, they can be employed in regression-based machine learning models for predicting rupture intensity.
WD2/CF2 –
The count of markers with an intensity value of 2 or higher. As a result, markers with low intensity (values of 0 or 1) are eliminated, and only the high-intensity markers are considered in the count.
WD_binary/CF_binary:
These are binary labels indicating high-intensity ruptures.
If the WD2/CF2 label value is 0, WD_binary/CF_binary will be 0.
If at least one marker is identified with a high intensity of 2 or above, resulting in a WD2/CF2 value of 1 or more, then WD_binary/CF_binary is set to 1.
WD_binary1/CF_binary1:
These binary labels indicate low-intensity ruptures. They were actually created later on by us (not by the psychotherapy coders), in order to have balanced segments in the dataset by lowering the intensity threshold: even a minor rupture is considered a rupture.
Note: It’s referring to WD/CF labels and not WD2/CF2
Sensitive data:
- Excluding features that can be used to reconstruct and predict patient identity.
- Owing to data sensitivity, precautions were taken to ensure privacy. The analysis of these sessions was conducted within the confines of the psychotherapy labs of the University of Haifa, on a computer isolated from internet connectivity.
Three cameras were utilized to record sessions: one focused on the therapist, another on the patient, and a third captured both individuals.
Cameras were positioned at a distance from the faces of the patient and therapist, and they were not directly facing them. This setup posed challenges for some techniques (those reliant on eye tracking, for example).
Detection of facial action units faced difficulties as well, for the same reasons and due to other factors, such as individuals wearing glasses or patients obscuring their faces, especially during emotional reactions.
For the extraction of features from the recorded therapy sessions, the study employed a Computer Vision open-source tool named OpenFace. This tool offers an array of capabilities:
Head position assessment
Facial Action Units detection
Eye tracking and facial landmark detection, among others. The eye tracking and landmark features were excluded due to:
Inaccurate values corresponding to the lab setup and camera positions
Data sensitivity – to allow working on the extracted features outside the lab premises and to ensure the impossibility of reconstructing participants’ facial attributes or discerning any aspect of their identity.
Calibrated mode:
The calibration process aimed to establish correspondences between objects captured by cameras 1 & 2 (centered on patient/therapist) and their positions in the third camera’s view. This facilitated the translation of pixels from cameras 1 or 2 into real-world 3D coordinates.
3D coordinates of the eyes and face were used for computing the gaze vector originating from the patient’s eyes toward the therapist’s face. The objective was to identify a direct gaze if the vector alignment was sufficiently close. A lack of such alignment could indicate a potential rupture.
Uncalibrated mode:
Recognizing the limitations of the calibrated mode (due to camera limitations, such as the considerable distance between the participant’s faces and the angle at which their faces were captured), the study transitioned to an uncalibrated approach.
Statistical analysis of head pose was adopted to determine direct gaze, replacing the previous geometric calculations.
Facial Action Units features:
Facial Action Units (AUs) are a way to describe human facial expressions.
OpenFace can detect the intensity (on a scale from 0 to 5) of 17 AUs.
Facial Action Coding System (FACS) - Guide
AU01_r, AU02_r, AU04_r, AU05_r, AU06_r, AU07_r, AU09_r, AU10_r, AU12_r, AU14_r, AU15_r, AU17_r, AU20_r, AU23_r, AU25_r, AU26_r, AU45_r
Head pose features:
By detecting head pose (Up/Down/Left/Right), we can detect whether the patient is looking straight at the therapist or not.
Lack of straight eye contact with the therapist can point to a rupture (most likely a withdrawal one).
*** Considering the center of rotation at the mid-point of the patient’s head, Yaw is defined as the angle of rotation (in degrees) about the Y-axis from the standard frontal view (indicating right and left head movement).
Similarly, Pitch is defined as the angle of rotation (in degrees) about the X-axis (indicating up and down head movement). ***
A reduced variant of the OpenFace features excluded features linked to participant facial coordinates. Omitted features:
Gaze coordinates (x, y, z) denoting the direction vectors of both eyes in world coordinates
Eye landmarks, specifying the position of 2D landmarks within the eye region in pixels
Facial 2D & 3D landmarks
This measure ensured the impossibility of reconstructing participants’ facial attributes or discerning any aspect of their identity.
How to set the Direct Gaze range based on Head Pose?
In statistics, Kernel Density Estimation (KDE) is the application of kernel smoothing for probability density estimation. Using KDE we can obtain the probability density of the Yaw/Pitch values, and set a range of straight gaze based on the Yaw/Pitch range that corresponds to the highest probabilities.
This range, however, exhibited variations across patients and sessions due to the diverse profiles of individuals.
Max probability cut threshold was used in determining the ranges of Yaw and Pitch. The cut threshold that was chosen is 20%.
(a) KDE plot of Yaw values over the whole session. The Yaw direct-gaze range is between -39 and -1.5.
(b) KDE plot of Pitch values over the whole session. The Pitch direct-gaze range is between 3 and 32.
(c) 2D KDE plot of Yaw/Pitch. We can notice that in this session there are 2 density blocks; the small one was caused by noisy data at the beginning, while the patient had entered the room but hadn’t settled down yet.
- KDE works for any distribution, not only a normal one.
- For a given yaw value (e.g., 0.27), count all frames with approximately that value and divide by the total count of frames to obtain the probability density.
Novel approach:
Considering that ruptures do not necessarily transpire across an entire segment, the approach adopted involved sub-dividing segments into smaller units, spanning 30–60 seconds, in order to determine whether there was direct eye contact between patient and therapist.
This decision was guided by the understanding that ruptures are not typically sustained over extended periods and are unlikely to span the entirety of 5-minute segments. This approach facilitated the identification of precise rupture instances within segments, while also preventing feature flattening that could occur when calculating and averaging OpenFace-derived feature values over an entire segment.
Features were averaged over each unit, leading to the creation of new features based on maximum, minimum, and standard deviation of these average values across all units within a single segment. This approach served to mitigate the influence of noisy data and enhance precision when pinpointing the precise moment of a rupture occurrence.
For example, when analyzing 60-second units we identified a problematic unit: this unit was marked with a low direct gaze, and the red points in the figure are the frames inside the session’s yaw/pitch ranges
(some parameters are shown for that unit, like the mean, std, and Z-test, which indicates the similarity between this unit’s histogram and the whole session’s).
But when working with 30-second units and breaking it into two, we identified that the first one was normal (70% direct gaze out of all frames in that unit), but in the second it was very low (37%). When referring to the video of that session and watching that specific part, we identified a rupture: the patient looked down most of the time, seemed anxious, and held his forehead with his hand.
2. Averaging the values of the features that we got from OpenFace over a complete segment would be misleading: a rupture can happen in only one small unit of the segment, and when averaging the value of one feature over the whole segment, that feature will be flattened and no longer strong enough to identify a rupture. So, we split each segment into smaller units of 30 or 60 seconds and for each unit calculate:
1. This graph illustrates the count of units identified as exhibiting direct gaze between the patient and therapist across different segments of the session. Three distinct thresholds were considered: units with over 80%, 70%, and 60% of their frames within the prescribed straight yaw/pitch ranges.
Notably, in segment 3, a substantial decrease in the number of units with straight gaze was observed for all three threshold levels. This same segment was coded with a confrontational rupture of 57% intensity and a 30% withdrawal rupture intensity.
2. In segment 3, there was a significant increase in the Yaw Z-test measures (min/max), which indicates a large difference between this segment’s behavior and the entire session, related to the patient moving his head left/right.
On reviewing the recorded session and seeking clarification from the coders about segment 3, it was discovered that the patient, during an emotionally charged moment discussing his spouse, consistently looked aside.
3. The maximum and minimum yaw mean distances for all units within a segment, relative to the yaw mean of the entire session.
In segment 3, there was a significant increase in the Yaw means-distance measures (min/max), which indicates a large difference between this segment’s behavior and the entire session, related to the patient moving his head left/right.
1. Model scores comparison with different unit sizes and a max-probability cut threshold of 20%.
Note: units of size 30 seconds have higher True Positive predictions than units of size 60 seconds or 5 minutes.
2. Model scores comparison with different max-probability cut thresholds and a unit size of 30 seconds. Between 15% and 20% there is no big difference, but compared to 25% there is a slight improvement. The middle threshold (20%) was chosen.
- Explain about SMOTE while explaining about class_weight
Classification machine learning models were trained and tested on a dataset of engineered features derived from the OpenFace-extracted features, using the WD_binary/CF_binary and WD_binary1/CF_binary1 ground truth labels.
To ensure the reliability of our results, we took great care to ensure that sessions involving the same patient were exclusively included either in the training or testing phase. Specifically, we avoided a scenario where one session from a particular patient was used for training, while another session from the same patient was used for testing.
Grid Search with Cross Validation was implemented to determine the optimal hyperparameters for each RandomForestClassifier associated with every ground truth label.
AU23 was identified as dominant in the confrontation classifier exclusively. AU23 is associated with the Anger emotion, which includes AUs 4, 5, 7, and 23, as outlined in the (FACS) Visual Guidebook. The Anger emotion has a significant correlation with confrontation ruptures, reinforcing our assertion.
The blue line indicates the classifier’s confidence for True Positive predictions; the probabilities in this case should be higher than 50% (as in the training scores).
The orange line indicates the True Negative cases (segments that are not ruptures); the probabilities in this case should be less than 50%.
Our recommendation is to use the WD_binary label with SMOTE for Withdrawal rupture classification.
Oversampling in this case was redundant, since the data was balanced in the first place. This label indicates low-intensity ruptures, hence weak ruptures are counted too, and the rupture classes are balanced.
Our recommendation is to use the WD_binary label for Withdrawal rupture classification, and not the WD_binary1 label.
1. Since we have a very low count of confrontation ruptures in our dataset, the results without oversampling are misleading!
They show high scores, but actually the classifier detects everything as non-rupture, so the results with oversampling are more reliable.
Our recommendation is to use the CF_binary label with SMOTE for Confrontation rupture classification.
2. The prediction-confidence graphs are not ideal: the accuracy scores are high, but the True Positive/True Negative predictions are not balanced!
Our recommendation is to use the CF_binary label with SMOTE for Confrontation rupture classification.
Our recommendation is to use the CF_binary1 label with SMOTE for Confrontation rupture classification, over CF_binary with SMOTE (since the True Positive predictions are more balanced).
This study primarily revolves around images (frames) extracted from recorded therapy sessions. But other features can be explored:
Voice analysis. The central challenge was distinguishing whether the speaker was the therapist or the patient. Changes in the patient's voice tone were identified as potential indicators of rupture, with elevated tones possibly pointing to confrontation ruptures, while unusually low tones could suggest withdrawal ruptures.
Additionally, Voice Emotion Detection techniques hold promise. Emotion Detection (ED) stands as a subset of sentiment analysis, focused on extracting and analyzing emotions from text. Text mining and analysis can be harnessed for ED.
Furthermore, Speech-to-Text tools can make a significant contribution, providing context regarding the interactions between patients and therapists during treatment sessions. However, challenges persist in this realm. Many tools are tailored for the English language and might not be adaptable to other languages (such as Hebrew). Even if adaptability exists, the concern of data privacy looms over the utilization of online tools.
- Cry detection
- Hand movements…