Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental Disorder Detection on Social Media

Chun-Hao Chang, Elvis Saravia, and Yi-Shin Chen
Institute of Information Systems and Applications
National Tsing Hua University
Hsinchu, Taiwan 30013, R.O.C.
Email: {ccha97u, ellfae, yishin}@gmail.com

Introduction
➔ One in three people report sufficient criteria for at least one form of
mental disorder at some point in their life.
➔ 16% of people in the US suffer from some form of mental disorder, the
leading cause of disability worldwide.
➔ Problem: The majority of cases remain largely undetected, and diagnosis
is difficult.
➔ Solution: Social networks provide a venue for mental disorder research.
Source: Wikipedia

Background
Bipolar Disorder:
- Unstable and impulsive emotions
- Cycling between mania and depression
Borderline Personality Disorder:
- Unstable and impulsive emotions
- Impaired social interactions

Motivation
➔ Open access to patient data
from social websites.
➔ Build a real-time mental health
assessment tool to assist in
diagnosis.

Related Work
➔ Predicting Depression via Social Media - Microsoft (M De Choudhury, M
Gamon, S Counts, E Horvitz - ICWSM, 2013)
1. Collected data using crowdsourcing platform, Amazon Mechanical Turk.
2. Purchased Twitter data.
3. Prediction of depression before diagnosis.
➔ Quantifying Mental Health Signals in Twitter - Johns Hopkins University
(Coppersmith, G., Dredze, M., & Harman, C. (2014))
1. Automatically collected patients by keyword matching (e.g., “I was diagnosed with X”).
2. Predicts 4 different kinds of mental disorders.
Limitation: The data is not easily accessible or reproducible.

Challenges
➔ How to identify online patients?
➔ How to efficiently collect patient data?
➔ Avoid selection bias - Is the predictive model detecting patients with
mental illnesses or just people who talk about them?

Objectives
➔ To build predictive models for the purpose of mental disorder
detection.
➔ To extract features which alleviate the selection bias problem.
➔ To standardize features for mental disorder detection.

Data Collection
➔ Subconscious crowdsourcing: a reliable and efficient mechanism to
gather patient data. The community is the key element.
Figure labels: Therapist, Patients

Preprocessing
➔ Twitter accounts with more than 100 posts
➔ Accounts whose content was more than 50% hyperlinks were also removed
Purpose: To get rid of spam accounts.
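The filter above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that accounts need more than 100 posts to be kept and that "50% hyperlinks" means the share of tweets containing at least one URL. The names `hyperlink_ratio` and `keep_account` are hypothetical.

```python
import re

# Simple URL matcher; real tweets may need a more thorough pattern.
URL_PATTERN = re.compile(r"https?://\S+")


def hyperlink_ratio(tweets):
    """Fraction of tweets that contain at least one URL."""
    if not tweets:
        return 0.0
    with_links = sum(1 for text in tweets if URL_PATTERN.search(text))
    return with_links / len(tweets)


def keep_account(tweets, min_posts=100, max_link_ratio=0.5):
    """Assumed reading of the slide: keep accounts with more than
    `min_posts` tweets whose link share does not exceed `max_link_ratio`."""
    return len(tweets) > min_posts and hyperlink_ratio(tweets) <= max_link_ratio
```

An account is represented here as a plain list of tweet strings; both thresholds are exposed as parameters so the assumed values are easy to change.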

Feature Extraction
➔ Overall, we are interested in linguistic and behavioral
features.
➔ Information that reveals a user’s personality and behavior: emotion
transition, social interactions, age, gender, etc.
➔ TF-IDF, LIWC, and Pattern of Life Features
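As a rough illustration of the TF-IDF features, the sketch below weights unigrams and bigrams using only the standard library. It assumes each user is treated as one document (all tweets joined into one string) and uses plain whitespace tokenization with an unsmoothed IDF; the slides do not specify these details, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def terms(text):
    """Unigrams plus bigrams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return tokens + ngrams(tokens, 2)


def tfidf(documents):
    """Return one {term: weight} dict per document."""
    term_lists = [terms(doc) for doc in documents]
    doc_freq = Counter()
    for term_list in term_lists:
        doc_freq.update(set(term_list))  # count each term once per document
    n_docs = len(documents)
    vectors = []
    for term_list in term_lists:
        counts = Counter(term_list)
        total = len(term_list) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

Terms that appear in every document get a weight of zero under this formulation, so only terms that distinguish some users from others carry signal.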

Features
➔ TF-IDF Model:
◆ Unigrams and bigrams
➔ LIWC (Linguistic Inquiry and Word Count):
◆ Thoughts, feelings, personality, and motivation
➔ Pattern of Life:
◆ Emotional scores, age, and gender
◆ Polarity features (negative ratio, positive ratio, positive combo,
negative combo, and flips ratio)
◆ Social features (tweeting frequency, mention ratio, frequent
mentions, and unique mentions)
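The slides name the polarity and social features without defining them, so the sketch below rests on assumed definitions: each tweet carries a polarity label of +1, 0, or -1 in time order; a "combo" is an adjacent pair of tweets with the same non-neutral polarity; a "flip" is an adjacent pair that switches between positive and negative. Tweeting frequency is omitted because it needs timestamps. All function names are hypothetical.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def polarity_features(polarities):
    """polarities: per-tweet labels in time order, +1 / 0 / -1 (assumed input)."""
    n = len(polarities)
    pos_combo = neg_combo = flips = 0
    for prev, cur in zip(polarities, polarities[1:]):
        if prev == cur == 1:
            pos_combo += 1          # two positive tweets in a row
        elif prev == cur == -1:
            neg_combo += 1          # two negative tweets in a row
        elif prev * cur == -1:
            flips += 1              # positive <-> negative switch
    pairs = n - 1
    return {
        "positive_ratio": polarities.count(1) / n if n else 0.0,
        "negative_ratio": polarities.count(-1) / n if n else 0.0,
        "positive_combo": pos_combo,
        "negative_combo": neg_combo,
        "flips_ratio": flips / pairs if pairs > 0 else 0.0,
    }


def social_features(tweets, top_k=3):
    """Mention-based features from raw tweet texts."""
    mentions = [m.lower() for text in tweets for m in MENTION.findall(text)]
    with_mention = sum(1 for text in tweets if MENTION.search(text))
    counts = Counter(mentions)
    return {
        "mention_ratio": with_mention / len(tweets) if tweets else 0.0,
        "unique_mentions": len(counts),
        "frequent_mentions": [user for user, _ in counts.most_common(top_k)],
    }
```

How the per-tweet polarity labels are produced (for example, by a sentiment classifier) is outside this sketch.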

Experiments: Data
Group              Users   Tweets   Average Tweets per User
Random Samples       548   796957   1454.3
Bipolar Patients     278   347774   1250.99
BPD Patients         203   225774   1112.19
Bipolar Experts       11    14056   1611.67
BPD Experts            9    19696   1790.55

Experiments: Evaluation
➔ Three predictive models (Random Forest) for each mental disorder
◆ Pattern of Life Model
◆ TF-IDF Model
◆ LIWC Model
➔ Three experiments
◆ 10-Fold Cross Validation Test
◆ Selection Bias Test
◆ Limited Data Test
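The 10-fold cross validation procedure can be sketched with the standard library alone. This is an illustration under stated assumptions: the score is plain accuracy (the slides do not name the metric), and the classifier is a majority-label stand-in, not a Random Forest. The `fit` and `predict` callables are hypothetical hooks where a real model would be plugged in.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=10):
    """Mean held-out accuracy over k folds."""
    scores = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Stand-in classifier (NOT a random forest): always predicts the
# most common label seen during training.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model
```

Each sample is held out exactly once, so every user contributes to the score only through a model that never saw that user during training.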

10-Fold Cross Validation
Pattern of Life 0.90
LIWC 0.91
TF-IDF 0.96
Pattern of Life 0.91
LIWC 0.90
TF-IDF 0.96

Selection Bias Test
Is the model detecting users suffering from a
mental disorder or just users talking about it?
Top TF-IDF terms
Bipolar          BPD
mentalhealth     dbt
meds             feeling
blog             borderline
therapy          helps
anxiety          self harm
thoughts         psychiatrist
feel better      cpn
electroboyusa    disorder
health           bpdchat
bipolarblogger   depression

Data Limitation
What if a user has only a few tweets?

Conclusion
➔ We proposed an efficient and accessible mechanism for collecting
patient data.
➔ We improved the Pattern of Life Model to produce better predictions.
➔ We addressed the selection bias problem, which had not been addressed previously.
Future work: Support more mental illnesses
