Mental Disorder Detection on Twitter
1. Mental Disorder Detection on Twitter: Bipolar Disorder and Borderline Personality Disorder
National Tsing Hua University
Department of Information Systems and Applications
Advisor: Prof. Yi-Shin Chen
Student: Chun-Hao Chang
2. Introduction
18.1% of people in the United States suffer from a mental disorder (*)
Using social networks to research mental disorders
(*) National Institute of Mental Health:
http://www.nimh.nih.gov/health/statistics/prevalence/index.shtml
4. Related Works
Two important related works are as follows:
Quantifying Mental Health Signals in Twitter - Johns Hopkins University (Coppersmith, G., Dredze, M., & Harman, C. (2014))
Automatically collects patients by matching “I was diagnosed with X” in tweets
Predicts 4 kinds of disorder
Predicting Depression via Social Media - Microsoft (De Choudhury, M., Gamon, M., Counts, S., & Horvitz, E. (2013), ICWSM)
Collected data from Amazon Mechanical Turk and purchased Twitter data
Able to predict that a user has depression before a formal diagnosis
5. Related Works
In 700 million tweets from Oct 2014 to Aug 2015, only 8 Bipolar and 3 BPD patients were found
6. Background
Bipolar Disorder:
*Unstable and impulsive emotions
Cycling between manic and depressive episodes
Borderline Personality Disorder:
*Unstable and impulsive emotions
Impaired social interactions
8. Data Collection
A community portal is a Twitter account that is followed by many patients.
Community portals can be found by searching for the disorder on the Twitter website.
Figure: data collection pipeline
1. Manually Collect Community Portals
2. Collect Followers (REST API)
3. Keywords Filter
4. Manual Verification
5. Randomly Sample Users (Streaming API & REST API)
6. Collect Tweets (REST API)
Output: Tweets of Patients, Experts and Random Samples
10. Data Collection
Download the followers of the community portals (5000 followers for each portal in this study).
Select suspected patients from the follower profiles by keywords (BPD and Bipolar in this study).
Manually label the users as patients, experts or unrelated.
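The keyword screen in the second step can be sketched in a few lines of Python. This is a minimal sketch, assuming each follower profile is available as a dict with hypothetical keys `screen_name` and `description`; the case-insensitive substring match is an illustrative choice (it favors recall, since every hit is verified by hand afterwards), not necessarily the rule used in the study.

```python
# Keywords used in this study (from the slide).
KEYWORDS = ("bpd", "bipolar")


def suspected_patients(profiles, keywords=KEYWORDS):
    """Return screen names of followers whose bio mentions any keyword.

    `profiles` is assumed to be a list of dicts with hypothetical keys
    "screen_name" and "description". The result is only a candidate list:
    each user is still labeled by hand as patient, expert or unrelated.
    """
    hits = []
    for profile in profiles:
        bio = (profile.get("description") or "").lower()  # bios may be empty or None
        if any(keyword in bio for keyword in keywords):
            hits.append(profile["screen_name"])
    return hits


followers = [
    {"screen_name": "a", "description": "Living with Bipolar II"},
    {"screen_name": "b", "description": "#BPD warrior"},
    {"screen_name": "c", "description": "coffee lover"},
    {"screen_name": "d", "description": None},
]
print(suspected_patients(followers))  # ['a', 'b']
```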
13. Data Collection
Download tweets by REST API (at most 3200 tweets per user, excluding retweets)
1. Randomly sample English-speaking users via the Twitter Streaming API
2. Download their tweets via the REST API, with the same limits
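The retweet exclusion and per-user cap can also be applied as a post-processing pass over downloaded timelines. A sketch under stated assumptions: tweets are dicts shaped like Twitter v1.1 statuses, where a native retweet carries a `retweeted_status` field, and `created_at_ts` is a hypothetical numeric timestamp added at download time; the `RT @` prefix check is an extra heuristic for manual retweets.

```python
MAX_TWEETS = 3200  # per-user cap stated on the slide


def clean_timeline(tweets, max_tweets=MAX_TWEETS):
    """Drop retweets and keep at most `max_tweets` tweets, newest first.

    Tweets are assumed to be dicts shaped like Twitter v1.1 statuses, where a
    native retweet carries a "retweeted_status" key; "created_at_ts" is a
    hypothetical numeric timestamp added at download time.
    """
    originals = [
        t for t in tweets
        if "retweeted_status" not in t
        and not t.get("text", "").startswith("RT @")  # manual retweets
    ]
    originals.sort(key=lambda t: t["created_at_ts"], reverse=True)
    return originals[:max_tweets]


timeline = [
    {"text": "hi", "created_at_ts": 1},
    {"text": "RT @x: yo", "created_at_ts": 2},
    {"text": "shared", "created_at_ts": 3, "retweeted_status": {}},
    {"text": "bye", "created_at_ts": 4},
]
print([t["text"] for t in clean_timeline(timeline)])  # ['bye', 'hi']
```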
14. Data Collected
Group              Users
Random Samples     823
Bipolar            798
BPD                427
Bipolar Experts    54
BPD Experts        42
We assume that the randomly sampled Twitter users and the experts do not have Bipolar or BPD.
Because the prevalence of Bipolar is 2.6% and of BPD is 1.6% in the United States (*), this assumption should not seriously damage the predictive performance.
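The prevalence argument can be made concrete with a back-of-the-envelope estimate. Assuming, loosely, that a group of sampled users mirrors U.S. prevalence (Twitter users are not a representative U.S. sample, so this is only indicative), the expected number of affected users hidden in the negative class is the group size times the prevalence:

```python
# Prevalence figures quoted on the slide (NIMH, United States).
PREVALENCE = {"Bipolar": 0.026, "BPD": 0.016}


def expected_affected(n_users, prevalence=PREVALENCE):
    """Expected number of affected users in a group of n_users, assuming the
    group mirrors U.S. prevalence (a rough assumption for Twitter users)."""
    return {disorder: n_users * rate for disorder, rate in prevalence.items()}


for label, n in (("collected", 823), ("after preprocessing", 548)):
    estimate = expected_affected(n)
    print(label, {k: round(v, 1) for k, v in estimate.items()})
```

For the 823 collected random samples this gives roughly 21 expected Bipolar and 13 expected BPD cases, at most about 4.2% of the group (fewer distinct users if some have both), which is consistent with treating the random samples as a mostly clean negative class.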
(*) National Institute of Mental Health:
http://www.nimh.nih.gov/health/statistics/prevalence/index.shtml
16. Data after preprocessing
Group              Users   Tweets   Avg. Tweets per User
Random Samples     548     796957   1454.3
Bipolar Patients   278     347774   1250.99
BPD Patients       203     225774   1112.19
Bipolar Experts    11      14056    1611.67
BPD Experts        9       19696    1790.55
17. Feature Extraction
TF-IDF Features (TF-IDF Calculation): open-vocabulary approach calculating unigrams and bigrams
LIWC Features (LIWC Counting): 64 categories of a psychological dictionary
Pattern of Life Features (Polarity Extraction, Emotions Extraction, Age and Gender Prediction, Social Behavior Extraction): personal behaviors such as emotional patterns, social interactions and profile data
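The two text-based feature families can be sketched in pure Python. This is an illustrative sketch, not the study's implementation: the tokenizer, the plain `tf * log(N / df)` weighting (libraries such as scikit-learn use smoothed variants) and the two-category `MINI_DICT` are all assumptions, and the real LIWC dictionary is licensed, so none of its entries are reproduced. LIWC-style counting reports, per category, the percentage of words that match a dictionary entry, where a trailing `*` matches any suffix.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokenizer (assumed; the original tokenizer is not specified).
    return re.findall(r"[a-z0-9#@']+", text.lower())


def unigrams_and_bigrams(tokens):
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf(documents):
    """One {term: weight} dict per document (e.g. one user's tweets joined).

    Weight = raw term frequency * log(N / document frequency).
    """
    counts = [Counter(unigrams_and_bigrams(tokenize(d))) for d in documents]
    n_docs = len(documents)
    df = Counter(term for c in counts for term in c)  # each term counted once per document
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in c.items()} for c in counts]


# Hypothetical two-category excerpt in LIWC style; "*" matches any suffix.
MINI_DICT = {
    "negemo": ["sad", "hurt*", "cry*"],
    "health": ["meds", "therap*", "pill*"],
}


def matches(word, entry):
    return word.startswith(entry[:-1]) if entry.endswith("*") else word == entry


def liwc_percentages(text, dictionary=MINI_DICT):
    """Percentage of tokens in `text` that fall into each category."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in dictionary}
    return {
        category: 100.0 * sum(any(matches(w, e) for e in entries) for w in tokens) / len(tokens)
        for category, entries in dictionary.items()
    }


weights = tfidf(["I feel better today", "feel better soon", "therapy helps"])
print(sorted(weights[2].items()))
print(liwc_percentages("Crying after therapy, took my meds"))
```

A term that appears in every document gets weight zero under this weighting, which is why frequent but uninformative words drop out of the TF-IDF features.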
18. Feature Extraction: Pattern of Life Features
Proposed by Coppersmith et al.; we further improve them as follows:
1. Polarity: positive and negative percentages, positive and negative combo ratios, flip ratios
2. Emotions: percentage of each of eight emotions
3. Age and Gender: inferred age and gender (*)
4. Social Interactions: mentioning rate, frequent mentioning counts, unique mentioning counts
(*) Schwartz, H. Andrew, et al. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PLoS ONE 8.9 (2013): e73791.
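The social-interaction and emotion features on this slide are named but not defined, so the following sketch fixes one hypothetical definition for each: mentioning rate as the share of tweets containing an @mention, unique mentioning count as the number of distinct accounts mentioned, and frequent mentioning count as the number of accounts mentioned at least `frequent_min` times (the threshold is assumed). Emotion percentages assume one label per tweet from an upstream emotion classifier that the slide does not specify.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def social_features(tweets, frequent_min=3):
    """Hypothetical definitions of the three social-interaction features."""
    mentioned = Counter()
    tweets_with_mention = 0
    for text in tweets:
        # Screen names are case-insensitive, so normalize before counting.
        names = [name.lower() for name in MENTION.findall(text)]
        if names:
            tweets_with_mention += 1
        mentioned.update(names)
    n = len(tweets)
    return {
        "mentioning_rate": tweets_with_mention / n if n else 0.0,
        "unique_mentioning_count": len(mentioned),
        "frequent_mentioning_count": sum(1 for c in mentioned.values() if c >= frequent_min),
    }


def emotion_percentages(labels, emotions):
    """Percentage of tweets carrying each emotion label (one label per tweet)."""
    counts = Counter(labels)
    n = len(labels)
    return {e: 100.0 * counts[e] / n if n else 0.0 for e in emotions}


print(social_features(["@amy thanks!", "rough day", "@Amy @bob see you", "@amy again"]))
# {'mentioning_rate': 0.75, 'unique_mentioning_count': 2, 'frequent_mentioning_count': 1}
```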
19. Feature Extraction: Illustration of Combos and Flips
Time intervals between consecutive tweets: 3 min, 900 min, 18 min, 15 min, 800 min, 200 min (7 tweets, 6 intervals)
Result: 1 flip, 3 negative combos
Flip Ratio = 1 / 7
Negative Combo Ratio = 3 / 7
Flip time threshold: 30 min
Combo time threshold: 120 min
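A pairwise reading of flips and combos, using the two thresholds above, can be sketched as follows. The definitions are assumptions: a flip is a consecutive pair of opposite-polarity tweets no more than 30 minutes apart, a combo is a consecutive same-polarity pair no more than 120 minutes apart, and the ratios divide by the number of tweets, as in the slide's 1 / 7. Only three of the six gaps (3, 18 and 15 min) fall within 120 minutes, so this pairwise reading cannot yield the slide's 1 flip plus 3 negative combos; the original rule presumably counts combos differently, and the figure's polarity labels did not survive extraction.

```python
FLIP_THRESHOLD_MIN = 30    # flip time threshold from the slide
COMBO_THRESHOLD_MIN = 120  # combo time threshold from the slide


def flips_and_combos(polarities, gaps_min,
                     flip_threshold=FLIP_THRESHOLD_MIN,
                     combo_threshold=COMBO_THRESHOLD_MIN):
    """Flip and combo ratios under an assumed pairwise definition.

    polarities: "+" or "-" per tweet, in time order.
    gaps_min:   gaps_min[i] is the gap in minutes between tweet i and i + 1.
    """
    if len(gaps_min) != len(polarities) - 1:
        raise ValueError("need exactly one gap per consecutive pair of tweets")
    flips = positive_combos = negative_combos = 0
    for (prev, curr), gap in zip(zip(polarities, polarities[1:]), gaps_min):
        if prev != curr and gap <= flip_threshold:
            flips += 1  # polarity switched quickly
        elif prev == curr and gap <= combo_threshold:
            if curr == "+":
                positive_combos += 1
            else:
                negative_combos += 1
    n_tweets = len(polarities)
    return {
        "flip_ratio": flips / n_tweets,
        "positive_combo_ratio": positive_combos / n_tweets,
        "negative_combo_ratio": negative_combos / n_tweets,
    }


# Gaps from the slide; the polarity sequence is hypothetical.
print(flips_and_combos(list("+----++"), [3, 900, 18, 15, 800, 200]))
```

With the slide's gaps and the hypothetical polarity sequence `+ - - - - + +`, the sketch returns a flip ratio of 1 / 7 and a negative combo ratio of 2 / 7.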
20. Classifier Training and Evaluation
Figure: TF-IDF Models, LIWC Models and Pattern of Life Models are trained with a Random Forest classifier and evaluated by three tests: 10-Fold Cross Validation Test, Selection Bias Test and Limited Data Test
21. Classifier Training and Evaluation: 10-Fold Cross Validation Test
Shows the relationship between precision and recall.
The data are randomly split into 10 chunks: 9 chunks for training and 1 chunk for testing. Precision and recall are calculated over the iterations, with each chunk serving once as the test set.
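The procedure above can be sketched without any ML library. In this sketch, `fit_predict` is a hypothetical callable standing in for the Random Forest training and prediction step, and precision and recall are computed once over the pooled out-of-fold predictions; averaging per-fold scores is an equally common alternative, and the slide does not say which was used.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal chunks."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Train on k-1 chunks, predict the held-out chunk, repeat k times,
    then score the pooled out-of-fold predictions."""
    y_pred = [None] * len(y)
    folds = k_fold_indices(len(y), k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        preds = fit_predict([X[j] for j in train_idx], [y[j] for j in train_idx],
                            [X[j] for j in test_idx])
        for j, p in zip(test_idx, preds):
            y_pred[j] = p
    return precision_recall(y, y_pred)


def threshold_model(train_X, train_y, test_X):
    # Hypothetical stand-in classifier that ignores the training data.
    return [x > 0.5 for x in test_X]


X = [i / 20 for i in range(20)]
y = [x > 0.5 for x in X]
print(cross_validate(X, y, threshold_model))  # (1.0, 1.0)
```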
22. Evaluations on Bipolar: 10-fold Cross Validation
Area Under the Curve:
Pattern of Life 0.90
LIWC 0.91
TF-IDF 0.96
23. Evaluations on BPD: 10-fold Cross Validation
Area Under the Curve:
Pattern of Life 0.91
LIWC 0.90
TF-IDF 0.96
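If the curve behind these AUC values is the precision-recall curve described for the cross validation test (an assumption; an ROC curve would be computed differently), a common estimator of its area is average precision. A minimal sketch, where `scores` are a classifier's predicted probabilities of being a patient:

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Examples are ranked by descending score; the precision at each rank where
    a positive is retrieved is averaged over all positives. Ties keep input
    order because Python's sort is stable.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, t in ranked if t)
    if n_pos == 0:
        return 0.0
    tp = 0
    total = 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos


print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```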
24. Classifier Training and Evaluation: Selection Bias Test
Tests whether a model predicts people who have the disorder or people who are just talking about it.
The 11 Bipolar experts and 9 BPD experts are used as the testing data; the results show the tendency of the classifiers to misclassify experts as patients.
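The selection bias test reduces to a false-positive rate on a group assumed to contain no patients. A minimal sketch, assuming `predictions` holds one boolean per expert account (True meaning "predicted patient"); a lower rate suggests the model relies less on disorder-related vocabulary. The model names in the docstring are only illustrative keys.

```python
def expert_misclassification_rate(predictions):
    """Share of expert accounts that a classifier labels as patients.

    Experts are assumed not to have the disorder, so every positive
    prediction here is a false positive.
    """
    if not predictions:
        return 0.0
    return sum(bool(p) for p in predictions) / len(predictions)


def selection_bias_report(predictions_by_model):
    """Misclassification rate per model, e.g. {"TF-IDF": [...], "LIWC": [...]}."""
    return {name: expert_misclassification_rate(preds)
            for name, preds in predictions_by_model.items()}
```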
26. Top 10 Keywords from the TF-IDF Classifier
Bipolar           BPD
mentalhealth      dbt
meds              feeling
blog              borderline
therapy           helps
anxiety           self harm
thoughts          psychiatrist
feel better       cpn
electroboyusa     disorder
health            bpdchat
bipolarblogger    depression
The TF-IDF classifier tends to detect people who are talking about the disorder.
27. Classifier Training and Evaluation: Limited Data Test
Reveals how precision changes when the number of tweets is limited.
Similar to 10-fold cross validation, but the testing data are extracted only from each user's latest K tweets.
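Building the limited-data test set amounts to truncating each held-out user's history before features are extracted. A sketch assuming each tweet is a hypothetical `(timestamp, text)` tuple and test users are given as a dict from user id to tweets; the rest of the cross validation loop would stay unchanged.

```python
def latest_k(tweets, k):
    """Return a user's K most recent tweets, newest first."""
    return sorted(tweets, key=lambda t: t[0], reverse=True)[:k]


def limited_test_set(test_users, k):
    """Keep only each held-out user's latest K tweets, so the classifier is
    scored on how it performs with a short history."""
    return {user: latest_k(tweets, k) for user, tweets in test_users.items()}


history = [(3, "c"), (1, "a"), (2, "b")]
print(limited_test_set({"u1": history}, 2))  # {'u1': [(3, 'c'), (2, 'b')]}
```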
30. Conclusion:
How can we efficiently collect tweet data from patients?
We proposed an efficient and accessible way to collect patients' tweets.
How can we correctly detect patients with mental disorders?
We suggest that the Pattern of Life model gives high precision and low bias.