Slides from our presentation in ASONAM 2017 presenting our work on Depression Detection on Social Media. The full paper can be found in our library: http://knoesis.org/node/2867
SQL Database Design For Developers at php[tek] 2024
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
1. Modeling Social Behavior for Health care Utilization in
Depression
Semi-Supervised Approach to Monitoring Clinical Depressive
Symptoms in Social Media
Amir Hossein Yazdavar∗, Hussein S. Al-Olimat∗, Monireh Ebrahimi∗, Goonmeet Bajaj∗,
Tanvi Banerjee∗, Krishnaprasad Thirunarayan∗, Jyotishman Pathak∗∗ and Amit Sheth∗
∗ Kno.e.sis Research Center, Wright State University, OH, USA
∗∗ Division of Health Informatics, Cornell University, New York, NY, USA
Project Funded by:
NIH R01 Grant #:
MH105384-01A1
The content is solely the responsibility
of the authors and does not necessarily
represent the official views of the
National Institutes of Health.
rebrand.ly/depressionProject
@knoesis_mdd
2. The importance of Studying Clinical Depression
November , 2011,
“Teen Tweets Before Committing Suicide: The
Importance of “Cyber-helping.”
http://www.adweek.com/digital/teen-tweets-before-comitting-suicide-the-importance-of-cyber-helping/
September , 2015,
“Jim Carrey’s Girlfriend: Her Last Tweet Before
Committing Suicide ’Signing Off.”
http://hollywoodlife.com/2015/09/29/jim-carrey-girlfriend-suicide-note-twitter-break-up/
Clinical depression is one of the most common
mental illness
Adults in USA age 18 and older
suffered from depression
Of those suffering receive
treatment.
Spent on depression treatment in a
year.
People affected
Over 90% of people who commit suicide have
been diagnosed with clinical depression.
1/30@knoesis_mdd Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
350
million
$42
billion
40
million
Only 1/3
3. The importance of Studying Clinical Depression
In 2015, there were more than 44K reported suicide deaths.
“Suicide claims more lives than war, murder, and
natural disasters combined.”
American Foundation for Suicide Prevention:
Every 12 minutes a person dies by suicide in the US.
Every day, 121 Americans take their own life.
500K people visited a hospital for injuries due to self-harm.
Suicide was the second leading cause of death for adults between the ages of 10 & 34 years
2/30@knoesis_mdd
http://myefiko.com/wp-content/uploads/2016/05/girls-suicide.jpg
http://www.udayavani.com/sites/default/files/imag
es/english_articles/2017/04/15/Suicide.jpg
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
4. Previous efforts to address Clinical Depression
Global effort to manage depression involves detecting it through
survey-based methods via phone or online questionnaires
Under-representation
Sampling bias
Incomplete
information
Large temporal gaps between
data collection & dissemination of findings
Cognitive bias
3/30@knoesis_mdd
http://www.thechicagobridge.org/wp-
content/uploads/2015/12/online-survey.jpg
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
5. Unique opportunity to study Clinical Depression
Social Media Platforms:
A valuable resource for learning about users’ feelings, emotions,
behaviors, and decisions that reflect their mental health as they
are experiencing the ups and downs.
How well can textual content in social media be harnessed to
reliably capture clinical depression symptoms of a user over time?
Are there any underlying common themes among depressed users?
https://makeawebsitehub.com/wp-content/uploads/2016/04/social_media.jpg
http://www.wamitab.org.uk/useruploads/images/istock_question_man_small.jpg
http://news.mit.edu/sites/mit.edu.newsoffice/files/styles/news_article_image_top_s
lideshow/public/images/2015/big-data-medicine-model_0.jpg?itok=9gUD6D48
4/30@knoesis_mdd Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
6. Supervised approach
Language Style, Emotion, Ego-network, User Engagement
Low Recall,
High dependency
on the quality of
lexicon
Labor-intensive
annotations
5/30@knoesis_mdd
Lexicon-based approach
Previous Efforts to Study Depression
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
7. Previous Efforts to Study Depression
Simply use the keyword “depression” in tweets to find depression indicative tweets
2.bp.blogspot.com/_VMAt17gvKp8/TEUAD3ebxQI/
AAAAAAAAA1A/riVGzdSTavM/s1600/confuse.gif
“economic depression”,
“great depression”,
“depression era”,
“tropical depression”
Ambiguity
I am depressed, my
paper got rejected again
Transient sadness
...sleep forever...
Context sensitivity.
6/30@knoesis_mdd Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
8. Can we develop more specific evaluations
rooted in current clinical protocols?
According to DSM*, clinical depression can be diagnosed through
the presence of a set of symptoms over a period of time.
Patient Health Questionnaire 9 items (PHQ-9)
A nine item depression scale used by clinicians to screen,
diagnose, and measure the severity of depression.
PHQ-9 rates the frequency of
symptoms which leads into
scoring severity index.
PHQ-9 is completed by the
patient and is scored by the
clinician.
http://cdn.xl.thumbs.canstockph
oto.com/canstock10070068.jpg
7/30@knoesis_mdd
* Diagnostic and Statistical Manual of Mental Disorders (DSM)
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
9. Top-down definition of depressive disorder
pinkgirlq8.com/wp-
content/uploads/twitte
r.jpg
8/30@knoesis_mdd Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
10. Obsessed
with weight
S5
Lassitude
S4
Sleep
disorder
S3
@knoesis_mdd
so depressed with my weight
can I just want be skinny
So tired..always so tired. havent slept in 8 days..
94lbs, urgh I disgust myself. 0 energy to do anything
just burst out in tears because I'm
that tired :’( :’( why can't I sleep
PHQ-9 symptoms in tweets
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
9/30
11. Self harm
S9
Suicidal
Thought
S9
Feeling bad
about yourself
S6
10/30@knoesis_mdd
I've never been so sure about
suicide
Cut again tonight, couldnt do it as
bad as I have therepy tomorrow
I feel like a failure
I swear every night I tell myself
"one more cut and that's it" It's
never just one more.
I'm so done. **** recovery. **** life.
Just want to die. Couldn't care less
what happens to me anymore.
What a sick life I lead..
PHQ-9 symptoms in tweets
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
12. How to Study Clinical Depression in Twitter?
We built a model which emulates PHQ-9 questionnaire for
detecting clinical depression symptoms in Twitter profiles.
Analyzing user’s topic preferences and word usage we
can monitor the depression symptoms.
11/30@knoesis_mdd
What they are talking about?
How they express themselves?
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
13. A document is mixture of latent topics,
where a topic is a distribution of co-
occurring words.
Simply counts the number of depression-
indicative terms by creating a dictionary of
terms for each depressive symptoms.
Learned topics were not
granular and specific
enough to correspond to
depressive symptoms.
bocawatch.org/wp-
content/uploads/2016/10/careerconfusion
1-e1417093044460.png
12/30@knoesis_mdd
Ambiguity and context
sensitivity problems
How to Study Clinical Depression in Twitter?
Bottom-up processing: LDA Top-down processing: Lexicon-based
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
14. Hybrid Processing
We add supervision to LDA, by using terms that are strongly related to the 9
depression symptoms as seeds of the topical clusters, and guide the model to
aggregate semantically related terms into the same cluster.
13/30@knoesis_mdd
How to Study Clinical Depression in Twitter?
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
15. 14/30@knoesis_mdd
Generating Seed Terms
lexicon of 1620 depression-related symptoms categorized
into nine different clinical depression symptom categories
Lexicon
PHQ-9
Symptoms
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
16. 15/30@knoesis_mdd
Generating Seed Terms
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
Lack of Interest no_interest no_motivation no_desire acedia incuriosity
Feeling Down agony all_torn_up anguish atrabilious bad_day be_alone
Sleep Disorder awake bad_sleep be_up_late can't_fall_back_asleep
Lack of Energy did_nothing dog-tired don't_have_the_strength drained
Eating Disorder as_big_as_a_mountain as_fat_as_an_elephant avoirdupois binge
Low Self-esteem i_am_a_burden abhorrence am_useless atelophobia
Concentration Problems absent_minded absent-minded absentminded absorbed
Hyper/Lower Activity nervous obtuse overwrought panic panic_attack panic-stricken
Suicidal Thoughts benumb better_be_dead better_off_death bleed_to_death
17. Social media users have a creative descriptive metaphorical phrases and
explanations for symptoms
I’m so exhausted all timeSo tired, so drained, so done
Lack of Energy
16/30@knoesis_mdd
Generating Seed Terms: Challenges
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
18. Language of social media contains polysemous words in its vocabulary
Cut my finger opening a can of fruit scars don’t heal when you keep cutting
Word Sense Disambiguation (WSD)
We disambiguate a polysemous word based on the
sentiment polarity of its enclosing sentence.
17/30@knoesis_mdd
Generating Seed Terms: Challenges
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
19. We divide each user’s collection of preprocessed tweets into a set of tweet
buckets using a specific time interval of d days.
Φ the distribution of words per symptom.
θ the distribution of symptoms over buckets
We restrict a topic si to a single corresponding
value for each user-specific seed terms.
Each term wi is then assigned to the largest
probability symptom associated with it in Φ.
18/30@knoesis_mdd
Framework
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
Andrzejewski, D. and Zhu, X. (2009). Latent Dirichlet Allocation with Topic-
in-Set Knowledge. NAACL 2009 Workshop on Semi-supervised Learning for
NLP (NAACL-SSLNLP 2009)
20. 19/30@knoesis_mdd
Framework
User texts
Buckets of 14
days texts
Run Topic
Modeling
Seed Topics from Lexicon
Change Z-values in the
multinomial distribution
Θ distribution of Symptoms (topics) over Documents (buckets)
Φ distribution of words over symptoms (topics)
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
21. Quantitative Analysis
Umass:
where D(wi ,wj ) counts the number of
documents containing both wi and wj words
and D(wi) counts the ones containing wi
● Topic coherence measures score a single topic by identifying the degree of
semantic similarity between high-scoring words in that topic.
UCI:
where the word probabilities are calculated
by counting word co-occurrence in a sliding
window over an external dataset such as
Wikipedia.
20/30@knoesis_mdd Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
22. ● We created a dataset containing 45,000 Twitter users who self-declared their depression and the
other 2000 “undeclared” users collected randomly.
● What is self-declared depression?
21/30@knoesis_mdd
Dataset
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
23. ● Data Preparation
○ After removing the profiles with less than 100 tweets,
○ we obtained 7,046 users with 21 million timestamped tweets (at most 3,200 per user)
○ Next, we randomly sampled self-reported 2,000 profiles and 2,000 random users.
● Performing text preprocessing
Our model discovers depressive symptoms as latent topics from sliding
window on buckets of timestamped tweets posted by users.
22/30@knoesis_mdd
Dataset
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
24. The following table illustrates the sample of topics learned by ssToT and LDA model. p(w|s)
23/30@knoesis_mdd
Qualitative Analysis
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
25. ● Interesting Observations:
○ Captures acronyms related to each symptoms
○ Has excessive usage of expressive interjections
○ Has family and friends-related topics
Symptom 5
“Mfp” ↣ for “More Food Please”
“Ugw” ↣ stands for “Ultimate Goal Weight”
Symptom 2
“idec” ↣ “I Don’t Even Care”
“aw” ↣ indicative of disappointment
“feh” ↣ indicative of feeling underwhelmed
“ew” ↣ denoting disgust
“argh” ↣ showing frustration
family, hugs, attention,
parents, competition, daddy,
mums, sigh, grandma,losing
24/30@knoesis_mdd
Qualitative Analysis
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
26. Visualizing Depressive Symptoms
● We keep topics containing certain number of seed terms as dominant words (threshold)
25/30@knoesis_mdd Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
27. ● Average coherency of different models vs our model.
● The higher the coherency score, the more interpretable the topics are
26/30@knoesis_mdd
Quantitative Analysis
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
28. Symptom Prediction (Multi-label Classification)
● We try to predict the correct set of labels (depressive symptoms) for each
bucket of tweets.
● We build a ground truth dataset of 10400 tweets in 192 buckets. Each bucket
contains tweets that are posted by the user within span of 14 days (in
compliance with PHQ-9).
● Tweets are selected from a randomly sampled subset of both self-reported
depressed users and random users.
● Three human judges manually annotated each tweet using the nine PHQ-9
categories as labels.
27/30@knoesis_mdd Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
29. Testing our method on Symptom Prediction (Multi-label Classification) vs. supervised binary relevance
(BR) and classifier chains (CC) methods.
28/30@knoesis_mdd
Our Semi-supervised approach vs. Supervised approaches
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
S1 S2 S3 S4 S5 S6 S7 S8 S9
F-SCORE
PHQ-9 SYMPTOM
BR-MNB CC-MNB BR-SVM CC-SVM ssToT
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
30. Limitations
Our model works well with less descriptive symptoms
Over thinking
always destroy
my mood
this essay is dragging
so much, can’t deal with
essays and revisions
any more :(
my head is
such a mess
right now
I need a
break from
my thoughts
Concentration
Problems
descriptive and metaphoric and do not contain any depressive-indicative term
photos.gograph.com/thumbs/
CSP/CSP993/k14300461.jpg
29/30@knoesis_mdd
Limitations
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
31. Conclusion and Future work
● We showed the potential of social media data for extracting depression symptoms.
● We developed a semi-supervised statistical model using a hybrid approach that
combines lexicon and topic modeling.
● Our approach complements the current questionnaire-driven diagnostic tools by
gleaning depression symptoms in a continuous and unobtrusive manner.
● Our model yields promising results that is competitive with a fully supervised approach:
70.6% on average F-Score for capturing the 9 depression symptoms.
● In future, we plan to use the system to detect depressive behavior at the community
level which also includes the study of demographics and so on.
30/30@knoesis_mdd Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media
32. Thank You!
For more information: rebrand.ly/depressionProject
@halolimat hussein@knoesis.org
@knoesis_mdd