Mining auditory hallucinations from unsolicited Twitter posts


LREC 2016 Workshop. Resources and Processing of Linguistic and Extra-Linguistic Data from People with Various Forms of Cognitive/Psychiatric Impairments (RaPID-2016)

Paper: https://www.researchgate.net/publication/304320133_Mining_Auditory_Hallucinations_from_Unsolicited_Twitter_Posts


Abstract:
Auditory hallucinations are common in people who experience psychosis and psychotic-like phenomena. This exploratory study aimed to establish the feasibility of harvesting and mining datasets from unsolicited Twitter posts to identify potential auditory hallucinations. To this end, several search queries were defined to collect posts from Twitter. A training sample was annotated by research psychologists for relatedness to auditory hallucinatory experiences and a text classifier was trained on that dataset to identify tweets related to auditory hallucinations. A number of features were used including sentiment polarity and mentions of specific semantic classes, such as fear expressions, communication tools and abusive language. We then used the classification model to generate a dataset with potential mentions of auditory hallucinatory experiences. A preliminary analysis of a dataset (N = 4957) revealed that posts linked to auditory hallucinations were associated with negative sentiments. In addition, such tweets had a higher proportionate distribution between the hours of 11pm and 5am in comparison to other tweets.

Published in: Data & Analytics

  1. Mining auditory hallucinations from unsolicited Twitter posts. M. Belousov, M. Dinev, R. M. Morris, N. Berry, S. Bucci, G. Nenadic. University of Manchester; Health eResearch Centre (HeRC), The Farr Institute of Health Informatics Research. Portorož, May 2016
  2. Title slide with keywords: schizophrenia, hearing voices, mental, psychosis, symptom, sound, health
  3. Title slide with keywords: social network, brief message, fewer than 140 characters, 310M active users, share opinions, spontaneous, unforced, unasked-for
  4. Title slide with keywords: knowledge discovery, exploratory, patterns, unseen, data analysis
  5. Title slide with keywords: schizophrenia, hearing voices, mental, psychosis, symptom, sound, health, knowledge discovery, patterns, unseen, social network, brief message, fewer than 140 characters, 320M active users, share opinions, spontaneous, unforced, unasked-for
  6. Research aim. Q: Is it feasible to generate useful datasets from unsolicited Twitter posts regarding auditory hallucinatory experiences to support psychological investigations?
  7. Research aim (continued). A: A classification model that can predict whether a given post is related to hallucinatory experiences.
  8. Potentially related posts (all Twitter posts were paraphrased to preserve anonymity)
     ✅ I am hearing a scary voice right now, I don’t know if it’s in my head or in television.. Crazy
     ✅ If hallucinating is thought of as hearing voices that are not actually real, then these painkillers are causing me to hallucinate like mad
  9. Unrelated posts (all Twitter posts were paraphrased to preserve anonymity)
     ❌ My grandmom is watching Deliver Us From Evil and I can hear this weird high-pitched voice and want Ralph Sarchie to hold me
     ❌ So I was convinced I was hearing stuff. It was so funny because the noise was coming from the kitchen but I thought I was hallucinating
  10. Iterative workflow: define search queries → collect unique posts from Twitter → annotate posts and explore data → predict relatedness of posts to hallucinatory experiences → analyse data → redefine search queries
  11. Data collection: list of defined search queries for the Twitter Search API
      • hallucinating hearing
      • (“hear things” OR “hearing things”) “in my head”
      • hearing scary things “in my head”
      • (hear OR hearing) (“other people” OR “other ppls” OR “other ppl”) thoughts
      • (voice OR voices) (commenting OR criticising) (scary OR frightening OR “everything I do”)
      • (hear OR hearing) (voice OR voices) (god OR angel OR allah OR spirit OR soul OR “holy spirit” OR djinn OR jinn)
      • (hear OR hearing) (voice OR voices) (scary OR devil OR demon OR daemon OR evil OR “evil spirit”)
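The "collect unique posts" step of the workflow can be sketched as follows. This is only an illustration, not the authors' code: the slides do not say how duplicates were detected, so the `normalise` rule (lowercasing, dropping URLs and @mentions) and the function names are assumptions, and no real Twitter API call is shown.

```python
import re

# The seven search queries as read from the slide (the split is an assumption).
QUERIES = [
    'hallucinating hearing',
    '("hear things" OR "hearing things") "in my head"',
    'hearing scary things "in my head"',
    '(hear OR hearing) ("other people" OR "other ppls" OR "other ppl") thoughts',
    '(voice OR voices) (commenting OR criticising) '
    '(scary OR frightening OR "everything I do")',
    '(hear OR hearing) (voice OR voices) (god OR angel OR allah OR spirit '
    'OR soul OR "holy spirit" OR djinn OR jinn)',
    '(hear OR hearing) (voice OR voices) (scary OR devil OR demon OR daemon '
    'OR evil OR "evil spirit")',
]


def normalise(text):
    """Hypothetical duplicate key: lowercase, drop URLs/@mentions, squeeze spaces."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def collect_unique(batches):
    """Merge per-query result batches, keeping the first copy of each post."""
    seen, unique = set(), []
    for batch in batches:
        for post in batch:
            key = normalise(post)
            if key and key not in seen:
                seen.add(key)
                unique.append(post)
    return unique
```

Each element of `batches` would hold the posts returned for one query in `QUERIES`; posts matched by several queries are kept once.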
  12. Data annotation process
      • Two research psychologists manually annotated posts:
        • Assigned classes: related or unrelated to hallucinations
        • Highlighted specific phrases to explain their decisions
      • The highlighted words and phrases were later used to identify characteristics of each classification category
      • The observed inter-annotator agreement (IAA) was 0.85 on 41 examples (10% of the final annotated set)
      RESULT: 401 annotated examples, 94 of them related to hallucinatory experiences
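The slide reports an IAA of 0.85 without naming the measure. As a hedged illustration, the sketch below computes two common candidates for two annotators, raw observed agreement and Cohen's kappa; which one the authors used is not stated, and the function names are made up for this example.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of items on which both annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    n = len(a)
    p_o = observed_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected agreement if both annotators labelled independently.
    p_e = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    if p_e == 1.0:  # both annotators used a single identical label
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, with labels `["r", "r", "u", "u"]` and `["r", "u", "u", "u"]` the observed agreement is 0.75 and kappa is 0.5.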
  13. Data exploration: semantic classes
      • Relative (father, friend)
      • Communication Tool (phone)
      • Audio Device (headphones, TV)
      • Drug (cannabis, painkillers)
      • Audio Recording (voicemail)
      • Possible Hallucination (seeing things, in my head)
      • Audio & Visual Media, Apps (song, YouTube, Siri)
      • Religious Term (prayer)
      • Emotional Support (helpline)
      • Own Voice Indicator (my voice, our own voice)
      • Fear Expression (scared, creepy)
      • Abusive Language (sh*t, hell)
      • Stigmatising Language (crazy, insane)
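One simple way to turn semantic classes like these into features is dictionary lookup. The sketch below assumes a small lexicon built only from the example terms on the slide; the authors' full dictionaries and matching rules are not given in the transcript, so `LEXICON` and `count_semantic_classes` are illustrative names.

```python
import re

# Hypothetical lexicon: only the example terms shown on the slide.
LEXICON = {
    "Relative": ["father", "friend"],
    "Communication Tool": ["phone"],
    "Audio Device": ["headphones", "tv"],
    "Drug": ["cannabis", "painkillers"],
    "Possible Hallucination": ["seeing things", "in my head"],
    "Fear Expression": ["scared", "creepy"],
    "Stigmatising Language": ["crazy", "insane"],
}


def count_semantic_classes(text, lexicon=LEXICON):
    """Count whole-word (or whole-phrase) lexicon matches per semantic class."""
    lowered = text.lower()
    counts = {}
    for cls, terms in lexicon.items():
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
        counts[cls] = len(re.findall(pattern, lowered))
    return counts
```

The resulting counts per class could serve as the "mentions of semantic classes" feature group.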
  14. Text classification pipeline: raw (unstructured) text → Text Preprocessing → corrected text → Information Extraction → structured text → Classification → label
      • Raw text: Im hearing a scary voice rn,idk if it’s in my head or in TV..craazy
      • Corrected text: I am hearing a scary voice right now, I don’t know if it’s in my head or in television.. Crazy
      • Extracted annotations: fear expr., possible hallucination, audio device, stigmatising lang.
      • POS tags: O V V D A N R R O V V P A L P D N & P N (POS tagset from Gimpel et al. (2011): O - personal pronoun, V - verb, D - determiner, etc.)
      • Label: related to hallucinatory experience
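The preprocessing step maps informal text such as "rn", "idk" and "craazy" to standard forms. The slides do not describe how this was done, so the following is only a minimal sketch under assumptions: a tiny hand-made slang table (`SLANG`) and vocabulary (`VOCAB`), and a rule that collapses repeated letters when the result is a known word.

```python
import re

# Hypothetical resources for illustration only.
SLANG = {"im": "I am", "rn": "right now", "idk": "I don't know", "tv": "television"}
VOCAB = {"crazy", "scary", "voice", "hearing", "head"}


def squeeze(word):
    """Collapse each run of a repeated character to one: 'craazy' -> 'crazy'."""
    return re.sub(r"(.)\1+", r"\1", word)


def normalise_tokens(text):
    """Tokenise on letters/apostrophes, expand slang, repair elongated words."""
    out = []
    for tok in re.findall(r"[A-Za-z’']+", text):
        low = tok.lower()
        if low in SLANG:
            out.extend(SLANG[low].split())
        elif low not in VOCAB and squeeze(low) in VOCAB:
            out.append(squeeze(low))
        else:
            out.append(tok)
    return out
```

Applied to the raw example from the slide, `" ".join(normalise_tokens(...))` yields the words of the corrected text (punctuation is dropped by this simple tokeniser).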
  15. Information extraction
      • Example post: My grandmom is watching Deliver Us From Evil and I can hear this weird high-pitched voice and want Ralph Sarchie to hold me
      • Extracted from the post: Neg. sentiment, Relative [1], NE (person) [1], NE (misc) [1], POS Tags
      • Named entities: Stanford NER using the 4-class model trained on the CoNLL 2003 data
  16. Information extraction (continued): key phrase extraction
      • Key phrase: hear this weird high-pitched voice
      • Extracted from the key phrase: Neg. sentiment, Weird / strange [1], POS Tags (V D A A N)
  17. Groups of features (feature group: features)
      • Mentions of semantic classes: mentions of each semantic class
      • Key phrases: sentiment polarity, sem. classes, POS tags
      • Part-of-speech tags: nouns, verbs, adjectives, etc.
      • Sentiment polarity: positive, negative or neutral
      • Popularity of the post: likes, retweets
      • Use of nonstandard language: spelling mistakes, abbreviations
      • Number of Twitter entities: URLs, #hashtags, @mentions
      • Named entities: persons, locations, organisations
      • Lexical distribution: sentences, words, characters
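Two of the simpler feature groups, the number of Twitter entities and the lexical distribution, can be approximated with regular expressions. This is a rough sketch with assumed counting rules (for instance, sentences are split on runs of ".", "!" and "?"); the exact definitions used in the study are not given in the slides.

```python
import re

URL_RE = r"https?://\S+"


def surface_features(text):
    """Count Twitter entities and basic lexical units in one post."""
    without_urls = re.sub(URL_RE, " ", text)  # avoid splitting on dots in URLs
    sentences = [s for s in re.split(r"[.!?]+", without_urls) if s.strip()]
    return {
        "urls": len(re.findall(URL_RE, text)),
        "hashtags": len(re.findall(r"(?<!\w)#\w+", text)),
        "mentions": len(re.findall(r"(?<!\w)@\w+", text)),
        "sentences": len(sentences),
        "words": len(text.split()),
        "characters": len(text),
    }
```

Note that a trailing fragment without final punctuation is counted as a sentence by this rule.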
  18. Classification scenario
      • 401 labelled examples: 94 related; 307 unrelated
      • Three different types of classification methods:
        • Naive Bayes (probabilistic model)
        • Support Vector Machine (geometric model)
        • AdaBoost (boosting of the tree model)
      • Compare performance with a simple baseline: tf-idf features
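The baseline uses tf-idf features. The slides do not specify the tokenisation or weighting variant, so the sketch below assumes whitespace tokens, term frequency normalised by document length, and idf = ln(N / df); the function name `tfidf` is illustrative.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors
```

Terms that occur in every document get weight 0 under this variant, so only distinguishing words contribute to the baseline representation.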
  19. Evaluation
      • Based on ten experiments of stratified 10-fold cross-validation
      • Baseline features outperform the proposed features only with SVM, and the difference is not significant (p-value = 0.375)
      [Bar chart: classification performance (F2-score, axis 0 to 0.9) of NB, SVM and AdaBoost on two sets of features, proposed and baseline; bar values in extraction order: 0.711, 0.751, 0.486, 0.772, 0.743, 0.831; the best result is marked with a trophy]
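The F2-score is the F-beta measure with beta = 2, which weights recall more heavily than precision: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). A minimal sketch follows; the confusion counts in the usage note are made up for illustration and are not results from the study.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For instance, with hypothetical counts tp = 80, fp = 40, fn = 14, `f_beta(80, 40, 14)` equals 400/496, roughly 0.806, while the balanced F1 for the same counts would be lower because precision is the weaker of the two.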
  20. Contribution of features (features: F2-score)
      • Mentions of semantic classes *: 0.769 ▼
      • Key phrases *: 0.788 ▼
      • Part-of-speech tags: 0.817 ▼
      • Sentiment polarity *: 0.818 ▼
      • Popularity of the post: 0.828 ▼
      • Use of nonstandard language: 0.831 ▬
      • Number of Twitter entities: 0.832 ▲
      • Named entities: 0.832 ▲
      • Lexical distribution: 0.833 ▲
      • All features: 0.831 ▲
      * Statistically significant differences are marked with an asterisk
  21. Error analysis (highlights; all Twitter posts were paraphrased to preserve anonymity)
      • Text: I do not hear voices, I am not paranoid. Predicted: ✅ Related. Actual: ❌ Unrelated
      • Text: I’m hallucinating I’m hearing hawks! Oh hang on, it is just the television. Predicted: ✅ Related. Actual: ❌ Unrelated
      • Text: The voices which I hear every night tell me to do it. Predicted: ❌ Unrelated. Actual: ✅ Related
  22. Generating dataset for analysis
      1. Take the best-performing classification model
      2. Predict relatedness for unlabelled examples
      3. Combine with the 401 labelled (annotated) examples
      RESULT: 4957 examples, 546 of them potentially related to hallucinatory experiences *
      * e.g. the national survey of Wiles et al. (2006) identified only 62 cases
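The three steps above can be sketched as below. The `predict` argument stands in for the trained classifier and is a placeholder, as are the label strings; nothing here reproduces the authors' model.

```python
def build_dataset(labelled, unlabelled, predict):
    """Combine annotated posts with model predictions for unlabelled posts.

    labelled:   list of (text, label) pairs from the annotators
    unlabelled: list of texts
    predict:    callable text -> "related" / "unrelated" (hypothetical model)
    """
    dataset = [(text, label, "annotated") for text, label in labelled]
    dataset += [(text, predict(text), "predicted") for text in unlabelled]
    return dataset


def share_related(dataset):
    """Fraction of examples labelled as related."""
    related = sum(1 for _, label, _ in dataset if label == "related")
    return related / len(dataset)
```

With the figures on the slide, 546 of 4957 examples corresponds to a related share of about 11%.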
  23. Preliminary data analysis
      [Bar chart comparing Related and Unrelated posts, axis 0 to 100; values in extraction order: 72%, 19%, 28%, 81%]
      • Negative sentiments were significantly associated with posts that indicated the occurrence of auditory hallucinations
  24. Preliminary data analysis (continued)
      • Posts linked to auditory hallucinations had a higher proportionate distribution between the hours of 11pm and 5am
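The posting-time finding compares the share of posts made between 11pm and 5am across the two groups. A minimal sketch of that computation is shown below; whether the authors used local time or UTC, and whether 5am itself is included, is not stated, so the window boundaries here are assumptions.

```python
from datetime import datetime


def is_night(ts, start_hour=23, end_hour=5):
    """True if ts falls in the window from 11pm up to (not including) 5am."""
    return ts.hour >= start_hour or ts.hour < end_hour


def night_share(timestamps):
    """Fraction of timestamps that fall inside the night-time window."""
    return sum(is_night(t) for t in timestamps) / len(timestamps)
```

Computing `night_share` separately for related and unrelated posts gives the two proportions that the slide compares.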
  25. Summary
      • Experimental methodology to harvest and mine datasets from unsolicited Twitter posts to identify potential psychotic(-like) experiences
      • Classification model that can relatively accurately predict the relatedness of posts to auditory hallucinations
      • Preliminary data analysis that identified interesting patterns in sentiment polarity and posting time
      • Future research: investigate expressions of sleep in Twitter users who report a diagnosis of a psychosis-related disorder
  26. Questions? Acknowledgements: Centre for Doctoral Training, School of Computer Science, University of Manchester; Health eResearch Centre (HeRC), The Farr Institute of Health Informatics Research; School of Psychological Sciences, University of Manchester
