Sentiment Analysis in Healthcare
A case study using survey responses

This presentation compares four tools for analysing the sentiment of free-text survey responses concerning a healthcare information website. It was completed by Despo Georgiou as part of her internship at UXLabs (http://uxlabs.co.uk).
Outline
1) Sentiment analysis & healthcare
2) Existing tools
3) Conclusions & Recommendations
Focus on Healthcare
1) Difficult field – biomedical text
2) Potential improvements
Relevant Research:
 NLP procedure: FHF prediction (Roy et al., 2013)
 TPA: ‘Who is sick’, ‘Google Flu Trends’ (Maged et al., 2010)
 BioTeKS: analyse biomedical text (Mack et al., 2004)
Sentiment Analysis
 Opinions
 Thoughts
 Feelings
 Used to extract information from raw data
Sentiment Analysis – Examples
 Surveys: analyse open-ended questions
 Business & Governments: assist in the decision-making process & monitor negative communication
 Consumer feedback: analyse reviews
 Health: analyse biomedical text
Aims & Objectives
 Can existing Sentiment Analysis tools respond to the needs of any healthcare-related matter?
 Is it possible to accurately replicate human language using machines?
The case study details
 8 survey questions (open & close-ended)
 Analysed 137 responses based on the question: “What is your feedback?”
 Commercial tools: Semantria & TheySay
 Non-commercial tools: Google Prediction API & WEKA
Survey Overview
[Bar chart: number of responses by score (1–5) for each closed question – Q.1: navigation, Q.2: finding information, Q.3: website's appeal, Q.6: satisfaction, Q.8: recommend website]
Semantria
 Collection Analysis
 Categories
 Classification Analysis
 Entity Recognition
TheySay
 Document Sentiment
 Sentence Sentiment
 POS
 Comparison Detection
 Humour Detection
 Speculation Analysis
 Risk Analysis
 Intent Analysis
Commercial Tools – Results
Tool        Positive  Neutral  Negative
Semantria   39        51       47
TheySay     45        8        84
Introducing a Baseline
[Bar chart: number of responses by score (1–5) for Q.1, Q.2, Q.3, Q.6 and Q.8, as in the survey overview]
Neutral Classification Guidelines:
 Equally positive & negative
 Factual statements
 Irrelevant statements
Class ranges for the average score: Positive 1 – 2.7, Neutral 2.8 – 4.2, Negative 4.3 – 5
Introducing a Baseline
Example: “CG 102 not available”
Closed-question scores: Q.1 = 3, Q.2 = 5, Q.3 = 4, Q.6 = 5, Q.8 = 5; average = 4.4
Polarity class by score range: hence Negative
But it is a factual statement – neither clearly positive nor negative
Final label: Neutral
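The score-range mapping above is simple enough to state as code. A minimal sketch (Python; the thresholds come from the previous slide, and the factual-statement override still requires human judgement):

    # Baseline polarity from the average of a respondent's five
    # closed-question scores (1-5), using the slide's class ranges.
    def baseline_class(scores):
        avg = sum(scores) / len(scores)
        if avg <= 2.7:
            return "Positive"
        if avg <= 4.2:
            return "Neutral"
        return "Negative"

    print(baseline_class([3, 5, 4, 5, 5]))  # avg 4.4 -> "Negative"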
Introducing a Baseline
Manually Classified Responses: Positive 24, Neutral 18, Negative 95
Google Prediction API
1) Pre-process the data: punctuation & capital removal, account for negation (see the sketch after this list)
2) Separate into training and testing sets
3) Insert pre-labelled data
4) Train model
5) Test model
6) Cross validation: 4-fold
7) Compare with baseline
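The slides do not show the pre-processing itself; a minimal sketch of step 1, assuming a simple prefix-style negation marker (the actual implementation may have differed):

    import re

    NEGATIONS = {"not", "no", "never"}

    def preprocess(text):
        """Lowercase, drop punctuation/digits, and mark the token after a negation."""
        tokens = re.findall(r"[a-z']+", text.lower())
        out, negate = [], False
        for tok in tokens:
            if tok in NEGATIONS:
                negate = True
                continue
            out.append("not_" + tok if negate else tok)
            negate = False
        return " ".join(out)

    print(preprocess("CG 102 not available"))  # -> "cg not_available"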
Google Prediction API – Results
Classification Results: Neutral 5, Negative 122, Positive 10
WEKA
1) Separate into training and testing sets
2) Choose graphical user interface: “The Explorer”
3) Insert pre-labelled data (ARFF format; see the sketch after this list)
4) Pre-process the data: punctuation, capital & stopwords removal and alphabetic tokenisation
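WEKA expects the labelled responses in its ARFF format; a minimal sketch of writing one (Python; the example rows and attribute names are invented for illustration):

    # Write pre-labelled responses as an ARFF file for the WEKA Explorer.
    # Tokenisation then happens inside WEKA (e.g. the StringToWordVector filter).
    responses = [
        ("the site was easy to navigate", "Positive"),
        ("cg 102 not available", "Neutral"),
    ]

    with open("responses.arff", "w") as f:
        f.write("@relation survey_responses\n\n")
        f.write("@attribute text string\n")
        f.write("@attribute class {Positive,Neutral,Negative}\n\n")
        f.write("@data\n")
        for text, label in responses:
            f.write("'%s',%s\n" % (text.replace("'", "\\'"), label))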
WEKA
5) Consider resampling: whether a balanced dataset is preferred
6) Choose classifier: “Naïve Bayes”
7) Classify using cross validation: 4-fold (a rough scikit-learn equivalent follows)
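The Explorer is GUI-driven, so there is no WEKA code to show; a rough scikit-learn equivalent of steps 5–7 (placeholder data stands in for the 137 responses, and simple duplication only approximates WEKA's resampling):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # Placeholder data standing in for the labelled survey responses.
    texts = ["easy to navigate", "could not find the guideline"] * 10
    labels = ["Positive", "Negative"] * 10

    # Bag-of-words features + Naive Bayes, scored with 4-fold cross validation.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    scores = cross_val_score(model, texts, labels, cv=4)
    print("mean accuracy: %.2f" % scores.mean())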
WEKA – Results
 Resampling:
10% increase in precision
6% increase in accuracy
 Overall, 82% correctly classified
The tools
 Semantria: range between -2 and 2
 TheySay: three percentages for negative, positive & neutral
 Google Prediction API: three values for negative, positive & neutral
 WEKA: percentage of correctly classified responses
Evaluation
Tool                     Accuracy
Commercial tools
  Semantria              51.09%
  TheySay                68.61%
Non-commercial tools
  Google Prediction API  72.25%
  WEKA                   82.35%
Evaluation
Tool                     Kappa statistic  F-measure
Semantria                0.2692           0.550
TheySay                  0.3886           0.678
Google Prediction API    0.2199           0.628
WEKA                     0.5735           0.809
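These figures can be recomputed from each tool's predicted labels against the manual baseline; a sketch using scikit-learn (the label lists are placeholders, and the slides do not say which F-measure averaging was used, so weighted is assumed here):

    from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

    # y_true: manual baseline labels; y_pred: one tool's output (placeholders).
    y_true = ["Negative", "Negative", "Neutral", "Positive", "Negative"]
    y_pred = ["Negative", "Neutral", "Neutral", "Positive", "Negative"]

    print("accuracy: ", accuracy_score(y_true, y_pred))
    print("kappa:    ", cohen_kappa_score(y_true, y_pred))
    print("F-measure:", f1_score(y_true, y_pred, average="weighted"))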
Evaluation
[Bar chart: comparison of precision by class (Negative, Neutral, Positive) for Semantria, TheySay, Google API and WEKA]
Evaluation
[Bar chart: comparison of recall by class (Negative, Neutral, Positive) for Semantria, TheySay, Google API and WEKA]
Evaluation: Single-sentence responses
Tool                     Accuracy (all responses)  Accuracy (single-sentence responses)
Commercial tools
  Semantria              51.09%                    53.49%
  TheySay                68.61%                    72.09%
Non-commercial tools
  Google Prediction API  72.25%                    54%
  WEKA                   82.35%                    70%
Conclusions
 Semantria: business use
 TheySay: prepare for competition & academic research
 Google Prediction API: classification
 WEKA: extraction & classification in healthcare
Conclusions
 Commercial tools: easy to use and provide results quickly
 Non-commercial tools: time-consuming but more reliable
Conclusions
Is it possible to accurately replicate human language using machines?
 Approx. 70% accuracy for all tools (except Semantria)
 WEKA: most powerful tool
Conclusions
Can existing SA tools respond to the needs of any healthcare-related matter?
 Commercial tools cannot respond
 Non-commercial tools can be trained
Limitations
 Only four tools
 Small dataset
 Potential errors in manual classification
 Detailed analysis of single-sentence responses was omitted
Recommendations
 Examine the reliability of other commercial tools
 Investigate other non-commercial tools, especially NLTK and GATE
 Examine other classifiers (SVM & MaxEnt)
 Investigate all of WEKA's GUIs
Recommendations
 Verify labels using more people
 Label each sentence as well as the whole response
 Investigate the negativity associated with long reviews
Questions