Sentiment analysis in healthcare

This presentation compares four tools for analysing the sentiment in the content of free-text survey responses concerning a healthcare information website. It was completed by Despo Georgiou as part of her internship at UXLabs (http://uxlabs.co.uk)

Transcript

  • 1. Sentiment Analysis in Healthcare: A case study using survey responses
  • 2. Outline 1) Sentiment analysis & healthcare 2) Existing tools 3) Conclusions & Recommendations
  • 3. Focus on Healthcare 1) Difficult field – biomedical text 2) Potential improvements Relevant Research:  NLP procedure: FHF prediction (Roy et al., 2013)  TPA: ‘Who is sick’, ‘Google Flu Trends’ (Maged et al., 2010)  BioTeKS: analyse biomedical text (Mack et al., 2004)
  • 4. Sentiment Analysis  Opinions  Thoughts  Feelings  Used to extract information from raw data
  • 5. Sentiment Analysis – Examples  Surveys: analyse open-ended questions  Businesses & governments: assist in the decision-making process & monitor negative communication  Consumer feedback: analyse reviews  Health: analyse biomedical text
  • 6. Aims & Objectives  Can existing Sentiment Analysis tools respond to the needs of any healthcare-related matter?  Is it possible to accurately replicate human language using machines?
  • 7. The case study details  8 survey questions (open- and closed-ended)  Analysed 137 responses based on the question: “What is your feedback?”  Commercial tools: Semantria & TheySay  Non-commercial tools: Google Prediction API & WEKA
  • 8. Survey Overview [Bar chart: number of responses by score (1–5) for Q.1: navigation, Q.2: finding information, Q.3: website’s appeal, Q.6: satisfaction, Q.8: recommend website]
  • 9. Semantria  Collection Analysis  Categories  Classification Analysis  Entity Recognition
  • 10. TheySay  Document Sentiment  Sentence Sentiment  POS (part-of-speech) Tagging  Comparison Detection  Humour Detection  Speculation Analysis  Risk Analysis  Intent Analysis
  • 11. Commercial Tools – Results  Semantria: 39 positive, 51 neutral, 47 negative  TheySay: 45 positive, 8 neutral, 84 negative (of 137 responses each)
  • 12. Introducing a Baseline [Bar chart: number of responses by score (1–5) for Q.1, Q.2, Q.3, Q.6, Q.8] Neutral Classification Guidelines: equally positive & negative; factual statements; irrelevant statements. Score-to-class mapping: Positive 1–2.7, Neutral 2.8–4.2, Negative 4.3–5
  • 13. Introducing a Baseline – Example: “CG 102 not available”, scored Q.1 = 3, Q.2 = 5, Q.3 = 4, Q.6 = 5, Q.8 = 5 (avg. 4.4). The average falls in the Negative range, but the response is a factual statement (neither positive nor negative), so the final label is Neutral.
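
To make the baseline concrete, here is a minimal Python sketch of the labelling rule from slides 12–13; the function name and the factual-statement flag are illustrative, since the deck does not describe a specific implementation.

```python
def baseline_label(scores, is_factual=False):
    """Map a response's survey scores (1-5 scale) to a polarity class
    using the ranges from slide 12."""
    avg = sum(scores) / len(scores)
    if is_factual:  # factual or irrelevant statements default to Neutral
        return "Neutral"
    if avg <= 2.7:
        return "Positive"
    if avg <= 4.2:
        return "Neutral"
    return "Negative"

# Slide 13 example: "CG 102 not available", scores (3, 5, 4, 5, 5), avg 4.4
print(baseline_label([3, 5, 4, 5, 5]))                   # Negative by score
print(baseline_label([3, 5, 4, 5, 5], is_factual=True))  # Neutral, final label
```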
  • 14. Introducing a Baseline  Manually classified responses: 24 positive, 18 neutral, 95 negative
  • 15. Google Prediction API 1) Pre-process the data: punctuation & capital removal, account for negation 2) Separate into training and testing sets 3) Insert pre-labelled data 4) Train model 5) Test model 6) Cross validation: 4-fold 7) Compare with baseline
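
As a rough illustration of step 1 above, the following Python sketch lowercases text, strips punctuation, and marks tokens that follow a negation word; the deck does not specify the exact negation scheme, so the "_NEG" suffix convention here is an assumption.

```python
import re

NEGATORS = {"not", "no", "never", "cannot"}

def preprocess(text):
    """Lowercase, remove punctuation, and suffix post-negation tokens
    with _NEG (a common, simple way to account for negation)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())  # capital & punctuation removal
    out, negated = [], False
    for tok in text.split():
        if tok in NEGATORS:
            negated = True
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negated else tok)
    return " ".join(out)

print(preprocess("The guideline is not available."))
# -> "the guideline is not available_NEG"
```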
  • 16. Google Prediction API – Results  Classification results: 5 neutral, 122 negative, 10 positive
  • 17. WEKA 1) Separate into training and testing sets 2) Choose graphical user interface: “The Explorer” 3) Insert pre-labelled data 4) Pre-process the data: punctuation, capital & stopword removal, and alphabetic tokenisation
  • 18. WEKA 5) Consider resampling: whether a balanced dataset is preferred 6) Choose classifier: “Naïve Bayes” 7) Classify using cross validation: 4-fold
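
The deck runs this pipeline in WEKA's Explorer GUI. For readers who prefer code, a rough scikit-learn equivalent of slides 17–18 (stopword removal, alphabetic tokens, Naive Bayes, 4-fold cross-validation) might look like the sketch below; the toy responses and vectoriser settings are assumptions, not the study's data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for the 137 labelled survey responses.
responses = [
    "very easy to navigate", "found everything quickly",
    "clear and helpful", "great site overall",
    "CG 102 not available", "looked for guideline updates",
    "used the search box", "arrived from a link",
    "could not find anything", "confusing layout",
    "search results were useless", "very hard to navigate",
]
labels = ["Positive"] * 4 + ["Neutral"] * 4 + ["Negative"] * 4

# Stopword removal and purely alphabetic tokens, loosely mirroring WEKA's
# StringToWordVector filter with an AlphabeticTokenizer.
vectorizer = CountVectorizer(stop_words="english", token_pattern=r"[a-zA-Z]+")
X = vectorizer.fit_transform(responses)

clf = MultinomialNB()                           # WEKA's "Naive Bayes" analogue
scores = cross_val_score(clf, X, labels, cv=4)  # 4-fold cross-validation
print(f"Mean accuracy: {scores.mean():.2%}")
```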
  • 19. WEKA – Results  Resampling: 10% increase in precision, 6% increase in accuracy  Overall, 82% correctly classified
  • 20. The tools  Semantria: range between -2 and 2  TheySay: three percentages for negative, positive & neutral  Google Prediction API: three values for negative, positive & neutral  WEKA: percentage of correctly classified
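
Since each tool reports sentiment in a different format, comparing them against the manual baseline requires normalising every output to the same three classes. One possible mapping is sketched below; the ±0.3 band for Semantria's −2..2 score is purely illustrative, as the deck does not state the cut-offs it used.

```python
def semantria_to_class(score, band=0.3):
    """Map a -2..2 sentiment score to a class; the band is an assumption."""
    if score > band:
        return "Positive"
    if score < -band:
        return "Negative"
    return "Neutral"

def distribution_to_class(p_pos, p_neu, p_neg):
    """Pick the majority class from a positive/neutral/negative distribution
    (the style of output TheySay and the Prediction API return)."""
    classes = {"Positive": p_pos, "Neutral": p_neu, "Negative": p_neg}
    return max(classes, key=classes.get)

print(semantria_to_class(-0.8))                 # Negative
print(distribution_to_class(0.15, 0.25, 0.60))  # Negative
```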
  • 21. Evaluation – Accuracy  Commercial tools: Semantria 51.09%, TheySay 68.61%  Non-commercial tools: Google Prediction API 72.25%, WEKA 82.35%
  • 22. Evaluation – Kappa statistic / F-measure  Semantria: 0.2692 / 0.550  TheySay: 0.3886 / 0.678  Google Prediction API: 0.2199 / 0.628  WEKA: 0.5735 / 0.809
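
For reference, the Kappa statistic and F-measure on slide 22 can be computed from a tool's predictions against the manual baseline with scikit-learn, as sketched below; the label arrays are illustrative, and the weighted F1 averaging is an assumption about how the deck's single F-measure per tool was obtained.

```python
from sklearn.metrics import cohen_kappa_score, f1_score

# Illustrative labels only; the study scored 137 predictions per tool.
baseline    = ["Negative", "Negative", "Neutral", "Positive", "Negative"]
predictions = ["Negative", "Neutral",  "Neutral", "Positive", "Negative"]

# Cohen's kappa: agreement with the manual labels, corrected for chance.
kappa = cohen_kappa_score(baseline, predictions)

# One F-measure per tool: per-class F1 averaged by class frequency.
fm = f1_score(baseline, predictions, average="weighted")

print(f"Kappa: {kappa:.4f}  F-measure: {fm:.3f}")
```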
  • 23. Evaluation
  • 24. Evaluation [Bar chart: comparison of precision (0–1) by class (Negative, Neutral, Positive) for Semantria, TheySay, Google API, WEKA]
  • 25. Evaluation [Bar chart: comparison of recall (0–1) by class (Negative, Neutral, Positive) for Semantria, TheySay, Google API, WEKA]
  • 26. Evaluation: Single-sentence responses  Accuracy (all responses vs single-sentence responses)  Commercial tools: Semantria 51.09% vs 53.49%, TheySay 68.61% vs 72.09%  Non-commercial tools: Google Prediction API 72.25% vs 54%, WEKA 82.35% vs 70%
  • 27. Conclusions  Semantria: business use  TheySay: prepare for competition & academic research  Google Prediction API: classification  WEKA: extraction & classification in healthcare
  • 28. Conclusions  Commercial tools: easy to use and provide results quickly  Non-commercial tools: time-consuming but more reliable
  • 29. Conclusions Is it possible to accurately replicate human language using machines?  Approx. 70% accuracy for all tools (except Semantria)  WEKA: most powerful tool
  • 30. Conclusions Can existing SA tools respond to the needs of any healthcare-related matter?  Commercial tools cannot respond  Non-commercial tools can be trained
  • 31. Limitations  Only four tools  Small dataset  Potential errors in manual classification  Detailed analysis of single-sentence responses was omitted
  • 32. Recommendations  Examine reliability of other commercial tools  Investigate other non-commercial tools, especially NLTK and GATE  Examine other classifiers (SVM & MaxEnt)  Investigate all of WEKA’s GUIs
  • 33. Recommendations  Verify labels using more people  Label sentences as well as whole responses  Investigate negativity associated with long reviews
  • 34. Questions
