Sentiment analysis in healthcare
This presentation compares four tools for analysing the sentiment in the content of free-text survey responses concerning a healthcare information website. It was completed by Despo Georgiou as part of her internship at UXLabs (http://uxlabs.co.uk)

Sentiment analysis in healthcare – Presentation Transcript

  • Sentiment Analysis in Healthcare – A case study using survey responses
  • Outline 1) Sentiment analysis & healthcare 2) Existing tools 3) Conclusions & Recommendations
  • Focus on Healthcare: 1) a difficult field – biomedical text 2) potential for improvements. Relevant research: NLP procedure for FHF prediction (Roy et al., 2013); TPA: ‘Who is sick’, ‘Google Flu Trends’ (Maged et al., 2010); BioTeKS: analysing biomedical text (Mack et al., 2004)
  • Sentiment Analysis – opinions, thoughts, feelings; used to extract information from raw data
  • Sentiment Analysis – Examples: surveys (analyse open-ended questions); business & governments (assist the decision-making process & monitor negative communication); consumer feedback (analyse reviews); health (analyse biomedical text)
  • Aims & Objectives: Can existing sentiment analysis tools respond to the needs of any healthcare-related matter? Is it possible to accurately replicate human language using machines?
  • The case study details: 8 survey questions (open- and closed-ended); 137 responses analysed, based on the question “What is your feedback?”; commercial tools: Semantria & TheySay; non-commercial tools: Google Prediction API & WEKA
  • Survey Overview
    [Bar chart: number of responses by score (1–5) for Q.1: navigation, Q.2: finding information, Q.3: website's appeal, Q.6: satisfaction, Q.8: recommend website]
  • Semantria: Collection Analysis, Categories, Classification Analysis, Entity Recognition
  • TheySay: Document Sentiment, Sentence Sentiment, POS (part-of-speech) Tagging, Comparison Detection, Humour Detection, Speculation Analysis, Risk Analysis, Intent Analysis
  • Commercial Tools – Results (of 137 responses)
    Semantria: 39 positive, 51 neutral, 47 negative
    TheySay: 45 positive, 8 neutral, 84 negative
  • Introducing a Baseline
    [Bar chart: number of responses by score (1–5) for Q.1, Q.2, Q.3, Q.6 and Q.8]
    Neutral classification guidelines: equally positive & negative statements; factual statements; irrelevant statements
    Class score ranges: Positive 1–2.7 | Neutral 2.8–4.2 | Negative 4.3–5
  • Introducing a Baseline – Example: the response “CG 102 not available” scored Q.1 = 3, Q.2 = 5, Q.3 = 4, Q.6 = 5, Q.8 = 5, an average of 4.4, which falls in the Negative range. But it is a factual statement (neither clearly positive nor negative), so the neutral classification guidelines apply: final label Neutral.
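A minimal sketch of this baseline rule (the score ranges come from the slide above; the function and variable names are ours, and the guideline override is applied by hand):

    def score_to_class(avg_score):
        # Map an average survey score (1-5) to a polarity class
        # using the ranges from the baseline slide.
        if avg_score <= 2.7:
            return "Positive"
        if avg_score <= 4.2:
            return "Neutral"
        return "Negative"

    # Worked example from the slide: "CG 102 not available"
    scores = {"Q.1": 3, "Q.2": 5, "Q.3": 4, "Q.6": 5, "Q.8": 5}
    avg = sum(scores.values()) / len(scores)   # 4.4
    print(avg, score_to_class(avg))            # 4.4 Negative

    # The numeric rule says Negative, but the response is a factual
    # statement, so the guidelines override it: final label Neutral.

The numeric rule plus the guideline overrides produce the manual split reported on the next slide.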
  • Introducing a Baseline – Manually classified responses: 24 positive, 18 neutral, 95 negative
  • Google Prediction API 1) Pre-process the data: remove punctuation and capitalisation, and account for negation (see the sketch below) 2) Separate into training and testing sets 3) Insert pre-labelled data 4) Train the model 5) Test the model 6) Cross-validation: 4-fold 7) Compare with the baseline
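A minimal Python sketch of the pre-processing step, assuming simple rules (the slides do not specify the exact implementation): lowercase the text, strip punctuation, and prefix the token following a negation word so the classifier can distinguish "helpful" from "not helpful":

    import re

    NEGATION_WORDS = {"not", "no", "never"}

    def preprocess(text):
        text = text.lower()                    # capital removal
        text = re.sub(r"[^\w\s']", " ", text)  # punctuation removal
        out, negate = [], False
        for tok in text.split():
            out.append("NOT_" + tok if negate else tok)
            negate = tok in NEGATION_WORDS     # flag the next token
        return out

    print(preprocess("The site was NOT helpful."))
    # ['the', 'site', 'was', 'not', 'NOT_helpful']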
  • Google Prediction API – Results: 10 positive, 5 neutral, 122 negative
  • WEKA 1) Separate into training and testing sets 2) Choose a graphical user interface: “The Explorer” 3) Insert pre-labelled data 4) Pre-process the data: remove punctuation, capitalisation & stopwords, and tokenize alphabetically
  • WEKA 5) Consider resampling: whether a balanced dataset is preferred 6) Choose a classifier: “Naïve Bayes” 7) Classify using 4-fold cross-validation (a Python analogue of this pipeline is sketched below)
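WEKA itself is driven through its Java GUI; as an illustration only, the sketch below mirrors the same pipeline in Python with scikit-learn: tokenize, remove stopwords, train a multinomial Naïve Bayes classifier, and evaluate with 4-fold cross-validation. The toy responses are hypothetical stand-ins for the 137 labelled survey responses.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Hypothetical toy data; the real study used 137 labelled responses.
    responses = [
        "easy to navigate", "found what I needed",
        "very clear layout", "helpful site",
        "could not find the guideline", "search is broken",
        "page would not load", "confusing menus",
    ]
    labels = ["Positive"] * 4 + ["Negative"] * 4

    model = make_pipeline(
        CountVectorizer(lowercase=True, stop_words="english"),  # step 4
        MultinomialNB(),                                        # step 6
    )
    scores = cross_val_score(model, responses, labels, cv=4)    # step 7
    print(scores.mean())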
  • WEKA – Results: resampling gave a 10% increase in precision and a 6% increase in accuracy; overall, 82% of responses were correctly classified
  • The tools report their results differently: Semantria returns a score between -2 and 2; TheySay returns three percentages (negative, positive & neutral); Google Prediction API returns three values (negative, positive & neutral); WEKA reports the percentage of correctly classified instances
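Because each tool reports in a different format, comparing them against the manual baseline requires mapping every output onto the same three classes. A minimal sketch, with illustrative thresholds (the slides do not give the exact cut-offs used):

    def semantria_label(score):
        # Semantria returns a score between -2 and 2; the +/-0.2
        # band for Neutral is an assumption for illustration.
        if score > 0.2:
            return "Positive"
        if score < -0.2:
            return "Negative"
        return "Neutral"

    def theysay_label(positive, neutral, negative):
        # TheySay returns three percentages; take the largest.
        classes = {"Positive": positive, "Neutral": neutral,
                   "Negative": negative}
        return max(classes, key=classes.get)

    print(semantria_label(-0.8))            # Negative
    print(theysay_label(12.0, 18.0, 70.0))  # Negative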
  • Evaluation – accuracy by tool
    Commercial tools: Semantria 51.09%, TheySay 68.61%
    Non-commercial tools: Google Prediction API 72.25%, WEKA 82.35%
  • Evaluation – agreement and F-measure by tool
    Semantria: Kappa statistic 0.2692, F-measure 0.550
    TheySay: Kappa statistic 0.3886, F-measure 0.678
    Google Prediction API: Kappa statistic 0.2199, F-measure 0.628
    WEKA: Kappa statistic 0.5735, F-measure 0.809
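All three figures can be reproduced from predicted vs. manual labels; the sketch below uses scikit-learn's metrics (an assumption about tooling, with hypothetical labels; the study's numbers come from the 137 manually classified responses):

    from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                                 f1_score)

    # Hypothetical labels for illustration.
    manual    = ["Negative", "Neutral", "Negative", "Positive"]
    predicted = ["Negative", "Negative", "Negative", "Positive"]

    print(accuracy_score(manual, predicted))     # 0.75
    print(cohen_kappa_score(manual, predicted))  # agreement beyond chance
    print(f1_score(manual, predicted,
                   average="weighted", zero_division=0))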
  • Evaluation
    [Bar chart: comparison of precision by class (Negative, Neutral, Positive) for Semantria, TheySay, Google API and WEKA]
  • Evaluation
    [Bar chart: comparison of recall by class (Negative, Neutral, Positive) for Semantria, TheySay, Google API and WEKA]
  • Evaluation: Single-sentence responses (accuracy on all responses vs. single-sentence responses only)
    Commercial tools: Semantria 51.09% vs. 53.49%; TheySay 68.61% vs. 72.09%
    Non-commercial tools: Google Prediction API 72.25% vs. 54%; WEKA 82.35% vs. 70%
  • Conclusions: Semantria suits business use; TheySay suits preparing for competition & academic research; Google Prediction API suits classification; WEKA suits extraction & classification in healthcare
  • Conclusions: commercial tools are easy to use and provide results quickly; non-commercial tools are time-consuming but more reliable
  • Conclusions – Is it possible to accurately replicate human language using machines? All tools reached approx. 70% accuracy (except Semantria); WEKA was the most powerful tool
  • Conclusions – Can existing sentiment analysis tools respond to the needs of any healthcare-related matter? The commercial tools cannot; the non-commercial tools can be trained to do so
  • Limitations: only four tools were compared; the dataset was small; the manual classification may contain errors; a detailed analysis of single-sentence responses was omitted
  • Recommendations: examine the reliability of other commercial tools; investigate other non-commercial tools, especially NLTK and GATE; examine other classifiers (SVM & MaxEnt); investigate all of WEKA's GUIs
  • Recommendations: verify labels using more people; label each sentence as well as the whole response; examine whether negativity is associated with long reviews
  • Questions