Using Twitter Data to Predict Flu Outbreak

Published in: Health & Medicine, Technology

  1. Using Twitter Data to Predict Flu Outbreak
     Son Doan, Division of Biomedical Informatics, University of California San Diego
     BigData@UCSD workshop, Nov 25, 2013
  2. Seasonal influenza and influenza-like illness
     •  Seasonal influenza is a major public health concern:
        –  3-5 million cases of severe illness
        –  250,000 to 500,000 deaths worldwide each year
     •  The main syndrome of seasonal influenza is influenza-like illness (ILI)
     •  During the peak of a major influenza outbreak, more cases of ILI are observed
     →  Monitoring ILI can help predict flu outbreaks
  3. Traditional system to monitor ILI: ILINet
     •  ILINet: CDC's U.S. Outpatient ILI Surveillance Network
        –  consists of >3,000 outpatient healthcare providers
        –  covers all 50 US states and other areas
        –  reports more than 30 million patient visits each year
     •  ILINet monitors influenza through the ILI rate
        –  The ILI rate is the percentage of patients with ILI among all patients
        –  The average national baseline ILI rate for 2013 is 2.0%
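The ILI rate definition on slide 3 can be written as a one-line calculation. The sketch below is only an illustration: the weekly visit counts are hypothetical, not CDC data, and the function name `ili_rate` is made up for this example. The 2.0% baseline is the figure quoted on the slide.

```python
def ili_rate(ili_visits: int, total_visits: int) -> float:
    """Return the ILI rate: percentage of patient visits that were for ILI."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return 100.0 * ili_visits / total_visits


# Hypothetical weekly counts for illustration only (not CDC data).
weekly_rate = ili_rate(ili_visits=1_500, total_visits=60_000)

NATIONAL_BASELINE_2013 = 2.0  # percent, as quoted on the slide
above_baseline = weekly_rate > NATIONAL_BASELINE_2013

print(f"ILI rate: {weekly_rate:.2f}% (above baseline: {above_baseline})")
```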
  4. Source: http://www.cdc.gov/flu/weekly/index.htm
  5. Let's revisit the process
     •  Patient 1, Patient 2, …, Patient n each visit a healthcare provider
     •  Each healthcare provider checks whether the patient has ILI
     •  ILINet gathers the data and then calculates the ILI rate
  6. ILINet issue
     •  ILINet needs 1-2 weeks to gather and process data
     •  Can we leverage other data sources to predict the ILI rate faster?
  7. Nowadays, users tend to find information on the Internet
     •  User 1, User 2, …, User n each search the Internet
  8. … or tweet their personal health conditions
     •  User 1, User 2, …, User n each post tweets to the Internet
  9. Estimate ILI rate using user-generated data
     •  Models
        –  Linear model [1]: ILI rate = (ILI-related data)·α + error
        –  Logistic regression [2]: logit(ILI rate) = logit(ILI-related data)·α + error
     •  Key point: How to identify ILI-related data?
     •  Hint: ILI is defined as fever (temperature of 100°F [37.8°C] or greater) and cough and/or sore throat
     [1] Polgreen et al. “Using internet searches for influenza surveillance”, Clinical Infectious Diseases, 2008, 47(11):1443-8.
     [2] Ginsberg et al. “Detecting influenza epidemics using search engine query data”, Nature, 2009, 457(7232):1012-4.
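The two models on slide 9 can be sketched with ordinary least squares. This is a minimal illustration under stated assumptions, not the fitting procedure of [1] or [2]: the weekly values are invented, the helper names (`fit_line`, `logit`, `inv_logit`) are hypothetical, and an intercept term is added that the slide's simplified formulas omit.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion p, where 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical weekly data (assumption): the fraction of messages that are
# ILI-related, and the ILI rate reported for the same week (as a fraction).
signal = [0.002, 0.004, 0.006, 0.010, 0.008]
ili = [0.012, 0.020, 0.031, 0.048, 0.040]

# Linear model: ILI rate = intercept + alpha * signal
b0, b1 = fit_line(signal, ili)

# Logit model: logit(ILI rate) = intercept + alpha * logit(signal)
c0, c1 = fit_line([logit(s) for s in signal], [logit(r) for r in ili])

# Estimate the ILI rate for a new week from its signal alone.
new_signal = 0.007
linear_estimate = b0 + b1 * new_signal
logit_estimate = inv_logit(c0 + c1 * logit(new_signal))

print(f"linear: {linear_estimate:.4f}, logit: {logit_estimate:.4f}")
```

One reason to prefer the logit form is that its estimate always stays between 0 and 1, whereas the linear model can produce negative rates when the signal is very low.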
  10. Google Flu Trends (GFT) estimates based on flu-related queries are highly correlated with the ILI rate
      •  Reporting lag of about 1 day
      Source: http://www.google.org/flutrends/about/how.html
  11. GFT is good, however…
      •  Researchers cannot access the original data
      •  GFT does not disclose search queries
      Source: Ginsberg et al., Nature 457, 1012-1014 (19 February 2009)
  12. Sources: Google Flu Trends (www.google.org/flutrends); CDC; Flu Near You
  13. Twitter corpus
      Timeline: 36 weeks for the US 2009 influenza season (Aug 30, 2009 to May 8, 2010)
      •  Tweets: 587,290,394
      •  Unique users: 23,571,765
      •  URLs: 136,034,309
      •  Hash tags: 96,399,587
      Thanks to Brendan O'Connor (CMU) and Twitter Inc.
  14. Related work
      Twitter corpus → ILI-related tweets, selected by keyword:
      •  Culotta [3]: flu, cough, headache, sore throat
      •  Signorini [4]: swine, flu, influenza
      •  Chew [5]: h1n1, swine flu, swineflu
      [3] A. Culotta, “Detecting influenza epidemics by analyzing Twitter messages,” arXiv:1007.4748v1
      [4] A. Signorini, A. M. Segre, and P. M. Polgreen, “The Use of Twitter to Track Levels of Disease Activity and Public Concern in the U.S. during the Influenza A H1N1 Pandemic,” PLoS ONE, vol. 6, no. 5, p. e19467, 05 2011.
      [5] C. Chew and G. Eysenbach, “Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak,” PLoS ONE, vol. 5, no. 11, p. e14118, 11 2010.
  15. Our approach: two-step filtering
      Twitter corpus → Filter 1 → respiratory syndrome-related tweets → Filter 2 → semantic filtered tweets
      •  Filter 1: Knowledge-based approach
         –  Respiratory syndrome only
         –  Respiratory syndrome + “flu”
         –  Respiratory syndrome + “flu” - URL
      •  Filter 2: Semantic level
         –  Negation
         –  Emoticon
         –  HashTags
         –  Humor
         –  Geo
  16. Correlation to ILI rate (CDC data)
      Pearson correlation with ILI rate, by method:
      •  Google Flu Trends: 0.9912
      •  Related work, Culotta [3]: 0.9485
      •  Filter 1, Respiratory syndrome + “flu” - URL: 0.9752
      •  Filter 1+2, Negation + Emoticon + HashTags + Humor + Geo: 0.9846
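The scores on slide 16 are Pearson correlation coefficients between a weekly signal and the CDC ILI rate. The sketch below shows how such a coefficient is computed; the two weekly series are hypothetical placeholders, not the data behind the slide, and `pearson` is a plain-Python helper written for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly series (assumption): CDC ILI rate in percent, and the
# share of filtered tweets among all sampled tweets, also in percent.
cdc_ili_rate = [1.2, 2.0, 3.1, 4.8, 4.0, 2.6]
tweet_share = [0.10, 0.18, 0.30, 0.45, 0.41, 0.22]

r = pearson(tweet_share, cdc_ili_rate)
print(f"Pearson r = {r:.4f}")
```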
  17. Correlation to ILI rate (CDC data) [figure; vertical axis in %]
      S. Doan, L. Ohno-Machado, N. Collier, "Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example in Tracking Influenza-Like Illnesses", Proc. of the 2nd IEEE HISB 2012, pp. 62-71, 2012.
  18. Big Data challenge
      •  Twitter: 140 million active users, 340 million tweets/day
      •  Twitter API sampling rate is small (1-5% of data)
      •  Filtered tweets: 0.2% of samples
      •  Is sampled data enough?
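To make the question on slide 18 concrete, the figures there can be multiplied out. This is back-of-the-envelope arithmetic using only the numbers on the slide; the variable names are made up, and integer math is used so the results are exact.

```python
TWEETS_PER_DAY = 340_000_000      # from the slide
SAMPLE_RATES_PERCENT = (1, 5)     # Twitter API sampling rate range, from the slide
FILTERED_PER_THOUSAND = 2         # 0.2% of sampled tweets survive filtering

estimates = {}
for pct in SAMPLE_RATES_PERCENT:
    sampled = TWEETS_PER_DAY * pct // 100
    filtered = sampled * FILTERED_PER_THOUSAND // 1000
    estimates[pct] = (sampled, filtered)
    print(f"{pct}% sample: {sampled:,} tweets/day, {filtered:,} after filtering")
```

Under these figures, only about 6,800 to 34,000 filtered tweets per day remain for the whole of Twitter, before any split by region.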
  19. DIZIE: system for syndromic surveillance using Twitter
      •  Syndromic surveillance for gastrointestinal, respiratory, neurological, dermatological, haemorrhagic, and musculoskeletal syndromes from tweets in 40 world cities
  20. Use cases
      •  DIZIE was integrated into BioCaster, our news media biosurveillance system
      •  DIZIE was used by the European Centre for Disease Prevention and Control (ECDC) to track syndromes during the London 2012 Summer Olympics
  21. Potential applications using Twitter in public health
      •  Mental health analysis
      •  Tobacco surveillance
      •  Medication use in social media
  22. Acknowledgements
      •  Nigel Collier, European Bioinformatics Institute
      •  Mike Conway, UCSD
      •  Lucila Ohno-Machado, UCSD
  23. Data sources for influenza surveillance
      •  Data provided by physicians and laboratories
      •  Over-the-counter drug sales
      •  School absentee records
      •  Health-related phone calls
      •  Internet-based data:
         –  News media
         –  Mailing lists
         –  Social media
  24. Extract respiratory syndrome keywords
      –  achy chest, apnea, asthma, asthmatic, blocked nose, breathing difficulties, breathing trouble, bronchitis, …
      –  cold symptom, cough, dyspnea, dyspnoea, gasping for air, lung sounds, pneumonia, rales, …
      –  respiratory failure, runny nose, short of breath, shortness of breath, sinusitis, sore throat, stop breathing, stuffy nose, …
      We have a total of 37 keywords
  25. Knowledge-based approach
      •  Respiratory syndrome only: tweets containing syndrome keywords
         Example: Barber just coughed on me in the chair.
      •  Respiratory syndrome + “flu”: tweets containing syndrome keywords and “flu”
         Example: I got flu n coughed a lot.
      •  Respiratory syndrome + “flu” - URL: tweets containing syndrome keywords and “flu”, remove links
         Example: 7-year-old boy dies of flu,pneumonia <URL>
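The three Filter 1 variants on slide 25 can be sketched as simple keyword checks. This is an illustration under assumptions, not the authors' implementation: only a handful of the 37 keywords are included, matching is by case-insensitive substring (so "coughed" matches "cough"), and "remove links" is read as dropping tweets that contain a URL, which is one possible interpretation. All function names are hypothetical.

```python
import re

# A few of the respiratory syndrome keywords shown in the talk; the full
# list of 37 is not given in the slides.
SYNDROME_KEYWORDS = ["cough", "sore throat", "pneumonia", "runny nose",
                     "shortness of breath"]

# Matches real links and the "<URL>" placeholder used in the slide example.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+|<\s*URL\s*>", re.IGNORECASE)


def has_syndrome(text: str) -> bool:
    lowered = text.lower()
    return any(keyword in lowered for keyword in SYNDROME_KEYWORDS)


def has_flu(text: str) -> bool:
    return "flu" in text.lower()


def has_url(text: str) -> bool:
    return URL_PATTERN.search(text) is not None


def filter1(tweets, require_flu=True, drop_urls=True):
    """Keep tweets that pass the selected Filter 1 variant."""
    kept = []
    for text in tweets:
        if not has_syndrome(text):
            continue
        if require_flu and not has_flu(text):
            continue
        if drop_urls and has_url(text):  # assumption: drop tweets with links
            continue
        kept.append(text)
    return kept


tweets = [
    "Barber just coughed on me in the chair.",
    "I got flu n coughed a lot.",
    "7-year-old boy dies of flu,pneumonia < URL>",
    "Lovely weather today",
]

syndrome_only = filter1(tweets, require_flu=False, drop_urls=False)
syndrome_flu = filter1(tweets, drop_urls=False)
syndrome_flu_no_url = filter1(tweets)
print(syndrome_flu_no_url)
```

Substring matching is crude (for example, "flu" also matches "fluent"), so a production filter would likely need tokenization or stemming.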
  26. Semantic level filtering
      •  Negation: remove negation in tweets
         Example: I don’t have flu
      •  Emoticon: remove tweets containing smiley emoticons, e.g., :-), :D
         Example: Glad to hear that you’re beating the flu. :-) Hope you don’t get the nasty cough that everyone’s getting this year
      •  HashTags: keep tweets containing keyword “flu”
         Example: Still coughing smh #swineflu #h1n1
      •  Humor: remove humor features in tweets, e.g., “haha”, “hihi”, “***cough … cough***”
         Example: Hm Im kinda wanting to go to NYC really soon ***cough … cough*** @Ctmomofsix =)
      •  Geo: tweets from geographical locations (e.g., US)
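The Filter 2 rules on slide 26 could be approximated with a few regular expressions. The sketch below is a rough guess at the spirit of each rule, not the published method: the patterns, the `Tweet` class, and its `country` field are all assumptions made for this example, and each rule is read as "drop the tweet if it fires" except HashTags, which is read as accepting "flu" inside a hashtag.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    country: str  # hypothetical field; real geo data would need resolving


# Illustrative patterns only (assumptions, not the rules from the paper).
NEGATION = re.compile(
    r"\b(?:don't|dont|not|no|never)\s+(?:\w+\s+){0,2}flu\b", re.IGNORECASE)
EMOTICON = re.compile(r"[:=]-?[)D]")          # :-) :) :D :-D =) =D
HUMOR = re.compile(r"\b(?:ha){2,}\b|\b(?:hi){2,}\b|\*{3}\s*cough",
                   re.IGNORECASE)
FLU_WORD = re.compile(r"\bflu\b", re.IGNORECASE)
HASHTAG = re.compile(r"#(\w+)")


def flu_in_hashtag(text: str) -> bool:
    return any("flu" in tag.lower() for tag in HASHTAG.findall(text))


def passes_filter2(tweet: Tweet, country: str = "US") -> bool:
    text = tweet.text.replace("\u2019", "'")  # normalize curly apostrophes
    if tweet.country != country:              # Geo
        return False
    if NEGATION.search(text):                 # Negation
        return False
    if EMOTICON.search(text):                 # Emoticon
        return False
    if HUMOR.search(text):                    # Humor
        return False
    # HashTags: accept "flu" as a word or inside a hashtag such as #swineflu
    return bool(FLU_WORD.search(text)) or flu_in_hashtag(text)


samples = [
    Tweet("I don\u2019t have flu", "US"),
    Tweet("Glad to hear that you\u2019re beating the flu. :-) Hope you "
          "don\u2019t get the nasty cough that everyone\u2019s getting "
          "this year", "US"),
    Tweet("Still coughing smh #swineflu #h1n1", "US"),
    Tweet("Hm Im kinda wanting to go to NYC really soon "
          "***cough \u2026 cough*** @Ctmomofsix =)", "US"),
    Tweet("I got flu n coughed a lot.", "GB"),
]

results = [passes_filter2(t) for t in samples]
print(results)
```

Of the slide's example tweets, only the hashtag example survives all five checks in this sketch; the last sample is dropped purely because its hypothetical country is not the US.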
  27. Semantic-level filtered tweets (tweet samples by type)
      •  Influenza confirmation: I got flu n coughed a lot. Now my voice is like monster’s voice. Rrr
      •  Influenza symptoms: My day: flu-like symptoms (headache, body aches, cough, chills, 100.9 fever). Swine flu not ruled out. #H1N1
      •  Flu shots: I’m still getting flu shots, nothing is worth flu turning into bronchitis into pneumonia
      •  Self protection: Cover your mouth if coughing, use a tissue, wash your hands often & get a flu shot - protect and defend your community from #H1N1
      •  Medication: Wondering why I didn’t take the flu shot, laying in bed with cough drops, medicine, and the remote
