55	
  
Classification : Evaluation Results
Rules Precision Recall F1 Score
ST 0.5432 0.6203 0.5791
ST+SC 0.6534 0.6822 0.66...
•  Macro Average
–  Precision:0.8842, Recall: 0.8607 and F-Score: 0.8723
56	
  
Classification : Evaluation Results
To chec...
No Intent Classes Total Queries
Percentage
Distribution
1 Diseases 4,232,398 40.66
2 Vital signs 3,455,809 33.20
3 Symptom...
8%	
  
48%	
  
40%	
  
4%	
  
0%	
  
Distribution of search queries by number of intent
classes in which they are classifi...
Dataset Precision Recall F1-Score
Cardiovascular
Diseases
0.8842 0.8642 0.8723
Diabetes 0.9274 0.8964 0.9116
Cancer 0.8294...
Personalized eHealth Interventions
60
Application
61
•  Hello,
For the past 10 hours I've been expierencing a semi sharp pain in
my upper right chest just below my armpit. ...
62
Demographic
Information
dry mouth => Xerostomia
Drugs and Medication
Misspellings
Diseases and
Conditions
Symptom
Consu...
•  Primary Symptom
–  Chest pain
•  upper side
•  Right side
•  Other symptoms
–  Stomach ache
–  Dry mouth
•  Current dis...
65	
  
Thesis Statement
Rich background knowledge from biomedical knowledge
bases and Wikipedia enables development of eff...
•  Intentional information seeking
–  Web search
•  Accidental information discovery
66
Information Acquisition
NASA’s Cur...
•  In many cases, the phenomenon of accidental information
discovery is facilitated by users prior actions – serendipity
•...
•  Everyday millions of tweets shared
•  Most of these tweets are highly personal
and contextual
•  Only around 12% posts ...
•  Informativeness of a tweet
depends upon reader’s
–  Intent
–  Knowledge about the information in the
tweet or novelty i...
Naïve Bayes classifier
Rule-based
Filtering
Supervised
Classification
Tweets
Informative
Tweets
Experiments: Informativene...
73
•  Randomly selected 40k tweets related to diabetes
•  Gold standard dataset
–  Randomly selected 3000 tweets
–  Annota...
74
Evaluation: Supervised Classification (NB)
Features Precision
Tweet 66.20
Tweet + URL Title 68.72
Tweet + URL Title + UR...
75
Hadoop-MapReduce
Framework
Informativeness
Analysis
Semantic
Categorization
Soni, S. 2015. Domain specific document ret...
76
Search and Explore
X Controls Cancer
X = diet, treatment, exercise
(Pattern-based Approach
leveraging domain
semantics)...
Other work
77
78
Desktop
Mobile
Mobile
usage
took
Over
Comparative Analysis of Expressions of Search
Intents From Personal Computers and...
79
Twitris: Social Media Analytics Platform
•  Core component of around $6+ million research funding
(NSF, NIH, AFRL)
•  NIH-R01 proposal (Mayo Clinic and Kno.e.sis, Wright State) ($2 Million)
–  Modeling Social Behavior for Healthcare Util...
82	
  
Conclusion
Search Intent Mining Health Search Intent Mining
83	
  
Conclusion
Health Search Intent Mining
Identified consumer-
oriented intent classes
Multi-label Classification
Prob...
Semantics-based
Intent Classification
-  Based on UMLS
semantic types and
concepts
-  Advanced text analytics
-  Consumer ...
85	
  
Conclusion
Information overload
on Twitter
Subjectivity
Adapted search intent mining algorithm to
enable efficient ...
Publications
•  Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer
Health Information Port...
Publications
87	
  
•  Twitris- a System for Collective Social Intelligence A Sheth, A Jadhav et al., Springer,
Encycloped...
•  Shen, D., Pan, R., Sun, J.-T., Pan, J. J., Wu, K., Yin, J., and Yang, Q. 2006. Query enrichment for
web-query classific...
•  Sheth A, Avant D, Bertram C, inventors; Taalee, Inc., assignee. System and method for creating a
semantic web and its a...
90
Acknowledgement
91	
  
Now…
Then…
Now…
92	
  
Acknowledgement
 
Thank you J
Disclaimer: All other trademarks, logos and images used in this
presentation belong to their respective own...

Ashutosh Jadhav PhD Defense: Knowledge Driven Search Intent Mining


Understanding users' latent intents behind search queries is essential for satisfying their search needs. Search intent mining can help search engines enhance their ranking of search results and enable new search features such as instant answers, personalization, search result diversification, and the recommendation of more relevant ads. Consequently, there has been increasing attention on how to mine search intents effectively by analyzing search engine query logs. While state-of-the-art techniques can identify the domain of a query (e.g., sports, movies, health), identifying domain-specific intent is still an open problem. Among all the topics available on the Internet, health is one of the most important in terms of impact on the user, and it is one of the most frequently searched areas. This dissertation presents a knowledge-driven approach for domain-specific search intent mining with a focus on health-related search queries.

First, we identified 14 consumer-oriented health search intent classes based on input from focus group studies, analyses of popular health websites, literature surveys, and an empirical study of search queries. We defined the problem of classifying millions of health search queries into zero or more intent classes as a multi-label classification problem. Popular machine learning approaches for multi-label classification (namely, problem transformation and algorithm adaptation methods) were not feasible because of the difficulty of creating labeled data and because of health domain constraints. Another challenge in solving the search intent identification problem was mapping terms used by laypeople to medical terms. To address these challenges, we developed a semantics-driven, rule-based search intent mining approach that leverages rich background knowledge encoded in the Unified Medical Language System (UMLS) and a crowd-sourced encyclopedia (Wikipedia). The approach can identify search intent in a disease-agnostic manner and has been evaluated on three major diseases.

While users often turn to search engines to learn about health conditions, a surprising amount of health information is also shared and consumed via public social media platforms such as Twitter. Although Twitter is an excellent information source, identifying informative tweets in the deluge of tweets is a major challenge. We used a hybrid approach consisting of supervised machine learning, rule-based classifiers, and biomedical domain knowledge to facilitate the retrieval of relevant and reliable health information shared on Twitter in real time. Furthermore, we extended our search intent mining algorithm to classify health-related tweets into health categories. Finally, we performed a large-scale study of 100+ million search queries to compare health search intents, and the features that contribute to the expression of search intent, between smart devices (smartphones/tablets) and personal computers (desktops/laptops).


  1. 1. Committee: Amit Sheth (Advisor) T. K. Prasad Michael Raymer Jyotishman Pathak (Cornell University) Ph.D. Dissertation Defense Knowledge Driven Search Intent Mining Ashutosh Jadhav April 18, 2016
  2. 2. 2   3.5 billion Web searches per day
  3. 3. 3   3.5 billion Web searches per day One of the key aspects in building an intelligent search engine is to understand users’ search intents
  4. 4. 4   Search Intent Mining ! Applications
  5. 5. Search Result Ranking 5   Search Result Diversification Search Personalization Search Ads
  6. 6. Web Search Intent 6   Search intent is a significant object/topic that represents an abstraction of users' information needs. Search Goals* (Why), Search Topics (What)
  7. 7. 7   Search Intent Mining Search Goals Search Topics Session data Click-through Query log Manual Unsupervised Supervised Ontology-based Knowledge driven My Work Related Work Broder 2002 Beeferman 2003 Rose and Levinson 2004 Baeza-Yates 2006 Hu et al. 2009 Sadikov 2010 Nanda 2014 Ustinovskiy 2013 White 2010 Joachims 2002 Lee 2005 Fujita 2010 Hu 2012 Broder 2007 Radlinski 2010 Celikyilmaz 2011 Shen 2006 Biomedical KB – UMLS Crowd-sourced KB – Wikipedia Dictionaries – Hunspell, OpenMedSpell Techniques
  8. 8. 8   Search is shifting toward understanding intent and serving objects -Li et al., ACL, 2010
  9. 9. 9  
  10. 10. 10   Health, Movie, Sports, Technology, Physics. Health: Diseases, Symptoms, Causes, Medications, Treatments, Prevention
  11. 11.     Web Search for Health Information Among all topics available on the Internet, health is one of the most important in terms of impact on the user 11  
  12. 12. Challenges in Health Information Search •  Major Challenges –  Consumers' lack of medical knowledge to formulate health search queries –  Search engines' failure to understand users' health search intents •  Health information search is a "trial-and-error" process.
  13. 13. •  Health search intent mining applications: –  Personalized health information interventions –  To get better understanding of consumers’ health information needs –  Targeted advertisements Motivation: Real-world Applications 13   Research Problem: Domain specific search intent mining
  14. 14. 14   Thesis Statement Rich background knowledge from biomedical knowledge bases and Wikipedia enables development of effective methods for: I.  Intent mining from health-related search queries in disease agnostic manner II.  Efficient browsing of informative health information shared on social media.  
  15. 15. •  Focus: Consumer-oriented health search intent •  Challenge: No standardized list of consumer-oriented health intent classes •  Approach: –  Qualitative study (published in JMIR, impact factor 4.7) Health Search Intent 15  
  16. 16. •  Focus: Consumer-oriented health search intent •  Challenge: No standardized list of consumer-oriented health intent classes •  Approach: –  Qualitative study (published in JMIR, impact factor 4.7) Health Search Intent 16   Three focus groups Study questions: •  Motivation for using internet for health information seeking •  What do they search? (search intent) •  How do they search? •  What are the challenges in the search  
  17. 17. •  Focus: Consumer-oriented health search intent •  Challenge: No standardized list of consumer-oriented health intent classes •  Approach: –  Qualitative study (published in JMIR, impact factor 4.7) –  Health categories on popular health websites –  Review of online health information seeking literature –  Empirical data analysis The intent classes and the classification scheme were reviewed and validated by Mayo Clinic clinicians and domain experts. Health Search Intent 17   Selection criteria: •  Google PageRank, Alexa ranking •  Medical Library Association's ranking (CAPHIS - Consumer and Patient Health Information Section) Selected websites: Mayo Clinic, WebMD, MedlinePlus, CDC, HealthFinder.gov, and Familydoctor.org.
  19. 19. •  Focus: Consumer-oriented health search intent •  Challenge: No standardized list of consumer-oriented health intent classes •  Approach: –  Qualitative study (published in JMIR, impact factor 4.7) –  Health categories on popular health websites –  Review of online health information seeking literature –  Empirical data analysis The intent classes and the classification scheme were reviewed and validated by Mayo Clinic clinicians and domain experts. Health Search Intent: Intent Classes
      No  Intent class             No  Intent class
      1   Symptoms                 8   Living with
      2   Causes                   9   Prevention
      3   Risks & Complications    10  Side effects
      4   Drugs and Medications    11  Medical devices
      5   Treatments               12  Diseases and conditions
      6   Tests and Diagnosis      13  Age-group References
      7   Food and Diet            14  Vital signs
  21. 21. Multi-label Classification •  Allows instances to be associated with more than one class •  Problem transformation methods (fit data to algorithm) –  Transform the multi-label classification problem into one or more single-label classification problems. –  e.g., Binary Relevance, Label Powerset, and RAKEL (RAndom k-LabELsets) •  Algorithm adaptation methods (fit algorithm to data) –  Extend specific learning algorithms to handle multi-label data directly. –  e.g., tree-based boosting (AdaBoost.MR), ML-kNN, and Rank-SVM Both kinds of methods follow the underlying principles of supervised learning and depend on training data.
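To make the problem-transformation idea concrete, here is a minimal sketch of the Binary Relevance transformation in Python. It is an illustration only: the function name `binary_relevance_split` and the toy queries and labels are assumptions made for this example and do not come from the dissertation.

```python
from typing import Dict, List, Set, Tuple


def binary_relevance_split(
    samples: List[Tuple[str, Set[str]]],
    labels: List[str],
) -> Dict[str, List[Tuple[str, int]]]:
    """Turn one multi-label dataset into one binary dataset per label."""
    return {
        label: [(text, int(label in gold)) for text, gold in samples]
        for label in labels
    }


# Toy multi-label data (hypothetical queries and label assignments).
samples = [
    ("chest pain causes", {"Symptoms", "Causes"}),
    ("ibuprofen dosage", {"Drugs and Medications"}),
    ("heart healthy recipes", set()),
]
labels = ["Symptoms", "Causes", "Drugs and Medications"]
per_label = binary_relevance_split(samples, labels)
```

Each per-label dataset would still need its own trained binary classifier, which is where the dependence on labeled training data comes from.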
  22. 22. •  Manual, time consuming and labor intensive process •  May require domain experts •  Limited coverage –  Training data should be a representative sample of the dataset –  Very difficult to create a training dataset that can cover all aspects (discriminative features) of the dataset •  Generalization problem –  Poor performance on unseen data Challenges with Training Data Creation 22   These challenges get amplified for multi-label classification problems  
  23. 23. In the context of health search intent mining problem •  Training data for 14 intent classes •  Need domain experts to label dataset Supervised Classification Limitations 23   Domain constraint: A classifier trained for one disease may not work for other diseases These challenges make supervised learning- based approaches infeasible for our problem  
  24. 24. 24   Knowledge Driven Approach Machine Processable Knowledge Ontologies Taxonomies Dictionaries Knowledge- bases
  25. 25. 25   Knowledge Driven Approach Machine Processable Knowledge Ontologies Taxonomies Dictionaries Knowledge- bases Ontology Timeframe: early 2000 First patent on Semantic Web More information at blog
  26. 26. 26   Knowledge Driven Approach Machine Processable Knowledge Ontologies Taxonomies Dictionaries Knowledge- bases Ontology
  27. 27. 27   Knowledge Driven Approach Machine Processable Knowledge Ontologies Taxonomies Dictionaries Knowledge- bases
  28. 28. Unified Medical Language System   •  UMLS (Unified Medical Language System) –  Collection of over 100 controlled vocabularies such as MeSH, SNOMED_CT, NCI, and RxNorm Biomedical Knowledge Base 28   Metathesaurus Collection of concepts Semantic Network Semantic Types and Semantic Relationships SPECIALIST Lexicon Biomedical terms and their variants
  29. 29. •  Concept identification consists of two primary tasks: –  Concept recognition and concept mapping –  Example : what are the medications for stomach pain? Concepts: medication, stomach pain Challenges •  Lexical or orthographic variants e.g., (diet, dieting), (ICD9, ICD-9) •  Misspelling, e.g., (pneumonia, neumonia) •  Synonyms, e.g., (heart attack, myocardial infarction) •  Abbreviations, e.g., (myocardial infarction, MI) •  Identifying concept boundary e.g., (pain in stomach, stomach pain) •  Contextual meanings, e.g., (discharge from hospital, discharge from wound) Concept Identification 29  
  30. 30. Concept Identification •  Medical concept identification tools –  UMLS MetaMap, cTAKES, MedLEE, NCBO Annotator •  UMLS MetaMap –  Identifies UMLS Metathesaurus concepts from text –  Semantic Type (e.g., disease or syndrome) –  UMLS Concept (e.g., blood pressure and heart rate) •  Example (UMLS Concept) [Semantic Type] –  Phrase query: red wine heart attack •  Red wine (Red wine) [Food] •  Heart Attack (Myocardial Infarction) [Disease or Syndrome]
  31. 31. •  Phrase query: water on the brain –  Water (Drinking Water) [Substance] –  Brain (Brain) [Body Part, Organ, or Organ Component] •  Actual Mapping should be –  Water on the brain (Hydrocephalus) [Disease or Syndrome] Concept Identification Challenges 31  
  32. 32. Concept Identification Approach •  Advanced text analytics –  Word Sense Disambiguation (WSD) •  Process of identifying the meaning of a term in context •  With WSD, concepts are identified by considering the surrounding text –  Maximal phrase detection •  Process each input record as a single phrase in order to identify more complex Metathesaurus terms •  Consumer Health Vocabulary (CHV)
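The effect of maximal phrase detection on the "water on the brain" example can be sketched with a greedy longest-match lookup. This is not MetaMap and does not call it: `TOY_LEXICON` and `map_concepts` are hypothetical names, and the three lexicon entries are copied from the slide examples purely for illustration.

```python
from typing import Dict, List, Tuple

# Toy lexicon: token tuple -> (concept, semantic type). Illustrative only.
TOY_LEXICON: Dict[Tuple[str, ...], Tuple[str, str]] = {
    ("water",): ("Drinking Water", "Substance"),
    ("brain",): ("Brain", "Body Part, Organ, or Organ Component"),
    ("water", "on", "the", "brain"): ("Hydrocephalus", "Disease or Syndrome"),
}


def map_concepts(
    query: str,
    lexicon: Dict[Tuple[str, ...], Tuple[str, str]],
    maximal: bool = True,
) -> List[Tuple[str, str]]:
    """Map a query to concepts, preferring the longest phrase when maximal."""
    tokens = query.lower().split()
    max_len = max(len(key) for key in lexicon) if maximal else 1
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in lexicon:
                found.append(lexicon[key])
                i += n
                break
        else:
            i += 1  # no entry starts at this token; skip it
    return found


word_level = map_concepts("water on the brain", TOY_LEXICON, maximal=False)
phrase_level = map_concepts("water on the brain", TOY_LEXICON, maximal=True)
```

With `maximal=False` the query falls apart into the Substance and Body Part concepts; with `maximal=True` the whole phrase maps to Hydrocephalus.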
  34. 34. Consumer Health Vocabulary •  Consumer Health Vocabulary (CHV) –  Maps terms used by laypeople to medical terms –  E.g., hair loss => Alopecia •  Problem: CHV in UMLS is incomplete •  Example: water on the knee is mapped to Water thick-knee (Burhinus vermiculatus) [Bird] •  Actual mapping should be –  Water on the knee (Knee effusion) [Disease or Syndrome] Major challenge for the health search intent mining problem
  35. 35. Consumer Health Vocabulary Generation •  Traditional approach –  Identification of consumer-oriented terms from Medline search logs and PatientsLikeMe forum data –  Manual review by healthcare professionals Approach: leverage knowledge from Wikipedia •  One of the most-used online health resources •  Continuously updated with emerging health terms •  Links consumer-oriented terms with health professionals' terms using semantic relationships
  37. 37. •  Wikipedia: Crowd sourced encyclopedia Consumer Health Vocabulary Generation 37  
  39. 39. •  Wikipedia: Crowd sourced encyclopedia Consumer Health Vocabulary Generation 39   Health-related Wikipedia articles Health Category Candidate subcategories Articles tagged with candidate subcategories Step 1: Identification of health-related Wikipedia articles
  40. 40. Snippet 2: Knee effusion or swelling of the knee (colloquially known as water on the knee) occurs when excess synovial fluid accumulates in or around the knee joint. Snippet 1: Hair loss, also known as alopecia or baldness, refers to a loss of hair from the head or body. 40   Consumer Health Vocabulary Generation Step 2: Extraction of candidate pairs
  42. 42. Consumer Health Vocabulary Generation Step 2: Extraction of candidate pairs
      Pair  Term                   Semantic relationship   Term
      1     hair loss              also known as           alopecia
      2     hair loss              also known as           baldness
      3     knee effusion          colloquially known as   water on the knee
      4     swelling of the knee   colloquially known as   water on the knee
      5     knee effusion          same as                 swelling of the knee
  43. 43. Consumer Health Vocabulary Generation Step 2: Extraction of candidate pairs. Pattern-based information extractor. Wikipedia patterns: also called; commonly called; colloquially known as; also known as; commonly known as; sometimes called; also referred to as; commonly termed; sometimes known as; also termed; previously known as; sometimes termed; commonly referred to as; colloquially referred to as; sometimes referred to as
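A pattern-based extractor of this kind can be sketched with a regular expression built from the Wikipedia patterns. The sketch below is a simplified assumption, not the author's extractor: the names `PATTERNS` and `extract_candidate_pairs` are hypothetical, only the first pattern in a sentence is used, terms are split on "or", and the "same as" pair between two left-hand terms is not produced.

```python
import re
from typing import List, Tuple

PATTERNS = [
    "also called", "commonly called", "colloquially known as",
    "also known as", "commonly known as", "sometimes called",
    "also referred to as", "commonly termed", "sometimes known as",
    "also termed", "previously known as", "sometimes termed",
    "commonly referred to as", "colloquially referred to as",
    "sometimes referred to as",
]
# Longest patterns first so the most specific phrase wins.
_PATTERN_RE = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(PATTERNS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def _split_terms(text: str) -> List[str]:
    """Split 'a or b' into cleaned, lower-cased terms."""
    parts = (part.strip(" ,(").lower() for part in re.split(r"\s+or\s+", text))
    return [part for part in parts if part]


def extract_candidate_pairs(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (term, relationship, term) triples around the first pattern."""
    match = _PATTERN_RE.search(sentence)
    if not match:
        return []
    left = sentence[:match.start()]
    # The right-hand terms end at the next comma, bracket, or sentence end.
    right = re.split(r"[,).;]", sentence[match.end():], maxsplit=1)[0]
    relation = match.group(1).lower()
    return [
        (l, relation, r)
        for l in _split_terms(left)
        for r in _split_terms(right)
    ]


snippet_1 = ("Hair loss, also known as alopecia or baldness, refers to a "
             "loss of hair from the head or body.")
snippet_2 = ("Knee effusion or swelling of the knee (colloquially known as "
             "water on the knee) occurs when excess synovial fluid "
             "accumulates in or around the knee joint.")
pairs = extract_candidate_pairs(snippet_1) + extract_candidate_pairs(snippet_2)
```

On the two snippets from the slides this yields pairs 1 to 4 of the candidate-pair table; a fuller extractor would also need sentence segmentation and handling of several patterns per sentence.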
  44. 44. Consumer Health Vocabulary Generation Step 3: Identification of CHV and medical terms from the candidate pairs. Map terms from the candidate pairs to the UMLS Metathesaurus using MetaMap •  Scenario 1: -  Both terms are present in the UMLS Metathesaurus -  e.g., {hair loss, alopecia} •  Scenario 2: -  Neither term is present in the UMLS Metathesaurus -  e.g., {hospital trust, acute trust} •  Scenario 3: -  Only one term is present in the UMLS Metathesaurus -  e.g., {knee effusion, water on the knee}
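The three scenarios amount to counting how many terms of a pair are found in the Metathesaurus. A minimal sketch, assuming a plain Python set stands in for the MetaMap lookup (`KNOWN_TERMS` and `pair_scenario` are hypothetical names, and the set contents are toy values chosen to match the slide examples):

```python
from typing import Set, Tuple

# Toy stand-in for "term is present in the UMLS Metathesaurus".
KNOWN_TERMS: Set[str] = {"hair loss", "alopecia", "knee effusion"}


def pair_scenario(pair: Tuple[str, str], known_terms: Set[str]) -> int:
    """Return 1 if both terms are known, 2 if neither is, 3 if exactly one is."""
    present = sum(term in known_terms for term in pair)
    return {2: 1, 0: 2, 1: 3}[present]


scenarios = [
    pair_scenario(("hair loss", "alopecia"), KNOWN_TERMS),
    pair_scenario(("hospital trust", "acute trust"), KNOWN_TERMS),
    pair_scenario(("knee effusion", "water on the knee"), KNOWN_TERMS),
]
```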
  45. 45. •  Data: –  Cardiovascular disease (CVD) related search queries –  Limited to the United States •  Data timeframe: –  September 2011 to August 2013 •  Data collection tool: –  IBM NetInsight On Demand (Web Analytics tool) •  Dataset size: –  10.4 million CVD related search queries –  Significantly large dataset for a single class of diseases. 45   Dataset
  47. 47. •  Preprocessing –  Stop word removal –  Misspelling correction (using Hunspell spell checker) •  Dictionaries: Hunspell dictionary, and its medical version, OpenMedSpell –  Replace all CHV terms from the search queries with medical terms •  UMLS MetaMap –  Usage challenge: Significantly slow for millions of search queries Data Processing 47   Solution: Developed a scalable MetaMap implementation using a Hadoop-MapReduce framework
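Two of the preprocessing steps, CHV replacement and stop word removal, can be sketched as follows. This is a simplified illustration under stated assumptions: `CHV_TO_MEDICAL`, `STOP_WORDS`, and `preprocess` are hypothetical names with toy contents, spelling correction with Hunspell is omitted, and CHV replacement runs before stop word removal so that multi-word CHV phrases containing stop words (such as "water on the knee") are still matched. The slide does not state the order used in the actual pipeline.

```python
import re

# Toy CHV mapping and stop word list (illustrative values only).
CHV_TO_MEDICAL = {
    "hair loss": "alopecia",
    "water on the knee": "knee effusion",
}
STOP_WORDS = {"what", "are", "the", "for", "of", "a", "is", "in", "on"}


def preprocess(query: str) -> str:
    """Replace CHV phrases with medical terms, then drop stop words."""
    text = query.lower()
    # Replace longer phrases first so they are not broken up by shorter ones.
    for chv in sorted(CHV_TO_MEDICAL, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(chv) + r"\b", CHV_TO_MEDICAL[chv], text)
    tokens = [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOP_WORDS]
    return " ".join(tokens)


cleaned = [
    preprocess("What causes water on the knee"),
    preprocess("treatments for hair loss"),
]
```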
  48. 48. Evaluation •  Gold standard dataset –  Two domain experts annotated randomly selected search queries, labeling each query with zero or more intent classes –  The gold standard dataset is further divided into training and testing sets •  Evaluation Metrics –  Macro-averaged precision and recall –  Average of the precision and recall of the classification algorithm over the different classes –  Used to assess classification performance at the class level
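Macro averaging for a multi-label task can be sketched as below: precision and recall are computed per class and then averaged with equal weight per class. The function name `macro_precision_recall`, the convention of scoring an undefined ratio as 0.0, and the toy gold and predicted labels are assumptions made for this example.

```python
from typing import List, Set, Tuple


def macro_precision_recall(
    gold: List[Set[str]],
    pred: List[Set[str]],
    classes: List[str],
) -> Tuple[float, float]:
    """Average per-class precision and recall over all classes."""
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if c in g and c in p)
        fp = sum(1 for g, p in zip(gold, pred) if c not in g and c in p)
        fn = sum(1 for g, p in zip(gold, pred) if c in g and c not in p)
        # Assumption: an undefined ratio (no predictions or no gold) scores 0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Toy gold standard and predictions; each query has zero or more classes.
gold = [{"Symptoms"}, {"Drugs"}, {"Symptoms", "Drugs"}, set()]
pred = [{"Symptoms"}, {"Symptoms"}, {"Drugs"}, set()]
macro_p, macro_r = macro_precision_recall(gold, pred, ["Symptoms", "Drugs"])
```

Here "Symptoms" has precision 0.5 and recall 0.5, and "Drugs" has precision 1.0 and recall 0.5, so the macro averages are 0.75 and 0.5.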
  49. 49. Classification: Annotation and Rules •  Search Query Annotation –  UMLS concepts and semantic types •  Classification Rules
      Intent class: Drugs and Medications
      Classification rule: {ST ∪ SC ∪ KW} without SC*
      •  ST: ORCH|PHSU, CLND, PHSU
      •  SC: medication, medicine, drugs, dose, dosage, tablet, pill
      •  KW: meds
      •  (Without) SC*: alcohol, caffeine, fruit, prevent
      Examples: •  medications for pulmonary hypertension •  ibuprofen heart rate •  dextromethorphan blood pressure
      Abbreviations: ORCH - Organic Chemical, PHSU - Pharmacologic Substance, CLND - Clinical Drug
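One possible reading of the Drugs and Medications rule can be sketched in Python. The sets below are copied from the slide, but everything else is an assumption: `is_drug_intent` is a hypothetical name, the exclusion list is applied per concept, ORCH|PHSU is treated as covered by PHSU, and the concept annotations passed in are hand-written toy values, not MetaMap output.

```python
from typing import List, Set, Tuple

Concept = Tuple[str, Set[str]]  # (concept name, semantic type codes)

DRUG_ST = {"PHSU", "CLND"}  # ORCH|PHSU is covered because it contains PHSU
DRUG_SC = {"medication", "medicine", "drugs", "dose", "dosage", "tablet", "pill"}
DRUG_KW = {"meds"}
DRUG_SC_EXCLUDE = {"alcohol", "caffeine", "fruit", "prevent"}


def is_drug_intent(query: str, concepts: List[Concept]) -> bool:
    """Apply {ST ∪ SC ∪ KW} without SC* to one annotated query."""
    if set(query.lower().split()) & DRUG_KW:
        return True  # keyword match on the raw query text
    for name, types in concepts:
        if name in DRUG_SC_EXCLUDE:
            continue  # excluded concepts never trigger the class
        if name in DRUG_SC or types & DRUG_ST:
            return True
    return False


# Hand-written toy annotations for queries that appear on the slides.
results = {
    "ibuprofen heart rate": is_drug_intent(
        "ibuprofen heart rate", [("ibuprofen", {"ORCH", "PHSU"})]),
    "alcohol heart disease": is_drug_intent(
        "alcohol heart disease",
        [("alcohol", {"ORCH", "PHSU"}), ("heart disease", {"DSYN"})]),
    "meds for acid reflux": is_drug_intent(
        "meds for acid reflux", [("acid reflux", {"DSYN"})]),
    "medications for pulmonary hypertension": is_drug_intent(
        "medications for pulmonary hypertension",
        [("medication", set()), ("pulmonary hypertension", {"DSYN"})]),
}
```

In this reading, "alcohol heart disease" is rejected because its only drug-like concept is on the exclusion list, while "meds for acid reflux" is accepted through the keyword alone.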
  50. 50. Classification: Evaluation Results
      Rules                         Precision  Recall  F1 Score
      ST (baseline approach)        0.5432     0.6203  0.5791
      ST+SC                         0.6534     0.6822  0.6674
      ST+SC+KW                      0.6722     0.6923  0.6821
      ST+SC+KW-ST*                  0.7383     0.7344  0.7363
      ST+SC+KW-ST*-SC*              0.7601     0.7930  0.7762
      ST+SC+KW-ST*-SC*+AdvTA        0.8539     0.8382  0.8459
      ST+SC+KW-ST*-SC*+AdvTA+CHV    0.8842     0.8607  0.8723
      ST = Semantic type, SC = Semantic (UMLS) concepts, KW = Keyword, AdvTA = Advanced Text Analytics, CHV = Consumer Health Vocabulary
      For the Drugs and Medications intent class: Correctly classified Wrongly classified •  ibuprofen heart rate •  dextromethorphan blood pressure •  medications for pulmonary hypertension •  alcohol heart disease •  meds for acid reflux
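As a sanity check on the table, the reported F1 scores can be compared with the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R). The sketch below assumes that this is how F1 relates to the two columns; the slides do not say so explicitly, and a macro-averaged F1 could also be computed per class and then averaged. The name `f1_score` and the 2e-4 tolerance are choices made for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) rows copied from the results table.
ROWS = [
    (0.5432, 0.6203, 0.5791),
    (0.6534, 0.6822, 0.6674),
    (0.6722, 0.6923, 0.6821),
    (0.7383, 0.7344, 0.7363),
    (0.7601, 0.7930, 0.7762),
    (0.8539, 0.8382, 0.8459),
    (0.8842, 0.8607, 0.8723),
]
max_gap = max(abs(f1_score(p, r) - reported) for p, r, reported in ROWS)
```

Under that assumption every row agrees with the harmonic mean to within rounding in the fourth decimal place.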
  52. 52. 52   Classification : Evaluation Results Rules Precision Recall F1 Score ST (baseline approach) 0.5432 0.6203 0.5791 ST+SC 0.6534 0.6822 0.6674 ST+SC+KW 0.6722 0.6923 0.6821 ST+SC+KW-ST* 0.7383 0.7344 0.7363 ST+SC+KW-ST*-SC* 0.7601 0.7930 0.7762 ST+SC+KW-ST*-SC*+AdvTA 0.8539 0.8382 0.8459 ST+SC+KW-ST*-SC*+AdvTA+CHV 0.8842 0.8607 0.8723 ST = Semantic type SC = Semantic (UMLS) concepts KW = keyword AdvTA = Advanced Text Analytic CHV = Consumer Health Vocabulary For Drug and medication Intent Class Correctly classified Wrongly classified •  ibuprofen heart rate •  dextromethorphan blood pressure •  medications for pulmonary hypertension •  meds for acid reflux •  alcohol heart disease
  53. Classification: Evaluation Results
    Drugs and Medications intent class: correctly classified
    •  ibuprofen heart rate
    •  meds for acid reflux
    •  alcohol heart disease
    •  medications for pulmonary hypertension
    •  dextromethorphan blood pressure
  54. Classification: Evaluation Results
    With advanced text analytics (ST+SC+KW-ST*-SC*+AdvTA): Precision 0.8539, Recall 0.8382, F1 Score 0.8459
    •  Phrase query: water on the brain
      –  Water (Drinking Water) [Substance]
      –  Brain (Brain) [Body Part, Organ, or Organ Component]
    •  Actual mapping should be
      –  Water on the brain (Hydrocephalus) [Disease or Syndrome]
    •  Advanced Text Analytics
      –  Word sense disambiguation, maximal phrase detection, CHV from UMLS
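The "water on the brain" example shows why the longest phrase should be matched before its individual words. The sketch below illustrates maximal phrase detection as a greedy longest-match lookup over a toy lexicon; the lexicon and the function name are hypothetical, and this is not a description of how MetaMap or the dissertation's pipeline implements the step.

```python
from typing import Dict, List, Tuple

# Toy lexicon for illustration; real mappings would come from UMLS and the CHV.
LEXICON: Dict[str, str] = {
    "water": "Drinking Water",
    "brain": "Brain",
    "water on the brain": "Hydrocephalus",
}


def longest_match(query: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Scan left to right, always taking the longest phrase found in the lexicon."""
    tokens = query.lower().split()
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):  # try the longest span first
            phrase = " ".join(tokens[i:j])
            if phrase in lexicon:
                matches.append((phrase, lexicon[phrase]))
                i = j
                break
        else:
            i += 1  # no phrase starts at this token; skip it
    return matches


whole_phrase = longest_match("water on the brain", LEXICON)
separate_words = longest_match("water brain", LEXICON)
print(whole_phrase)
print(separate_words)
```

With the full phrase in the lexicon, the query maps to a single disease concept instead of a substance and a body part.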
  55. Classification: Evaluation Results
    With the Consumer Health Vocabulary (ST+SC+KW-ST*-SC*+AdvTA+CHV): Precision 0.8842, Recall 0.8607, F1 Score 0.8723
    •  Generating CHV from Wikipedia
    •  Example: water on the knee
      –  Water thick-knee (Burhinus vermiculatus) [Bird]
    •  Actual mapping should be
      –  Water on the knee (Knee effusion) [Disease or Syndrome]
  56. Classification: Evaluation Results
    •  Macro Average
      –  Precision: 0.8842, Recall: 0.8607, F-Score: 0.8723
    Used to check the performance of the classification approach for individual intent classes
  57. Classification: Results

    No  Intent Class            Total Queries  Percentage Distribution
    1   Diseases                4,232,398      40.66
    2   Vital signs             3,455,809      33.20
    3   Symptoms                1,422,826      13.67
    4   Living with             1,178,756      11.32
    5   Treatments              955,701        9.18
    6   Food and Diet           779,949        7.49
    7   Med Devices             665,484        6.39
    8   Drugs and Medications   603,905        5.80
    9   Causes                  599,895        5.76
    10  Tests & Diagnosis       344,747        3.31
    11  Risks and Complication  277,294        2.66
    12  Prevention              136,428        1.31
    13  Age-group References    87,929         0.84
    14  Side effects            25,655         0.25
        Total                   14,766,776     141.87
  58. Classification: Results
    Distribution of search queries by number of intent classes in which they are classified

    Number of intent classes  Share of queries
    0                         8%
    1                         48%
    2                         40%
    3                         4%
    4 and more                0%
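As a rough consistency check, the distribution on this slide implies an average number of intent classes per query, which can be compared with the 141.87% total on the previous slide (about 1.42 classes per query). The sketch below is only that arithmetic; treating the "4 and more" bucket as exactly 4 is an assumption, and the slide's percentages are rounded.

```python
# Share of queries by number of assigned intent classes (from the slide).
distribution = {0: 0.08, 1: 0.48, 2: 0.40, 3: 0.04, 4: 0.00}

# Expected number of intent classes per query under this distribution.
mean_labels = sum(k * share for k, share in distribution.items())
print(f"mean intent classes per query ≈ {mean_labels:.2f}")
```

The result, roughly 1.40, is close to the 1.42 implied by the per-class percentages, which is what one would expect if both slides describe the same classified query set.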
  59. Classification: Results

    Dataset                  Precision  Recall  F1-Score
    Cardiovascular Diseases  0.8842     0.8642  0.8723
    Diabetes                 0.9274     0.8964  0.9116
    Cancer                   0.8294     0.7635  0.7950
  60. Application: Personalized eHealth Interventions
  61. Scenario in Clinical Decision Support System
    •  Hello, For the past 10 hours I've been expierencing a semi sharp pain in my upper right chest just below my armpit. This pain appears anywhere from every two and a half minutes to ten or fifteen minutes. I also have some stomach ache and dry mouth. I monitor my blood pressure is averages 130/90 with a average heart rate of 80. My cardiologist has been treating me since 1 year for high colesterol, gout and hypertension with great success. Also I have diabetes and I am taking Metformin and mevacor. I have an appointment with my cardiologist after 2 weeks. However I am wondering should I go to ER? BTW I am 69 years old male.
    Source: DailyStrength forum
  62. Demographic Information: Age: 69; Gender: Male
    Symptom: chest pain, stomach ache, Xerostomia (dry mouth)
    Diseases and Conditions: Gout, Hypertension, Diabetes
    Drugs and Medication: Metformin, Mevacor
    Vital Signs: Blood pressure: 130/90; Heart rate: 80
    Misspellings: expierencing => experiencing; colesterol => cholesterol
    Consumer Health Vocabulary: dry mouth => Xerostomia
  63. Symptoms for CVD
    •  Primary Symptom
      –  Chest pain
        •  Upper side
        •  Right side
    •  Other symptoms
      –  Stomach ache
      –  Dry mouth
    •  Current diseases
      –  Hypertension
      –  Gout
      –  Diabetes
    •  Vital Signs
      –  Blood pressure = normal
      –  Heart rate = normal
    1.  Digestion-Related Causes
    2.  Cardiovascular Problems
    3.  Viral Infections
    4.  Gallbladder Infection
    5.  Pancreas Inflammation
    6.  Liver Inflammation
    7.  Pleurisy
    8.  Lung Diseases
  65. Thesis Statement
    Rich background knowledge from biomedical knowledge bases and Wikipedia enables development of effective methods for:
    I.  Intent mining from health-related search queries in a disease-agnostic manner
    II.  Efficient browsing of informative health information shared on social media
  66. Information Acquisition
    •  Intentional information seeking
      –  Web search
    •  Accidental information discovery
      –  Accidentally bumping into useful or personal-interest-related information
    NASA's Curiosity Rover on Mars
  67. Health Information Acquisition
    •  In many cases, accidental information discovery is facilitated by users' prior actions (serendipity)
    •  Twitter currently has thousands of health-centric accounts, followed by millions of users who want to keep up with health information
  68. Information Overload on Twitter
    •  Millions of tweets are shared every day
    •  Most of these tweets are highly personal and contextual
    •  Only around 12% of posts are informative
    •  Users have to identify informative tweets manually
    Research Problem: How to automate the identification of signals (informative tweets) from noise (the Twitter stream)
  69. Informativeness of a Tweet is Subjective
    •  The informativeness of a tweet depends on
      –  The reader's intent
      –  The reader's knowledge of the information in the tweet, or the novelty of that information
      –  The reader's interest in the subject
      –  Who the author is (domain expert, personal connection)
    Objectively, what makes a tweet informative?
  70. Experiments: Informativeness Analysis
    Tweets -> Rule-based Filtering -> Supervised Classification (Naïve Bayes classifier) -> Informative Tweets
  72. Experiments: Informativeness Analysis
    Tweets -> Rule-based Filtering -> Supervised Classification (Naïve Bayes classifier) -> Informative Tweets

    Rule-based Filters
    Dataset (experiment dataset)  Diabetes                                          40,000
    Language                      English                                           29,034
    URL                           Yes                                               17,422
    Duplicate tweet                                                                 13,573
    Minimum length                Minimum number of words = 5 and characters = 80   10,927
    Max spelling mistakes         2                                                 10,176
    URL filtering                 Remove broken/not working URLs; duplicate URLs    8,273
    Min URL PageRank              5                                                 6,374

    Supervised Classification Features
    •  Bag-of-words: unigrams, bigrams
    •  Text features
      –  Message length
      –  Percentage of words, special characters
      –  Part-of-speech tags
    •  Author features
      –  Social connectivity (number of follow-followers)
      –  Activity level (number of tweets)
      –  Author credibility/influence (Klout score)
    •  Popularity features: number of tweets, retweets, Facebook shares, likes, comments, recommendations; Google Plus and LinkedIn shares
    •  Reliability feature: URL PageRank
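The filter cascade on this slide can be sketched as a sequence of cheap checks applied in the listed order. The version below is a minimal illustration that covers only the language, URL, duplicate, minimum-length, and PageRank filters; the `Tweet` type, the precomputed `url_pagerank` field, and the duplicate test (case- and whitespace-normalised text) are assumptions, and the spelling and broken-URL filters are left out.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

URL_RE = re.compile(r"https?://\S+")
MIN_WORDS, MIN_CHARS, MIN_PAGERANK = 5, 80, 5  # thresholds from the slide


@dataclass
class Tweet:
    text: str
    lang: str
    url_pagerank: Optional[int] = None  # hypothetical precomputed value


def rule_filter(tweets: Iterable[Tweet]) -> List[Tweet]:
    """Keep tweets that pass every rule, checking rules in slide order."""
    seen = set()
    kept = []
    for t in tweets:
        if t.lang != "en":
            continue
        if not URL_RE.search(t.text):
            continue
        key = " ".join(t.text.lower().split())  # assumption: duplicate = same normalised text
        if key in seen:
            continue
        seen.add(key)
        if len(t.text.split()) < MIN_WORDS or len(t.text) < MIN_CHARS:
            continue
        if t.url_pagerank is None or t.url_pagerank < MIN_PAGERANK:
            continue
        kept.append(t)
    return kept


long_text = ("New study links regular exercise to better blood sugar control "
             "in adults with type 2 diabetes http://example.org/study")
spam_text = ("Ten surprising facts about managing diabetes that doctors will "
             "never tell you about, click here http://example.org/spam")
sample = [
    Tweet(long_text, "en", 6),
    Tweet(long_text, "en", 6),                                   # duplicate
    Tweet("My sugar is high again http://example.org/a", "en", 7),  # too short
    Tweet(long_text.replace("http://example.org/study", ""), "en"),  # no URL
    Tweet(long_text, "es", 6),                                   # not English
    Tweet(spam_text, "en", 3),                                   # low PageRank
]
kept = rule_filter(sample)
print(len(kept), "tweet(s) kept")
```

Running the inexpensive checks first means the costlier ones, such as URL resolution and PageRank lookup in a real pipeline, only see the tweets that survive.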
  73. Experiments: Gold Standard Dataset
    •  Randomly selected 40k tweets related to diabetes
    •  Gold standard dataset
      –  Randomly selected 3,000 tweets
      –  Annotation: 3 annotators independently rated each tweet with an informativeness score from 1 (low) to 4 (high)
      –  The informativeness scores (1-4) were then transformed into binary labels
      –  Label distribution: informative 33.6%, non-informative 66.4%

    Approach              Sample space  Sample space reduction
    Rule-based filtering  6,374         84.25%
  74. Evaluation: Supervised Classification (NB)

    Features                                                                                           Precision
    Tweet                                                                                              66.20
    Tweet + URL Title                                                                                  68.72
    Tweet + URL Title + URL Content                                                                    74.67
    Tweet + URL Title + URL Content + Tweet Length                                                     74.92
    Tweet + URL Title + URL Content + Tweet Length + Number of words                                   75.79
    (Tweet + URL Title + URL Content + Tweet Length + Number of words + Special chars) => FT1          76.83
    FT1 + POS tags                                                                                     77.23
    FT1 + POS tags + PageRank                                                                          80.63
    FT1 + POS tags + PageRank + social share                                                           80.66
    FT1 + POS tags + PageRank + social share + Author Features                                         80.93
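For readers unfamiliar with the classifier named in this table, the following is a minimal sketch of a multinomial Naïve Bayes text classifier over unigrams with add-one smoothing. It corresponds only to the plain "Tweet" bag-of-words row; the toy training tweets, the function names, and the smoothing choice are assumptions, and none of the URL, author, or popularity features are modelled.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


def train_nb(examples: List[Tuple[str, str]]):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab


def predict_nb(model, text: str) -> str:
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for w in text.lower().split():
            if w not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][w] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented for illustration.
train = [
    ("new study shows insulin therapy improves outcomes", "informative"),
    ("guidelines for managing blood sugar released", "informative"),
    ("ugh my sugar is high again today", "non-informative"),
    ("so tired of this today lol", "non-informative"),
]
model = train_nb(train)
label_a = predict_nb(model, "study on insulin guidelines")
label_b = predict_nb(model, "so tired today")
print(label_a, label_b)
```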
  75. Hadoop-MapReduce Framework
    •  Informativeness Analysis
    •  Semantic Categorization
    Soni, S. 2015. Domain specific document retrieval framework on near real-time social health data. Thesis, Wright State University
  76. Search and Explore
    •  Search and Explore: X Controls Cancer, where X = diet, treatment, exercise (pattern-based approach leveraging domain semantics)
    •  Top Health News
    •  Faceted search (based on intent classification algorithm)
    •  Learn about Disease (source: Mayo Clinic)
    •  Tweet Traffic
  77. Other Work
  78. Comparative Analysis of Expressions of Search Intents From Personal Computers and Smart Devices
    Desktop and mobile: mobile usage took over
  79. Twitris: Social Media Analytics Platform
    •  Core component of around $6+ million in research funding (NSF, NIH, AFRL)
  80. Research Grants and Proposals
    •  NIH-R01 proposal (Mayo Clinic and Kno.e.sis, Wright State) ($2 million)
      –  Modeling Social Behavior for Healthcare Utilization and Outcomes in Depression
    •  Air Force Research Lab (AFRL)
      –  Geo-Social mash-up for situational awareness in a disaster response situation
        •  Funded project: 2010-2011, Real-time Twitris
      –  Social media analysis for situational awareness (funded 2011-2012)
      –  WBI's Tec^Edge Innovation and Collaboration Center (Tec^Edge ICC)
        •  Funded project: Summer 2010, Summer 2011
    •  Mayo Clinic Meritorious Award
      –  Healthcare trend surveillance using social networks and health search queries (funded 2013)
      –  What makes a health-related tweet informative (funded 2014)
  82. Conclusion
    •  Search Intent Mining
    •  Health Search Intent Mining
  83. Conclusion
    Health Search Intent Mining
    •  Identified consumer-oriented intent classes
    •  Multi-label classification problem (L=14)
    •  Supervised ML
    •  Knowledge-driven approach
  84. Conclusion
    Knowledge-Driven Approach for Health Search Intent Mining
    •  Concept Identification
      –  UMLS MetaMap
      –  Advanced text analytics
      –  Consumer Health Vocabulary
    •  Semantics-based Intent Classification
      –  Based on UMLS semantic types and concepts
      –  Advanced text analytics
      –  Consumer Health Vocabulary
    •  Consumer Health Vocabulary Generation
      –  Leveraged knowledge from Wikipedia
      –  Maps CHV terms to medical terms
    •  Personalized eHealth Interventions
  85. Conclusion
    •  Information overload on Twitter
    •  Subjectivity
    •  Objectively, what makes a tweet informative?
    •  Adapted the search intent mining algorithm to enable efficient browsing of health information on Social Health Signals
  86. Publications
    •  Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal. A Jadhav et al. AMIA Annual Symposium 2014
    •  Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal. A Jadhav et al. Journal of Medical Internet Research (JMIR) (impact factor 4.7)
    •  Evaluating the Process of Online Health Information Searching: A Qualitative Approach to Exploring Consumer Perspectives. A Fiksdal, A Kumbamu, A Jadhav et al. Journal of Medical Internet Research (JMIR) (impact factor 4.7)
    •  Online Information Seeking for Cardiovascular Diseases: A Case Study from Mayo Clinic. A Jadhav et al. 25th European Medical Informatics Conference (MIE 2014)
    •  Empowering Personalized Medicine with Big Data and Semantic Web Technology: Promises, Challenges, Pitfalls, and Use Cases. M Panahiazar, V Taslimi, A Jadhav et al. IEEE International Conference on Big Data (IEEE BigData 2014)
    •  Comparative Analysis of Online Health Information Search by Device Type. A Jadhav et al. AMIA TBI/CRI 2014
    •  An Analysis of Mayo Clinic Search Query Logs for Cardiovascular Diseases. A Jadhav et al. AMIA Annual Symposium 2014
    •  What Information about Cardiovascular Diseases do People Search Online? A Jadhav et al. 25th European Medical Informatics Conference (MIE 2014)
  87. Publications
    •  Twitris: a System for Collective Social Intelligence. A Sheth, A Jadhav et al. Springer, Encyclopedia of Social Network Analysis and Mining (ESNAM), 2014
    •  Twitris: Socially Influenced Browsing. A Jadhav et al. Semantic Web Challenge, International Semantic Web Conference (ISWC 2009)
    •  Twitris 2.0: Semantically Empowered System for Understanding Perceptions From Social Data. A Jadhav et al. Semantic Web Challenge, International Semantic Web Conference (ISWC 2010)
    •  Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Experiences. M Nagarajan, K Gomadam, A Sheth, A Ranabahu, R Mutharaju, A Jadhav. Web Information Systems Engineering (WISE 2009)
    •  Understanding Events Through Analysis Of Social Media. A Sheth, H Purohit, A Jadhav, et al. Technical Report, Kno.e.sis Center, 2010
    •  Twitris+: Social Media Analytics Platform for Effective Coordination. A. Smith, A. Sheth, A. Jadhav, et al. NSF SoCS Symposium, 2012
    •  Patent on Context-Aware Information Recommendation, filed in January 2013
      –  Patent filed based on HP summer 2011 internship work
      –  Ashutosh Jadhav, Hamid Motahari, Susan Spence, Claudio Bartolini
  88. References
    •  Shen, D., Pan, R., Sun, J.-T., Pan, J. J., Wu, K., Yin, J., and Yang, Q. 2006. Query enrichment for web-query classification. ACM Transactions on Information Systems (TOIS) 24, 3, 320-352.
    •  Shen, D., Sun, J.-T., Yang, Q., and Chen, Z. 2006. Building bridges for web query classification. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 131-138.
    •  Sadikov, E., Madhavan, J., Wang, L., and Halevy, A. 2010. Clustering query refinements by user intent. In Proceedings of the 19th international conference on World wide web. ACM, 841-850.
    •  Radlinski, F., Szummer, M., and Craswell, N. 2010. Inferring query intent from reformulations and clicks. In Proceedings of the 19th international conference on World wide web. ACM, 1171-1172.
    •  Rose, D. E. and Levinson, D. 2004. Understanding user goals in web search. In Proceedings of the 13th international conference on World Wide Web. ACM, 13-19.
    •  Nanda, A., Omanwar, R., and Deshpande, B. 2014. Implicitly learning a user interest profile for personalization of web search using collaborative filtering. In Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on. Vol. 2. IEEE.
    •  Soni, S. 2015. Domain specific document retrieval framework on near real-time social health data. Thesis, Wright State University.
    •  Naaman, M., Boase, J., and Lai, C.-H. 2010. Is it really about me?: message content in social awareness streams. In Proceedings of the 2010 ACM conference on Computer supported cooperative work. ACM, 189-192.
    •  White, R. W. and Horvitz, E. 2014. From health search to healthcare: explorations of intention and utilization via query logs and user surveys. JAMIA.
    •  Celikyilmaz, A., Hakkani-Tür, D., and Tür, G. 2011. Leveraging web query logs to learn user intent via bayesian discrete latent variable model. In Proceedings of ICML.
    •  Amit Sheth. 15 years of Semantic Search and Ontology-enabled Semantic Applications.
  89. References
    •  Sheth A, Avant D, Bertram C, inventors; Taalee, Inc., assignee. System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising. United States patent US 6,311,194. 2001 Oct 30.
    •  Lu, C.-J. 2012. Accidental discovery of information on the user-defined social web: A mixed-method study. Ph.D. thesis, University of Pittsburgh.
    •  Li, X. 2010. Understanding the semantic structure of noun phrase queries. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1337-1345.
    •  Keselman, A., Smith, C. A., Divita, G., Kim, H., Browne, A. C., Leroy, G., and Zeng-Treitler, Q. 2008. Consumer health concepts that do not map to the UMLS: where do they fit? Journal of the American Medical Informatics Association 15, 4, 496-505.
    •  Hu, J., Wang, G., Lochovsky, F., Sun, J.-T., and Chen, Z. 2009. Understanding user's query intent with Wikipedia. In Proceedings of the 18th international conference on World wide web. ACM.
    •  Hu, Y., Qian, Y., Li, H., Jiang, D., Pei, J., and Zheng, Q. 2012. Mining query subtopics from search log data. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 305-314.
    •  Fox, S. 2014. Pew internet & american life project report. 2013. Pew Internet: Health. URL: http://www.pewinternet.org/fact-sheets/health-fact-sheet/
    •  Broder, A. Z., Fontoura, M., Gabrilovich, E., Joshi, A., Josifovski, V., and Zhang, T. 2007. Robust classification of rare queries using web knowledge. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. ACM.
    •  Broder, A. 2002. A taxonomy of web search. In ACM SIGIR Forum. Vol. 36. ACM, 3-10.
    •  Baeza-Yates, R., Calderon-Benavides, L., and Gonzalez-Caro, C. 2006. The intention behind web queries. In String processing and information retrieval. Springer, 98-109.
  90. Acknowledgement
  91. Now… Then… Now…
  92. Acknowledgement
  93. Thank you
    Disclaimer: All other trademarks, logos and images used in this presentation belong to their respective owners.
