SlideShare a Scribd company logo
mining unstructured healthcare data



                        deep dhillon

chief data scientist | ddhillon@alliancehealth.com | twitter.com/zang0
alliance health networks


                   HCP/advanced patients
         medify.com – 10,000% user growth past year

                           patients
diabeticconnect.com - # 1 online diabetes site @ ~1.4M uniques
    *connect.com – content sites + guided social networks

                     health care industry
         patient surveying, matchmaking and analysis
why mine healthcare text?
anatomy of what patients currently use: webmd (e.g. drugs.com, yahoo health, etc.)

q/a           topic pages            news + media                  experts             discussions

                                                                             •   original content
                                                                             •   addresses emotional needs
                                                                             •   simple to understand
                                                                             •   provides answers



                                                                 • original content
                                                                 • moderately authoritative
                                                                 • simple to understand
                                         •   original content    • good coverage in the head
                                         •   fresh               • provides answers
                                         •   simple to understand
                                         •   good coverage in the head
                •   original content
                •   typically aligned well with google searches,
                    i.e. treatments for X, symptoms for Y            original content=important
                •   good coverage in the head                        providing answers=important
•     original content                                               head=developed
•     aligns well with google searches                               fresh=important
•     provides answers                                               authority=important
why mine healthcare text?
Patients and HCPs need long tail, statistically meaningful, consumer friendly,
authoritative and fresh health content.

q/a           topic pages             news + media                experts                 discussions

                                                                               •   manually written, not authoritative
                                                                               •   not thorough, i.e. not long tail
                                                                               •   not consistently credible, i.e. minimal
                                                                               •   accredation
                                                                               •   not statistically meaningful
                                                                               •   sparse

                                                                  •   manually written, moderately authoritative
                                                                  •   not thorough, i.e. not long tail
                                         •   manually written, not authoritative
                                         •   not consistently credible, i.e. minimal accredation
                                         •   not thorough, i.e. not long tail

                •   evergreen, rarely change
                •   manually written, not authoritative
                •   not consistently credible, i.e. minimal accredation
                •   not thorough, i.e. not long tail                               manual=expensive
                                                                                   manual=head focused
 •    manually written, not authoritative
 •    not consistently credible, i.e. minimal accredation                          manual=not authoritative
 •    not statistically meaningful                                                 manual=old and dated
automated content generation

•   cost effective
•   structured content
•   statistically based
•   scales to millions of patients
•   scales to long tail treatments + conditions
•   authoritative / citation driven
•   fresh
types of text mining
medify demo
how do we mine text?
                 assignments


curators

                  examples




  knowledge        parser               Index




external repos                  Page
  like UMLS                             Search
                               Module
annotate in the product:
•   Easier for people to give explicit GT feedback
•   Channels high visibility error angst productively
•   Channels more GT toward areas most seen by users
•   Override high visibility system mistakes with human data




gt demo:
• http://www.diabeticconnect.com/discussions/4790
• https://www.medify.com/internal/annotate/abstract?abstractId=8181575
Detailed Signals Preliminary Mining From Discussion Threads


5,000


4,500


4,000


3,500


3,000


2,500


2,000


1,500


1,000


 500


   0
Distribution of Signals Mined From Diabetic Connect Discussion
                            Threads




         Medical History - Newly                                              Lifestyle Adjustment
               Diagnosed                         Medical History - Symptom              6%
                   0%                                        3%




                                                                                    Medical History - Condition
                                                                                               10%



                                    Helper/Influencer
                                          33%




                                                                             Personal Account
                                                                                   12%




                                                                 Treatment Experience
                                                                         15%

                                   Medical History
                                        21%
API


 Research (Outcome)          Discussion (Gromin)
                                                                    curator
   treatment/symptom/condition/demographic

                                                   •   identity (i.e. UMLS id)
                      UMLS
                                                   •   taxonomical relations
                                                         • hierarchical is_a
                                                         •   condition/arthritis/Rhematoid Arthritis
                                                   •   synonym relations
                                                         • RA=Rhematoid Arthritis
                                                         • metformim
                                                   •   polarity
knowledge base                                           • effective=positive
                                                         • ineffective=negative
                                                   •   parsing clues (ambiguity)
                                                   •   anaphor clues
abstract parser work flow
document    sentence selection         1
                                                        1.        sentences w/ high conclusion presence selected
                                                        2.        shallow entity tagging applied based on umls
                                       2                3.        sentence text is modified to retain entities, optimize
             shallow tagging                                      for performance, and eliminate unnecessary filler.
                                                        4.        deep typed dependency parsing of modified text
                                                        5.        3 tier text classification applied on BOW + entities
                                       3
                sentence                                6.        SVO triples extracted from dependencies

               modification                             7.        rule based conclusions generated from triples
                                                        8.        confidence models applied to rule and classifier
                                                                  based conclusions
9                                                       9.        knowledge base used to represent domain and
             dependency tree                                      discourse style.
knowledge         parse           4                     10.       errors measured against curated data applied for
                                                                  improvements


                                                              5

             triple extraction    6
                                           3 tier classification                                                10



                concluder         7

                                                              8

                        confidence assignment                                        annotations
discussion parser work flow
    message     sentence split          1             1.       sentences split from discussion thread messages
                                                      2.       shallow entity tagging applied based on umls
                                                      3.       sentence text is modified to retain entities, optimize
                                        2                      for performance, and eliminate unnecessary filler.
               shallow tagging                        4.       deep typed dependency parsing of modified text
                                                               for select conclusion types
                                                      5.       select conclusion type based text classification
                                        3                      applied on BOW + entities after dimensionality
                  sentence                                     reduction

                 modification                         6.       SVO triples extracted from dependencies
                                                      7.       rule based conclusions generated from triples

9                                                     8.       rule based KB driven anaphora resolution applied.
                                                               Classification based conclusions added.
               dependency tree                        9.       knowledge base used to represent domain and
knowledge           parse           4                          discourse style.
                                                      10.      errors measured against curated data applied for
                                                               improvements
                                                           5

               triple extraction    6
                                            classification                                                  10



                  concluder         7

                                                           8

              anaphora resolution                                                conclusions
feature engineering

• entities, i.e. normalize synonyms, id new
  entity types, like social relations
• entity types, i.e. metformin >
  treatment_medication
• phrase driven cues, i.e. [have] [you]
  [considered] > suggestion_indicator
anaphora resolution

• relation structure, i.e. [person]>takes>it
  it refers to treatment (i.e. not condition/symptom), and specifically: medication but not device

• statistically driven, manually curated cues
  i.e. drug > treatment/medication
• filter
   – non matching antecedent candidates
   – singular/plural agreement
• score candidates:
   – antecedent occurrence frequency
   – distance (#sents) from antecedent to anaphor
   – co-occurrence of anaphor w/ antecedent
technology

•   Lang: Java + Python + Ruby
•   DB: Solr 4, Mongo DB, S3
•   Work: Map Reduce
•   Dependencies: Malt Parser, Stanford Parser
• Misc: Tomcat, Spring, Mallet, Reverb, Minor Third
• Tagging: Peregrine + home grown
Data Pipeline




                21
Solr 1              Solr 2               Solr N




                                 Load Balancer – API




                Cache
                          Portal 1            Portal 2   Portal N




                              Load Balancer – www.medify.com




request transaction flow
                                                         browser
questions?
Advair: Experts vs. Patients
                  • “medicalese” vs. patients words
                  • more granularity
                  • a story like perspective w/ words
                    of inspiration



                    My Pulmonologist today said that he had just come from
                    the hospital bedside of a patient with my exact symptoms
                    who is on oxygen and not doing well. If I hadn't been so
                    diligent in following my asthma plan that could easily have
                    been me. His telling me that really hit home.




                    By the way, my Pulmonologist surprised me with his
                    view on many of his patients. He actually got excited
                    and thanked me for being knowledgeable about
                    asthma, understanding my own body and health needs
                    and for following my treatment plan. He said he doesn't
                    get many patients who can actually talk about their
                    disease, symptoms, time
                    frames, treatments/medications, and non-medical
                    measures they are taking. He also doesn't get many
                    people who ask questions. My response to this: What
                    are people thinking? They need to take control of their
                    own health or noone else will.

More Related Content

Similar to Mining Unstructured Healthcare Data

Whats The Role Of Social Media In Bcm Dominique C Brack, SBCI
Whats The Role Of Social Media In Bcm   Dominique C Brack, SBCIWhats The Role Of Social Media In Bcm   Dominique C Brack, SBCI
Whats The Role Of Social Media In Bcm Dominique C Brack, SBCI
Reputelligence
 
Reactive Writing Techniques for Rewarding and Retaining Users
Reactive Writing Techniques for Rewarding and Retaining UsersReactive Writing Techniques for Rewarding and Retaining Users
Reactive Writing Techniques for Rewarding and Retaining Users
grebstock
 
Seek #CincySM Conversation Starter
Seek #CincySM Conversation StarterSeek #CincySM Conversation Starter
Seek #CincySM Conversation Starter
SEEK Company
 
Analyzing Qualitative Data for_ Research
Analyzing Qualitative Data for_ ResearchAnalyzing Qualitative Data for_ Research
Analyzing Qualitative Data for_ Research
NirmalPoudel4
 
Seek #DIGNC Digital Non-Conference Conversation starter
Seek #DIGNC Digital Non-Conference Conversation starterSeek #DIGNC Digital Non-Conference Conversation starter
Seek #DIGNC Digital Non-Conference Conversation starterSEEK Company
 
HxD 2012: Communities of Care: Social Media in Healthcare
HxD 2012: Communities of Care: Social Media in HealthcareHxD 2012: Communities of Care: Social Media in Healthcare
HxD 2012: Communities of Care: Social Media in HealthcareAmy Cueva
 
MRECO Conversation Starter
MRECO Conversation StarterMRECO Conversation Starter
MRECO Conversation Starter
SEEK Company
 
Critical evaluation (web version)
Critical evaluation (web version)Critical evaluation (web version)
Critical evaluation (web version)Durham_Library_DTP
 
Critical evaluation (web version)
Critical evaluation (web version)Critical evaluation (web version)
Critical evaluation (web version)Jamie Bisset
 
Corporate communication workshop
Corporate communication workshopCorporate communication workshop
Corporate communication workshopOanaStefancu
 
4 12 college board conference
4 12 college board conference4 12 college board conference
4 12 college board conferenceJaneTors
 
Effects revised
Effects revisedEffects revised
Effects revisedmaduncan
 
SPE 108 Research: Supporting Ideas
SPE 108 Research: Supporting IdeasSPE 108 Research: Supporting Ideas
SPE 108 Research: Supporting IdeasVal Bello
 
Putting Social Media to Work
Putting Social Media to WorkPutting Social Media to Work
Putting Social Media to Work
Marcus Tewksbury
 

Similar to Mining Unstructured Healthcare Data (16)

Whats The Role Of Social Media In Bcm Dominique C Brack, SBCI
Whats The Role Of Social Media In Bcm   Dominique C Brack, SBCIWhats The Role Of Social Media In Bcm   Dominique C Brack, SBCI
Whats The Role Of Social Media In Bcm Dominique C Brack, SBCI
 
Reactive Writing Techniques for Rewarding and Retaining Users
Reactive Writing Techniques for Rewarding and Retaining UsersReactive Writing Techniques for Rewarding and Retaining Users
Reactive Writing Techniques for Rewarding and Retaining Users
 
Seek #CincySM Conversation Starter
Seek #CincySM Conversation StarterSeek #CincySM Conversation Starter
Seek #CincySM Conversation Starter
 
Analyzing Qualitative Data for_ Research
Analyzing Qualitative Data for_ ResearchAnalyzing Qualitative Data for_ Research
Analyzing Qualitative Data for_ Research
 
Seek #DIGNC Digital Non-Conference Conversation starter
Seek #DIGNC Digital Non-Conference Conversation starterSeek #DIGNC Digital Non-Conference Conversation starter
Seek #DIGNC Digital Non-Conference Conversation starter
 
HxD 2012: Communities of Care: Social Media in Healthcare
HxD 2012: Communities of Care: Social Media in HealthcareHxD 2012: Communities of Care: Social Media in Healthcare
HxD 2012: Communities of Care: Social Media in Healthcare
 
MRECO Conversation Starter
MRECO Conversation StarterMRECO Conversation Starter
MRECO Conversation Starter
 
Critical evaluation (web version)
Critical evaluation (web version)Critical evaluation (web version)
Critical evaluation (web version)
 
Critical evaluation (web version)
Critical evaluation (web version)Critical evaluation (web version)
Critical evaluation (web version)
 
Corporate communication workshop
Corporate communication workshopCorporate communication workshop
Corporate communication workshop
 
Speech
SpeechSpeech
Speech
 
4 12 college board conference
4 12 college board conference4 12 college board conference
4 12 college board conference
 
Press release
Press releasePress release
Press release
 
Effects revised
Effects revisedEffects revised
Effects revised
 
SPE 108 Research: Supporting Ideas
SPE 108 Research: Supporting IdeasSPE 108 Research: Supporting Ideas
SPE 108 Research: Supporting Ideas
 
Putting Social Media to Work
Putting Social Media to WorkPutting Social Media to Work
Putting Social Media to Work
 

Recently uploaded

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 

Mining Unstructured Healthcare Data

  • 1. mining unstructured healthcare data deep dhillon chief data scientist | ddhillon@alliancehealth.com | twitter.com/zang0
  • 2. alliance health networks HCP/advanced patients medify.com – 10,000% user growth past year patients diabeticconnect.com - # 1 online diabetes site @ ~1.4M uniques *connect.com – content sites + guided social networks health care industry patient surveying, matchmaking and analysis
  • 3. why mine healthcare text? anatomy of what patients currently use: webmd (e.g. drugs.com, yahoo health, etc.) q/a topic pages news + media experts discussions • original content • addresses emotional needs • simple to understand • provides answers • original content • moderately authoritative • simple to understand • original content • good coverage in the head • fresh • provides answers • simple to understand • good coverage in the head • original content • typically aligned well with google searches, i.e. treatments for X, symptoms for Y original content=important • good coverage in the head providing answers=important • original content head=developed • aligns well with google searches fresh=important • provides answers authority=important
  • 4. why mine healthcare text? Patients and HCPs need long tail, statistically meaningful, consumer friendly, authoritative and fresh health content. q/a topic pages news + media experts discussions • manually written, not authoritative • not thorough, i.e. not long tail • not consistently credible, i.e. minimal • accredation • not statistically meaningful • sparse • manually written, moderately authoritative • not thorough, i.e. not long tail • manually written, not authoritative • not consistently credible, i.e. minimal accredation • not thorough, i.e. not long tail • evergreen, rarely change • manually written, not authoritative • not consistently credible, i.e. minimal accredation • not thorough, i.e. not long tail manual=expensive manual=head focused • manually written, not authoritative • not consistently credible, i.e. minimal accredation manual=not authoritative • not statistically meaningful manual=old and dated
  • 5. automated content generation • cost effective • structured content • statistically based • scales to millions of patients • scales to long tail treatments + conditions • authoritative / citation driven • fresh
  • 6. types of text mining
  • 8. how do we mine text? assignments curators examples knowledge parser Index external repos Page like UMLS Search Module
  • 9. annotate in the product: • Easier for people to give explicit GT feedback • Channels high visibility error angst productively • Channels more GT toward areas most seen by users • Override high visibility system mistakes with human data gt demo: • http://www.diabeticconnect.com/discussions/4790 • https://www.medify.com/internal/annotate/abstract?abstractId=8181575
  • 10.
  • 11.
  • 12.
  • 13. Detailed Signals Preliminary Mining From Discussion Threads 5,000 4,500 4,000 3,500 3,000 2,500 2,000 1,500 1,000 500 0
  • 14. Distribution of Signals Mined From Diabetic Connect Discussion Threads Medical History - Newly Lifestyle Adjustment Diagnosed Medical History - Symptom 6% 0% 3% Medical History - Condition 10% Helper/Influencer 33% Personal Account 12% Treatment Experience 15% Medical History 21%
  • 15. API Research (Outcome) Discussion (Gromin) curator treatment/symptom/condition/demographic • identity (i.e. UMLS id) UMLS • taxonomical relations • hierarchical is_a • condition/arthritis/Rhematoid Arthritis • synonym relations • RA=Rhematoid Arthritis • metformim • polarity knowledge base • effective=positive • ineffective=negative • parsing clues (ambiguity) • anaphor clues
  • 16. abstract parser work flow document sentence selection 1 1. sentences w/ high conclusion presence selected 2. shallow entity tagging applied based on umls 2 3. sentence text is modified to retain entities, optimize shallow tagging for performance, and eliminate unnecessary filler. 4. deep typed dependency parsing of modified text 5. 3 tier text classification applied on BOW + entities 3 sentence 6. SVO triples extracted from dependencies modification 7. rule based conclusions generated from triples 8. confidence models applied to rule and classifier based conclusions 9 9. knowledge base used to represent domain and dependency tree discourse style. knowledge parse 4 10. errors measured against curated data applied for improvements 5 triple extraction 6 3 tier classification 10 concluder 7 8 confidence assignment annotations
  • 17. discussion parser work flow message sentence split 1 1. sentences split from discussion thread messages 2. shallow entity tagging applied based on umls 3. sentence text is modified to retain entities, optimize 2 for performance, and eliminate unnecessary filler. shallow tagging 4. deep typed dependency parsing of modified text for select conclusion types 5. select conclusion type based text classification 3 applied on BOW + entities after dimensionality sentence reduction modification 6. SVO triples extracted from dependencies 7. rule based conclusions generated from triples 9 8. rule based KB driven anaphora resolution applied. Classification based conclusions added. dependency tree 9. knowledge base used to represent domain and knowledge parse 4 discourse style. 10. errors measured against curated data applied for improvements 5 triple extraction 6 classification 10 concluder 7 8 anaphora resolution conclusions
  • 18. feature engineering • entities, i.e. normalize synonyms, id new entity types, like social relations • entity types, i.e. metformin > treatment_medication • phrase driven cues, i.e. [have] [you] [considered] > suggestion_indicator
  • 19. anaphora resolution • relation structure, i.e. [person]>takes>it it refers to treatment (i.e. not condition/symptom), and specifically: medication but not device • statistically driven, manually curated cues i.e. drug > treatment/medication • filter – non matching antecedent candidates – singular/plural agreement • score candidates: – antecedent occurrence frequency – distance (#sents) from antecedent to anaphor – co-occurrence of anaphor w/ antecedent
  • 20. technology • Lang: Java + Python + Ruby • DB: Solr 4, Mongo DB, S3 • Work: Map Reduce • Dependencies: Malt Parser, Stanford Parser • Misc: Tomcat, Spring, Mallet, Reverb, Minor Third • Tagging: Peregrine + home grown
  • 22. Solr 1 Solr 2 Solr N Load Balancer – API Cache Portal 1 Portal 2 Portal N Load Balancer – www.medify.com request transaction flow browser
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. Advair: Experts vs. Patients • “medicalese” vs. patients words • more granularity • a story like perspective w/ words of inspiration My Pulmonologist today said that he had just come from the hospital bedside of a patient with my exact symptoms who is on oxygen and not doing well. If I hadn't been so diligent in following my asthma plan that could easily have been me. His telling me that really hit home. By the way, my Pulmonologist surprised me with his view on many of his patients. He actually got excited and thanked me for being knowledgeable about asthma, understanding my own body and health needs and for following my treatment plan. He said he doesn't get many patients who can actually talk about their disease, symptoms, time frames, treatments/medications, and non-medical measures they are taking. He also doesn't get many people who ask questions. My response to this: What are people thinking? They need to take control of their own health or noone else will.