Calling Watson to Ward 8 Stat


This presentation will provide insight into Watson’s DeepQA process, the complexities and details of the DeepQA challenge, and how these tools and techniques can be applied in a clinical setting.

  • Background on technology and Watson™/Jeopardy! and the data tsunami we face in healthcare. How DeepQA™ works. DeepQA™ applied to healthcare. Current example of medical intelligence (CTRM). Future use cases.
  • 15 years to get clinical studies into practice: Balas and Boren examined the use of 9 clinical procedures based on landmark studies and found that the average rate of increase in use was 3.2% per year; thus 15.6 years were required on average for 50% implementation (50% ÷ 3.2% per year ≈ 15.6 years). Balas and Boren do not estimate how long it takes to conduct the research; they effectively start from when that research is submitted for publication. Cardiologists hide medical errors: a recent article surveying the professionalism of doctors by specialty found that almost two-thirds of cardiologists admitted that they had recently refused to report to any authority a serious medical error of which they had direct personal knowledge (Campbell et al., 2007).
  • The table lists 9 landmark studies and the rate of use in the most current published study, which is indicated by the reference number immediately following the percent rate of use. These figures are almost certainly an underestimate of the time it takes to translate research into impact and an overestimate of the percentage of studies that survive to contribute to utilization.
  • Combines large amounts of unstructured data with structured data to be analyzed together. Understands ambiguous and imprecise questions using sophisticated natural language algorithms. Identifies many answers to questions, with evidence to "explain" the rationale for answers. Enables iterative and interactive question answering to refine and improve results. Learns from additional evidence, additional questions and mistakes to improve accuracy over time.
  • Massively Parallel Probabilistic Evidence-Based Architecture: generates and scores many hypotheses using a combination of thousands of Natural Language Processing, Information Retrieval, Machine Learning and Reasoning algorithms. These gather, evaluate, weigh and balance different types of evidence to deliver the answer with the best support it can find. Watson, the computer system we developed to play Jeopardy!, is based on the DeepQA software architecture. Here is a look at the DeepQA architecture. This is like looking inside the brain of the Watson system from about 30,000 feet high. Remember, the intended meaning of natural language is ambiguous, tacit and highly contextual. The computer needs to consider many possible meanings, attempting to find the evidence and inference paths that are most confidently supported by the data. So, the primary computational principle supported by the DeepQA architecture is to assume and pursue multiple interpretations of the question, to generate many plausible answers or hypotheses and to collect and evaluate many different competing evidence paths that might support or refute those hypotheses. Each component in the system adds assumptions about what the question might mean or what the content means or what the answer might be or why it might be correct. DeepQA is implemented as an extensible architecture and was designed at the outset to support interoperability. For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community. Over 100 different algorithms, implemented as UIMA components, were integrated into this architecture to build Watson. In the first step, Question and Category analysis, parsing algorithms decompose the question into its grammatical components. Other algorithms here will identify and tag specific semantic entities like names, places or dates.
In particular the type of thing being asked for, if it is indicated at all, will be identified. We call this the LAT or Lexical Answer Type, like “FISH”, “CHARACTER” or “COUNTRY”. In Query Decomposition, different assumptions are made about if and how the question might be decomposed into sub-questions. The original and each identified sub-part follow parallel paths through the system. In Hypothesis Generation, DeepQA does a variety of very broad searches for each of several interpretations of the question. Note that Watson, to compete on Jeopardy!, is not connected to the internet. These searches are performed over a combination of unstructured data (natural language documents) and structured data (available databases and knowledge bases) fed to Watson during training. The goal of this step is to generate possible answers to the question and/or its sub-parts. At this point there is very little confidence in these possible answers since little intelligence has been applied to understanding the content that might relate to the question. The focus at this point is on generating a broad set of hypotheses or, for this application, what we call “Candidate Answers”. To implement this step for Watson we integrated and advanced multiple open-source text and KB search components. After candidate generation DeepQA also performs Soft Filtering, where it makes parameterized judgments about which and how many candidate answers are most likely worth investing more computation in, given specific constraints on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and speed, Soft Filtering uses different light-weight algorithms to judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is.
In contrast, if this were a hard filter, those candidates falling below the threshold would be eliminated from consideration entirely at this point. In Hypothesis & Evidence Scoring the candidate answers are first scored independently of any additional evidence by deeper analysis algorithms. This may for example include Typing Algorithms. These are algorithms that produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step – for example Country, Agent, Character, City, Slogan, Book etc. Many of these algorithms may fire using different resources and techniques to come up with a score. What is the likelihood that “Washington”, for example, refers to a “General” or a “Capital” or a “State” or a “Mountain” or a “Father” or a “Founder”? For each candidate answer many pieces of additional evidence are searched for. Each of these pieces of evidence is subjected to more algorithms that deeply analyze the evidentiary passages and score the likelihood that the passage supports or refutes the correctness of the candidate answer. These algorithms may consider variations in grammatical structure, word usage, and meaning. In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms will fire. They will apply methods for inferring a coherent final answer from the constituent elements derived from the question’s sub-parts. Finally, arriving at the last step, Final Merging and Ranking, are many possible answers, each paired with many pieces of evidence and each of these scored by many algorithms to produce hundreds of feature scores, all giving some evidence for the correctness of each candidate answer. Trained models are applied to weigh the relative importance of these feature scores.
These models are trained with ML methods to predict, based on past performance, how best to combine all these scores to produce a final, single confidence number for each candidate answer and to produce the final ranking of all candidates. The answer with the strongest confidence would be Watson’s final answer. And Watson would try to buzz in provided that top answer’s confidence was above a certain threshold. The DeepQA system defers commitments and carries possibilities through the entire process while searching for increasingly broader contextual evidence and more credible inferences to support the most likely candidate answers. All the algorithms used to interpret questions, generate candidate answers, score answers, collect evidence and score evidence are loosely coupled but work holistically by virtue of DeepQA’s pervasive machine learning infrastructure. No one component could realize its impact on end-to-end performance without being integrated and trained with the other components, and they are all evolving simultaneously. In fact, what had a 10% impact on some metric one day might, one month later, contribute only 2% to overall performance due to evolving component algorithms and interactions. This is why the system, as it develops, is regularly trained and retrained. DeepQA is a complex system architecture designed to extensibly deal with the challenges of natural language processing applications and to adapt to new domains of knowledge. The Jeopardy! Challenge has greatly inspired its design and implementation for the Watson system.
  • Notes: This is easy if you know that Charles Dickens wrote Victorian literature. This is not part of medical inference, though, so we do not cover that, and an incorrect answer is preferred because its passage matched the query better. Without knowing about Victorian literature, there is not enough other information in the question to reliably find the correct answer.
  • Post-Op Discharge: patient hospital discharge instructions and treatment plan. Symptoms: set expectations, detect risks. Augments nurse follow-up and tracks recovery until follow-up appointment. Multi-channel options: phone, IM, web, mobile SMS, app.
  • doi: 10.4065/81.3.338. Mayo Clinic Proceedings, March 2006, vol. 81, no. 3, 338-344 (Mayo Clin Proc. 2006;81(3):338-344)
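
The DeepQA notes above describe Soft Filtering, feature scoring, and Final Merging and Ranking with a buzz-in threshold. The following is a minimal Python sketch of that flow, not Watson's actual implementation: the `Candidate` class, the feature names, the weights, and the use of a logistic function to turn weighted feature scores into a confidence are all assumptions made for illustration, and every number is invented.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Candidate:
    answer: str
    light_score: float  # cheap score consulted by soft filtering (hypothetical)
    features: dict = field(default_factory=dict)  # evidence feature scores


def soft_filter(candidates, threshold):
    """Split candidates by a threshold without discarding any of them.

    Candidates at or above the threshold are promoted to deeper evidence
    gathering; the rest continue through the computation as-is. A hard
    filter would drop the second list entirely.
    """
    promoted = [c for c in candidates if c.light_score >= threshold]
    passed_through = [c for c in candidates if c.light_score < threshold]
    return promoted, passed_through


def confidence(candidate, weights, bias=0.0):
    """Combine feature scores into one confidence in (0, 1).

    Assumption: a logistic model over weighted features stands in for the
    trained models mentioned in the notes.
    """
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in candidate.features.items())
    return 1.0 / (1.0 + math.exp(-z))


def final_answer(candidates, weights, buzz_threshold):
    """Rank all candidates and decide whether the top one is worth a buzz."""
    ranked = sorted(candidates, key=lambda c: confidence(c, weights), reverse=True)
    best = ranked[0]
    best_confidence = confidence(best, weights)
    return best.answer, best_confidence, best_confidence >= buzz_threshold


# Invented example data: three candidates with made-up scores.
candidates = [
    Candidate("answer A", 0.8, {"type_match": 1.0, "passage_match": 0.4}),
    Candidate("answer B", 0.6, {"type_match": 0.0, "passage_match": 0.9}),
    Candidate("answer C", 0.1),
]
weights = {"type_match": 2.0, "passage_match": 1.0}

promoted, passed_through = soft_filter(candidates, threshold=0.5)
answer, conf, buzz = final_answer(candidates, weights, buzz_threshold=0.8)
print(answer, round(conf, 3), buzz)
```

Raising `buzz_threshold` makes the sketch more conservative: the ranking is unchanged, but the top answer is only offered when its confidence clears the bar.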
  • Calling Watson to Ward 8 Stat

    1. Calling Watson™ to Ward 8 Stat. Nick van Terheyden, MD, Chief Medical Information Officer – Clinical Language Understanding, Nuance Communications Inc. Wednesday, February 2, 9:45 - 10:45 AM. DISCLAIMER: The views and opinions expressed in this presentation are those of the author and do not necessarily represent official policy or position of HIMSS. Watson™ and DeepQA™ are trade names of IBM.
    2. Conflict of Interest Disclosure. Nick van Terheyden, MD • Salary: Nuance Communications Inc. © 2012 HIMSS
    3. Learning Objectives • Recognize how technology can bring real-time knowledge and the latest clinical developments to the clinician’s workflow. • Define IBM’s Watson™: an insight into the DeepQA™ process, the complexities and details of the DeepQA™ challenge, and how these tools and techniques can be applied in a clinical context. • Summarize the progress to date on the development and behind-the-scenes implementation of Watson in healthcare. • Demonstrate the data tsunami challenge faced in clinical settings and how artificial intelligence technology like Watson™ can offer new means for rapid access to critical, specific and highly relevant data with corresponding links to underlying evidence. • Identify an interim pathway for attendees to develop their own concrete steps to create an information-rich yet physician-friendly environment.
    4. “Medicine used to be simple, ineffective and relatively safe. Now it is complex, effective and potentially dangerous.” Sir Cyril Chantler, King’s Fund. Chantler C. The role and education of doctors in the delivery of health care. Lancet 1999;353:1178-81
    5. Lifestyle defines ‘Group Health’: 60% - 80% of Group Health issues may be preventable with lifestyle modification – 58% Reduction in Diabetes (Tuomilehto, 2001 NEJM 344(18): 1343-50) – 60% Fewer Cardiac Events (Hambrecht, Circulation 2004;109:1371-78) – 60% Less Cancer (De Lorgeril, Arch Int Med 1998;158:1181-87) – 44% Reduction in total mortality, NNT=16 (Lyon Heart Study, Circulation 1999;99:779-85) – 83% less Heart Disease and 91% less Diabetes (Nurses Health Study, NEJM 2000;343:16-22, NEJM 2001;345:790-97) – 45% Reduction in total mortality, NNT=2.4 (Indian Heart Study, BMJ 1992;304:1015-19) – 73% less CHD and 69% less Cancer (HALE Project. Knoops, JAMA 2004;292:1433-1439) – 40% Mortality Reduction (GISSI-Prevenzione, Med.Diet AHA11/01: Marchioli) – 67% Mortality Reduction (Indo-Med Study, Lancet 2002;360:1455-61). 2009 Continua Health Alliance, Brigitte Piniewski, MD
    6. Modifiable Health. [Figure: timeline over ages 0, 25 and 65 running from Wellness through Pre-Illness to Illness and Death; labels: 60-80% Lifestyle, Unpredictable Health, Predictable (Rules-based) Health.] 2008 2009 Continua Health Alliance, Brigitte Piniewski, MD
    7. To put it another way… [Figure: the same timeline over ages 0, 25 and 65 from Wellness through Pre-Illness to Illness and Death; labels: Fun, No Fun.] 2008 2009 Continua Health Alliance, Brigitte Piniewski, MD
    8. Preventive Medicine – A warning. [Figure: the same timeline over ages 0, 25 and 65 from Wellness through Pre-Illness to Illness and Death; labels: $$$, $$$?, 60-80% Lifestyle, Unpredictable Health, Predictable (Rules-based) Health.] 2008 2009 Continua Health Alliance, Brigitte Piniewski, MD
    9. Challenge – Clinical Knowledge-Processing Burden. “Current medical practice relies heavily on the unaided mind to recall a great amount of detailed knowledge – a process which, to the detriment of all stakeholders, has repeatedly been shown unreliable.” Crane and Raymond, The Permanente Journal, Winter 2003, Volume 7 No. 1, Kaiser Permanente Institute for Health Policy. [Figure: knowledge processing requirement versus knowledge processing capacity, from years ago to today; label: This gap injures patients.] Slide courtesy of Dr Mike Bainbridge
    10. Information Overload – Big Data • Watson™ can sift through 200 million pages in 3 secs – Graphic/analogy • Medical information doubling every 5 years – Reference: Brent James, MD, MStat, Chief Quality Officer, Intermountain Health Care; subject of The New York Times article “If Health Care is Going to Change, Dr. Brent James Will Lead the Way” • t.html?pagewanted=all • 1.8 zettabytes of information created this year, the majority of it unstructured (Source: IDC) – That’s enough information to fill 57 billion 32GB Apple iPads (which could build a mountain of iPads 25 times higher than Mt Fuji)
    11. Time To Market • Studies suggest that it takes an average of 17 years for research evidence to reach clinical practice (it took 25 years for beta blockers to be prescribed for heart patients) (1) • It takes an estimated average of 17 years for only 14% of new scientific discoveries to enter day-to-day clinical practice (2) • Roughly 5% of autopsies reveal lethal diagnostic errors for which a correct diagnosis coupled with treatment could have averted death (3). References: 1. Balas, E. A., & Boren, S. A. (2000). Yearbook of Medical Informatics: Managing Clinical Knowledge for Health Care Improvement. Stuttgart, Germany: Schattauer Verlagsgesellschaft mbH. 2. Westfall, J. M., Mold, J., & Fagnan, L. (2007). Practice-based research - "Blue Highways" on the NIH roadmap. JAMA, 297(4), p. 403. 3. Shojania, K. G., Burton, E. C., McDonald, K. M., & Goldman, L. Changes in rates of autopsy-detected diagnostic errors over time: a systematic review. JAMA. 2003;289(21):2849-2856
    12. Current Rate of Use for Selected Procedures
        Clinical Procedure          Landmark Trial    Current Rate of Use
        Flu Vaccination             1968 (7)          55% (8)
        Thrombolytic therapy        1971 (9)          20% (10)
        Pneumococcal vaccination    1977 (11)         35.6% (8)
        Diabetic eye exam           1981 (4)          38.4% (6)
        Beta blockers after MI      1982 (12)         61.9% (6)
        Mammography                 1982 (13)         70.4% (6)
        Cholesterol screening       1984 (14)         65% (15)
        Fecal occult blood test     1986 (16)         17% (17)
        Diabetic foot care          1983 (18)         20% (19)
        1. Balas, E. A., & Boren, S. A. (2000). Yearbook of Medical Informatics: Managing Clinical Knowledge for Health Care Improvement. Stuttgart, Germany: Schattauer Verlagsgesellschaft mbH
    13. Reading to Keep Up – Information Overload • Today’s experienced clinician needs close to 2 million pieces of information to practice medicine • Doctors subscribe to an average of seven journals representing over 2,500 new articles each year, making it literally impossible to keep up to date with the latest information about diagnosis, prognosis and therapy • Compare the time required for reading (for general medicine, enough to examine 19 articles per day, 365 days per year) with the time available (well under an hour per week for British medical consultants, even on self-reports) • Furthermore, the interpretation of patient data is difficult and complicated, mainly because the required expert knowledge in each of the many different medical fields is enormous and the information available for the individual patient is multi-disciplinary, imprecise and very often incomplete.
    14. Meet Gerard Donovan… and his 150 medical staff… About that Bill. [Figure: hospital bill broken down by department; labels: Cardiology, Radiology, Billing, Plant, Administration, Pharmacy, Food, Lab, services; amounts: $3,943, $1,290, $1,433, $3,233; Intensive Care $17,664; Operating Room $36,127.]
    16. Watson™ DeepQA™ Technology • Analyzes large volumes of structured and unstructured data • Interprets and understands natural language questions • Generates and evaluates hypotheses and quantifies confidence in answers • Supports iterative dialog to refine results • Adapts and learns over time, improving results
    17. DeepQA™: The Technology Behind Watson™. Learned models help combine and weigh the evidence. [Figure: DeepQA pipeline from Question through Question & Topic Analysis, Question Decomposition, Hypothesis Generation (Primary Search, Candidate Answer Generation), Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval, Deep Evidence Scoring), Synthesis, and Final Confidence Merging & Ranking to Answer & Confidence; drawing on Answer Sources and Evidence Sources; annotations: Multiple Interpretations, 100’s of sources, 100’s of Possible Answers, 1000’s of Pieces of Evidence, 100,000’s of Scores from many Deep Analysis Algorithms.]
    18. Architecture. [Figure: User Experience by Nuance and Partners; CLU; Cloud to Cloud; DeepQA™; Solutions for a community of consumers (large and small), of Healthcare EMRs and of Providers; Content Publishers; Large Institutional CASE Content Partners.]
    19. Comparison • Not simple search • Analysis of multiple concurrent complex contributing conditions and factors
    20. Question and Answer Sets: Success • Question: This hormone deficiency is associated with Kallmann’s syndrome. – Passage: “Isolated deficiency of GnRH or its receptor causes failure of normal pubertal development and amenorrhea in women. This disorder is termed Kallmann syndrome when it is accompanied by anosmia and has also been termed idiopathic hypogonadotropic hypogonadism (IHH).” • Answer: GnRH • Notes: We know that “GnRH” is a hormone (from the ontology), so that lets us choose it as the most likely answer.
    21. Question and Answer Sets: Miss • Question: Eponym from Victorian literature for obesity hypoventilation syndrome. – Correct passage: Obesity-hypoventilation syndrome is also known as pickwickian syndrome, in reference to Charles Dickens’… – Correct answer: Pickwickian Syndrome – Wrong passage: Other clinical features associated with obesity-hypoventilation syndrome are daytime hypersomnolence and cor pulmonale. – Wrong answer: cor pulmonale
    22. Potential Use Cases • If We Only Knew What We Knew – Bringing Evidence to the Point of Care – Consumption of medical records, results etc., offering differential diagnosis and probability analysis with links to underlying literature sources – Draws on the specifics of a patient case and vast volumes of clinical data and medical – Highly granular results tailored to a particular patient’s conditions, demographics, history – True personalization of medicine based on large cohort historical data analysis • Acting on What We Know – Medication dosage: guidelines, clinical research findings for specific patient – Adverse drug reactions: computational model + research database populated by Watson – Treatment Options: contextualized to patient – Standard of Care: aligning treatment to standards – Trending guidelines: recently published, pre-official – Post-Operative Discharge and Follow-up – Entry of symptoms or symptomatic trends can trigger alerts for follow-up – Ongoing refinement based on dynamic interaction and learning – Medical avatar for treatment and management of chronic conditions
    23. Long Term Objectives • Creation of a state-of-the-art system oriented to evidence-based decision making in healthcare, where such a system – Reports the suggested decisions and decision processes – Reports the aggregated data from clinical processes – Is defined as a real-time or retrospective system – Is designed to assist medical professionals involved in the patient life cycle, in diagnosis and treatment of a patient • Applying and expanding Watson’s framework in conjunction with Clinical Language Understanding, medical data and medical ontology • Integrates into medical workflow and learns over time
    24. Challenges • Ambiguous human language • Integration with existing systems: extract of complete data set for history, results etc. – Often in disparate systems – Non-standard interfaces – Non-standard format – Unstructured narrative • Patient interaction with technology vs humans – Telemedicine and consumer trend towards home-based care
    25. Replacing the Doctor? • A study done by the Mayo Clinic in 2006 identified the most important characteristics patients feel a good doctor must possess • The ideal clinician is – confident, – empathetic, – humane, – personal, – forthright, – respectful, and – thorough • These facets are entirely human and will be hard for technology to replace. Mayo Clin Proc. 2006;81(3):338-344
    26. Questions. For more information I can be reached at: Nick van Terheyden, MD, Chief Medical Information Officer, Nuance. drnic1@gmail.com Twitter of the Doctor http://nvt.myplaxo.com FaceBook Voice (301) 355-0877
    27. Calling Watson™ to Ward 8 Stat. Nick van Terheyden, MD, Chief Medical Information Officer – Clinical Language Understanding, Nuance Communications Inc. Wednesday, February 2, 9:45 - 10:45 AM
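
The success and miss examples on slides 20 and 21 turn on whether the system knows the type of each candidate answer. The toy Python sketch below shows how a type check against an ontology can overturn a better passage match; it is an illustration under stated assumptions, not how Watson scores answers. The `ONTOLOGY` entries, the passage-match scores and the additive scoring rule are all invented for the example.

```python
# Toy ontology mapping a term to the set of types it is known to belong to.
# Entries are illustrative only; a real medical ontology is far larger.
ONTOLOGY = {
    "GnRH": {"hormone"},
    "anosmia": {"symptom"},
    "amenorrhea": {"symptom"},
}


def type_score(candidate, lat, ontology):
    """Return 1.0 if the ontology lists the candidate under the lexical answer type."""
    return 1.0 if lat in ontology.get(candidate, set()) else 0.0


def best_answer(passage_scores, lat, ontology, type_weight=1.0):
    """Pick the candidate with the highest passage score plus weighted type score."""
    def total(candidate):
        return passage_scores[candidate] + type_weight * type_score(candidate, lat, ontology)
    return max(passage_scores, key=total)


# Slide 20 (success): the ontology knows GnRH is a hormone, so it wins even
# though another term matched the passage slightly better (invented scores).
success_scores = {"GnRH": 0.6, "anosmia": 0.7, "amenorrhea": 0.65}
print(best_answer(success_scores, "hormone", ONTOLOGY))

# Slide 21 (miss): nothing in the ontology marks either candidate as an eponym,
# so the better passage match wins and the answer is wrong.
miss_scores = {"Pickwickian syndrome": 0.5, "cor pulmonale": 0.7}
print(best_answer(miss_scores, "eponym", ONTOLOGY))

# With the missing knowledge added, the correct answer is preferred.
extended = dict(ONTOLOGY, **{"Pickwickian syndrome": {"eponym"}})
print(best_answer(miss_scores, "eponym", extended))
```

The third call shows why the miss is a knowledge gap rather than a ranking bug: once the ontology records the missing fact, the same scoring rule picks the correct answer.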