
# Inside the Mind of Watson: Cognitive Computing

Presentation given by Chris Welty (IBM Research) at Kno.e.sis. We received permission from Chris Welty to upload this presentation. Event details are at: http://j.mp/Welty-at-Knoesis and the associated video is at: https://www.youtube.com/watch?v=grDKpicM5y0

• Automatic Open-Domain Question Answering represents a very long-standing challenge in Computer Science, specifically in the areas of NLP, Information Retrieval and AI. <go through slide>
• Human performance is one of the things that makes the Jeopardy! Challenge so compelling. The best humans are very, very good at this task. In this chart, each dot corresponds to an actual historical Jeopardy! game and represents the performance of the winner of that game. We refer to this cluster of dots as the “Winners Cloud”. For each dot, the X-axis, along the bottom of the graph, represents the percentage of questions in a game that the winning player got a chance to ANSWER. These were the questions he or she was confident enough and fast enough to ring in or buzz in for FIRST. The Y-axis, going up along the left of the graph, represents the winning player’s PRECISION – that is, the percentage of those answered questions the player got RIGHT. Remember, if a player gets a question wrong, they lose the $ value of the clue and their competitors still get a chance to answer, or rebound. But what we humans tend to do really, really well is confidently know what we know – computing an accurate confidence turns out to be a key ability for winning at Jeopardy! Looking at the center of the green cloud, what you see is that, on average, WINNERS are confident enough and fast enough to answer nearly 50% of the questions in a game and achieve somewhere between 85% and 95% precision on those questions. That is, they get 85–95% of the ones they answer RIGHT. The red dots represent Ken Jennings’s performance. Ken won 74 consecutive games against qualified players. He was confident and fast enough to acquire 60% and even up to 80% of a game’s questions from his competitors and still achieve between 85% and 95% precision on average. Good Jeopardy! players are remarkable in their breadth, precision, confidence and speed.
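The two axes of the Winners Cloud can be sketched as a tiny metric function. The game-log format here (one `(buzzed_in, correct)` pair per clue) is invented for illustration; it is not the actual data format used in the study.

```python
# Sketch of the two "Winners Cloud" axes for a single game.
# A game is a hypothetical list of (buzzed_in, correct) pairs, one per clue.

def winners_cloud_point(game):
    """Return (% of clues answered, precision on the answered clues)."""
    answered = [correct for buzzed_in, correct in game if buzzed_in]
    pct_answered = len(answered) / len(game)
    precision = sum(answered) / len(answered) if answered else 0.0
    return pct_answered, precision

# A 60-clue game: the winner buzzes in on 30 clues and gets 27 of them right,
# which lands squarely in the center of the winners cloud.
game = [(True, True)] * 27 + [(True, False)] * 3 + [(False, False)] * 30
print(winners_cloud_point(game))  # → (0.5, 0.9)
```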
• Competing with the top human players requires answering 100s of unseen Jeopardy! questions, identifying the more than 75% you think you know and getting over 85% precision on them – this is a hard task for computer programs. Our Open-Domain Question Answering technology, based on state-of-the-art approaches and developed by a modest team over 5 years, with some very esteemed company, still left a huge gap when turned to this challenge. Nonetheless, at least part of our job at IBM Research is to push the limits of what computers CAN and CANNOT do. So, in 2007 we believed it was possible and committed to trying to make a Huge Leap in a relatively short amount of time.
• We do NOT approach the Jeopardy! Challenge by trying to anticipate all questions and building databases of answers. In fact, in a random sample of 20,000 Jeopardy! clues we automatically identified the main subject or type being asked about. We found that in 13% of the sampled questions there was no clear indication at all of the type of answer, and the players must rely almost entirely on the context to figure out what sort of answer is required. The remaining 87% is what you see in this graph. It shows what we call a very long tail. There is no small-enough set of topics to focus on that covers enough ground. Even focusing on the most frequent few (the head of the tail, to the left) will cover less than 10% of the content. 1000s of topics, from hats to insects to writers to diseases to vegetables, are all equally fair game. And for these 1000s of types, 1000s of different questions may be asked, and then phrased in a huge variety of different ways. So, our primary approach and research interest is not to collect and organize databases. Rather, it is on reusable Natural Language Processing (NLP) technology for automatically understanding naturally occurring human-language text. As-is, pre-existing structured knowledge in the form of DBs or KBs is used to help bridge meaning and interpret multiple NL texts. But because of the broad domain and the expressive language used in the questions and in content, pre-built databases have very limited use in answering any significant number of questions. The focus rather is on NL understanding.
• Here we see the same question, the same parse, but on the other side we see that there exists a passage containing the RIGHT answer BUT with only one key word in common. <read the green passage> The system must consider, in parallel and in detail, a huge amount of content just to get a SHOT at this evidence, and then must find and weigh the right inferences that will allow it to match and score with an accurate confidence – for example, in this case <click> Date Math, Statistical Paraphrasing and Geospatial Reasoning. And it’s still not 100% certain. What if, for example, the passage said “considered landing in” rather than “landed in”, or what if there was just a preponderance of weaker evidence for another answer? Question Answering technology tries to understand what the user is really asking for and to deliver precise and correct responses. But natural language is hard. Meaning can be expressed in so many different ways, and to achieve high levels of precision and confidence you must consider much more information and analyze it much more deeply. What is needed is a radically different approach that explores many different plausible interpretations in parallel and collects and evaluates all sorts of evidence in support or in refutation of those possibilities.

1. Inside the Mind of Watson: Cognitive Computing. Chris Welty, IBM Research. ibmwatson.com. Do Not Record. Do Not Distribute. © 2011 IBM Corporation
2. What is Watson? An Open-Domain Question-Answering machine. Given rich natural language questions over a broad domain of knowledge, it delivers:
   – Precise answers: determine what is being asked & give a precise response
   – Accurate confidences: determine the likelihood the answer is correct
   – Consumable justifications: explain why the answer is right
   – Fast response time: precision & confidence in <3 seconds
   – Performance at the level of human experts
   It proved its mettle in a televised match, winning a 2-game Jeopardy! match against the all-time winners, viewed by over 50,000,000.
3. What is Jeopardy? Jeopardy! is an American quiz show (1964 – today), a household name in the U.S. It has an answer-and-question format: contestants are presented with clues in the form of answers and must phrase their responses in question form. Open-domain trivia questions; speed is a big factor. Example – Category: General Science. Clue: When hit by electrons, a phosphor gives off electromagnetic energy in this form. Answer: What is light?
4. What is Cognitive Computing? Increasingly, machines are being asked to add their computational power to problems which are not inherently solvable. Traditionally, these problems came from AI.
   – The hardest AI problems are the easiest for human intelligence: vision, speech, natural language – these are not actually associated with “being intelligent”
   – Human intelligence provides solutions, but does not scale
   Cognitive Computing is founded on four principles:
   – Learn & improve. Cognitive computing systems focus on inexact solutions to unsolvable problems that utilize machine learning and improve over time. Often they combine multiple approaches and must integrate them effectively. They must learn from humans, in more and more seamless ways.
   – Assist & augment human cognition. Cognitive computing addresses problems that lie squarely in the province of human intelligence, but where we can’t handle the volume of information, penetrate the complexity or otherwise extend our reach (physically).
   – Interact in a natural way. Cognitive computing provides technologies that support a higher level of human cognition by adapting to human approaches and interfaces... over the next several decades it will incorporate essentially all the ways humans sense and interact.
   – Speed & Scale. Cognitive computing harnesses the clear advantage machines have over humans in their ability to perform mundane tasks of arbitrary complexity repeatedly, whether it is the scale of the data or the complexity of the task.
5. Examples of Cognitive Computing: web search, image search, event search, social computing, natural language processing.
6. The Jeopardy! Challenge: hard for humans, hard for machines – but hard for different reasons. For people, the challenge is knowing the answer; for machines, the challenge is understanding the question. The task demands a broad/open domain, complex language, high precision, accurate confidence and high speed.
   – $200: If you are looking at the wainscoting, you are looking in this direction. (What is down?)
   – $1000: The first person mentioned by name in ‘The Man in the Iron Mask’ is this hero of a previous book by the same author. (Who is D’Artagnan?)
   – $600: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus. (What is cytoplasm?)
   – $800: The conspirators against this man were wounded by each other while they stabbed at him. (Who is Julius Caesar?)
7. The Winner’s Cloud: what it takes to compete against top human Jeopardy! players. Each dot represents an actual historical human Jeopardy! game, plotting the winner’s % answered (from more confident to less confident) against precision. Top human players are remarkably good. [Chart: Winning Human Performance, Grand Champion Human Performance, 2007 QA Computer System]
8. The Winner’s Cloud (continued): Computers? Not so good. The 2007 QA computer system fell far below the winners cloud. In 2007, we committed to making a Huge Leap!
9. DeepQA: The Technology Behind Watson – an example of a new software paradigm. DeepQA generates and scores many hypotheses using an extensible collection of Natural Language Processing, Machine Learning and Reasoning algorithms. These gather and weigh evidence over both unstructured and structured content to determine the answer with the best confidence. Learned models help combine and weigh the evidence. Pipeline: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search, Candidate Answer Generation) → Hypothesis & Evidence Scoring (Evidence Retrieval, Evidence Scoring, drawing on evidence sources, answer sources and learned models) → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence.
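The stages above can be sketched end-to-end as a toy pipeline. Everything here is invented for illustration – the two-document corpus, the fixed candidate list and the single keyword-overlap scorer. The real DeepQA generates candidates automatically from primary search and combines hundreds of learned scorers.

```python
# Toy sketch of the DeepQA stages: question analysis, evidence scoring over a
# corpus, and merging/ranking. Corpus, candidates and scorer are all invented.

CORPUS = [
    "In 1894 C.W. Post created Postum in Battle Creek, Michigan.",
    "General Foods was formed from the Postum Cereal Company.",
]
CANDIDATES = ["Battle Creek", "General Foods", "Post Foods", "1895"]

def tokens(text):
    return {w.strip(".,").lower() for w in text.split()}

def evidence_score(candidate, keywords):
    """Hypothesis & Evidence Scoring: keyword overlap of the best passage."""
    docs = [d for d in CORPUS if candidate in d]
    best = max((len(keywords & tokens(d)) for d in docs), default=0)
    return best / len(keywords)

def deep_qa(question):
    keywords = tokens(question)                                # Question Analysis
    scored = {c: evidence_score(c, keywords) for c in CANDIDATES}
    best = max(scored, key=scored.get)                         # Merging & Ranking
    return best, round(scored[best], 2)
```

Even this toy version shows the shape of the system: every candidate is scored against evidence, and the answer is whichever hypothesis the evidence supports best.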
10. Example Question: “In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city.” Question Analysis extracts keywords (1894, C.W. Post, created, …), the Lexical Answer Type (Michigan city), Date(1894), and question relations such as Create(Post, cereal drink). Primary Search retrieves related content (structured & unstructured); Candidate Answer Generation proposes answers (General Foods, 1985, Post Foods, Battle Creek, Grand Rapids, …), each scored via Evidence Retrieval and Evidence Scoring into a vector of feature scores; Merging & Ranking then produces: 1) Battle Creek (0.85), 2) Post Foods (0.20), 3) 1985 (0.05).
11. Hypothesis Scoring. Category: MICHIGAN MANIA. Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city. Answer scorers (TyCor, Temporal, Spatial, Popularity, …) can be applied depending on different relations or constraints detected in the question. For example, this question’s focus with modifiers is “Michigan city.” Watson can detect this as a geospatial relation that indicates the correct answer must be a city spatially located within the state of Michigan. Candidate answers with evidence feature scores (answer scoring + passage scoring):

   | Candidate          | Doc Rank | Pass Rank | TyCor | Geo |
   |--------------------|----------|-----------|-------|-----|
   | General Foods      | 0        | 1         | 0.1   | 0   |
   | Post Foods         | 2        | 1         | 0.1   | 0   |
   | Battle Creek       | 1        | 2         | 0.8   | 1   |
   | Will Keith Kellogg | 3        | –         | 0.1   | 0   |
   | Grand Rapids       | –        | –         | 0.9   | 1   |
   | 1895               | 0        | –         | 0.0   | 0   |
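Per-candidate feature scores like those on the slide could be assembled as below. The toy geospatial KB, the type table and the particular score values are all hypothetical stand-ins for Watson’s real analytics.

```python
# Sketch of two answer scorers from the slide: a geospatial scorer and a
# type-coercion (TyCor) scorer. The lookup tables are invented toy KBs.

MICHIGAN_CITIES = {"Battle Creek", "Grand Rapids"}        # toy geospatial KB
TYPES = {"Battle Creek": "city", "Grand Rapids": "city",
         "General Foods": "company", "Post Foods": "company",
         "1895": "date"}

def geo_score(candidate):
    """1 if the candidate is a city located in Michigan, else 0."""
    return 1 if candidate in MICHIGAN_CITIES else 0

def tycor_score(candidate, lat="city"):
    """High score when the candidate's type matches the lexical answer type."""
    return 0.8 if TYPES.get(candidate) == lat else 0.1

# One (tycor, geo) feature pair per candidate, as in the slide's table.
features = {c: (tycor_score(c), geo_score(c)) for c in TYPES}
```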
12. Passage Scoring. Category: MICHIGAN MANIA. Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city. In Deep Evidence Scoring, Watson retrieves evidence for each candidate answer, then evaluates the evidence using a large number of deep evidence scoring analytics. The evidence for a candidate answer may come from the original document or passage where the candidate answer was generated, or it may come from an evidence retrieval search performed by taking the keyword search query from Step 2, replacing the focus terms with the candidate answer, and retrieving the relevant passages that are found. The passages, or “context,” in which the candidate answer occurs are evaluated as evidence to support or refute the candidate answer as the correct answer for the question. [Slide shows retrieved evidence passages for Battle Creek, General Foods and Post Foods, e.g.: “Post Foods, LLC, also known as Post Cereals (formerly Postum Cereals), was founded by C.W. Post. It began in 1895 with the first Postum, a ‘cereal beverage’, developed by Post in Battle Creek, Michigan.”]
13. Merging and Confidence. Category: MICHIGAN MANIA. Clue: In 1894 C.W. Post created his warm cereal drink Postum in this … In the final processing step, Watson detects variants of the same answer and merges their feature scores together. Watson then computes the final confidence scores for the candidate answers by applying a series of Machine Learning models that weight all of the feature scores (Doc Rank, Pass Rank, TyCor, Geo, LFACS, Term Match, Temporal, …) to produce the final confidence scores. Applying the learned model yields the final answers and confidences: Battle Creek 0.946, Post Foods 0.152, 1895 0.040, Grand Rapids 0.033, General Foods 0.014.
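The merge-then-weight step can be sketched as below. The per-feature merge rule (max), the feature weights and the logistic squashing are assumptions for illustration; Watson’s actual models are trained on its large Q-A ground truth.

```python
import math

# Sketch of the final step: merge feature scores of answer variants, then
# apply a learned weighting to produce a confidence. Weights are invented.

WEIGHTS = {"tycor": 2.0, "geo": 1.5, "passage": 1.0}
BIAS = -2.5

def merge_variants(scores_a, scores_b):
    """Merge two variants of the same answer, keeping the best per feature."""
    return {k: max(scores_a[k], scores_b[k]) for k in scores_a}

def confidence(features):
    """Logistic-regression-style combination of weighted feature scores."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))   # squash into [0, 1]

battle_creek = merge_variants(
    {"tycor": 0.8, "geo": 1.0, "passage": 0.5},   # "Battle Creek"
    {"tycor": 0.8, "geo": 1.0, "passage": 0.3},   # "Battle Creek, Michigan"
)
print(round(confidence(battle_creek), 2))  # → 0.75
```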
14. A “Minimal” DeepQA Pipeline. Category: MICHIGAN MANIA. Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city. Question Analysis (LAT: Michigan city) → Primary Search (ranked document results) → Hypothesis Generation (candidate answers: General Foods, Battle Creek, Post Foods, Will Keith Kellogg, …) → Hypothesis & Evidence Scoring (TyCor and Geo features per candidate) → Merging & Ranking → final answers with confidences: Battle Creek 0.946, Post Foods 0.152, 1895 0.040.
15. Example Question Analysis. “In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city.”
   Question Understanding: Exists(x): City(x) & locatedIn(x, Michigan) & locationOf(e, x) & Creation(e, C.W. Post, Postum) & dateOf(e, 1894)
   Question Frame – Person: C.W. Post; AnswerType: City; U.S. State: Michigan; Date: 1894; Relations: create(C.W. Post, “cereal drink”), locatedIn(“city”, Michigan)
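The question frame above is essentially a small structured record. A minimal data structure for it might look like this; the class and field names mirror the slide but are otherwise an assumption, not Watson’s actual internal representation.

```python
from dataclasses import dataclass, field

# Hypothetical container for the question frame produced by question analysis.

@dataclass
class QuestionFrame:
    answer_type: str                       # the lexical answer type (LAT)
    entities: dict = field(default_factory=dict)   # typed entities in the clue
    relations: list = field(default_factory=list)  # (predicate, arg1, arg2)

frame = QuestionFrame(
    answer_type="City",
    entities={"Person": "C.W. Post", "U.S. State": "Michigan", "Date": "1894"},
    relations=[("create", "C.W. Post", "cereal drink"),
               ("locatedIn", "city", "Michigan")],
)
```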
16. Question: “In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.”
   Logical form: Celebration(e), date(e, 1898), celebrationOf(e, e1), location(e, Portugal), date(e1, dateOf(e) – 400), arrival(e1), location(e1, India), participantOf(e1, ?x).
   Passage 1: “On the 27th of May 1498, Vasco da Gama landed in Kappad Beach.” Logical form: landing(e2), location(e2, Kappad Beach), date(e2, 1498), participantOf(e2, Vasco).
   Passage 2: “In May, Gary arrived in India after he celebrated his anniversary in Portugal.”
   Matching two passages is the basic Watson operation; Semantic Technology helps match elements of them.
17. Matching Keyword Evidence. Question: “In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.” Passage: “In May, Gary arrived in India after he celebrated his anniversary in Portugal.” Keyword matching aligns: celebrated–celebrated, In May 1898–In May, 400th anniversary–anniversary, Portugal–in Portugal, arrival in India–India, and pairs “explorer” with “Gary”. The keyword evidence suggests “Gary” is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.
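The weakness of bare keyword matching is easy to demonstrate. The two passages are taken from the slide; the naive overlap scorer is a hypothetical stand-in for one of Watson’s weaker evidence features.

```python
# Sketch showing why keyword overlap is weak evidence: the wrong passage
# ("Gary") shares more surface keywords with the question than the passage
# containing the right answer (Vasco da Gama).

def tokens(text):
    return {w.strip(".,'’").lower() for w in text.split()}

QUESTION = ("In May 1898 Portugal celebrated the 400th anniversary "
            "of this explorer's arrival in India.")
GARY = ("In May, Gary arrived in India after he celebrated "
        "his anniversary in Portugal.")
VASCO = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach."

def keyword_overlap(a, b):
    return len(tokens(a) & tokens(b))

print(keyword_overlap(QUESTION, GARY), keyword_overlap(QUESTION, VASCO))  # → 6 4
```

A system relying on this feature alone would answer “Gary”; deeper evidence (date math, paraphrasing, geospatial reasoning) is what flips the decision to Vasco da Gama.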
18. Matching Deeper Evidence. Question: “In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.” Passage: “On the 27th of May 1498, Vasco da Gama landed in Kappad Beach.” Search far and wide, explore many hypotheses, find and judge evidence with many inference algorithms: Temporal Reasoning and Date Math relate “May 1898 … 400th anniversary” to “27th May 1498”; Passage Paraphrasing relates “arrival in” to “landed in”; Geospatial Reasoning over a geo-KB relates “India” to “Kappad Beach”; “explorer” matches “Vasco da Gama”. Stronger evidence can be much harder to find and score, and the evidence is still not 100% certain.
19. The TyCor Framework. “Named after the 35th President, in 2010 this facility saw more international air traffic than any other in North America.” Problem: do candidate answers match the type in the question? Four steps: EDM (Entity Disambiguation and Matching), PDM (Predicate Disambiguation and Matching), TR (Type Retrieval), TA (Type Alignment). For the candidate “JFK” and the LAT “facility”: EDM maps the candidate to the instance DBpedia:John_F_Kennedy_International (0.7); TR maps the instance to the type WN:Airport (1.0); PDM maps the LAT to the type WN:Facility (0.9); TA compares the LAT-type and instance-type – Airport is-a Facility (1.0) – Match! TyCor score: 0.63. [J.W. Murdock et al. (2012). Typing candidate answers using type coercion]
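The four TyCor steps multiply through to the 0.63 on the slide. In this sketch, toy lookup tables stand in for DBpedia and WordNet; the table contents and the multiplicative combination are assumptions for illustration.

```python
# Sketch of the four TyCor steps with toy KBs standing in for DBpedia/WordNet.

INSTANCES = {"JFK": ("John_F_Kennedy_International", 0.7)}           # for EDM
INSTANCE_TYPES = {"John_F_Kennedy_International": ("Airport", 1.0)}  # for TR
LAT_SENSES = {"facility": ("Facility", 0.9)}                         # for PDM
IS_A = {("Airport", "Facility"): 1.0}                                # for TA

def tycor(candidate, lat):
    entity, s1 = INSTANCES[candidate]    # EDM: candidate -> KB instance
    itype, s2 = INSTANCE_TYPES[entity]   # TR:  instance  -> type
    ltype, s3 = LAT_SENSES[lat]          # PDM: LAT       -> type
    s4 = IS_A.get((itype, ltype), 0.0)   # TA:  align instance-type and LAT-type
    return s1 * s2 * s3 * s4

print(round(tycor("JFK", "facility"), 2))  # → 0.63
```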
20. Taking Watson to School. Question: “What antiseptic for the skin has the lowest risk of bacteremia from catheter insertion?” Evidence passage: “For prevention of catheter-related central-line infections, 2% chlorhexidine (CHG) and 70% isopropyl has the highest success rate.” The same machinery applies: search far and wide, explore many hypotheses, find and judge evidence with many inference algorithms (Temporal Reasoning, Date Math, Paraphrasing, Geospatial Reasoning, TyCor), matching “bacteremia from catheter insertion” to “catheter-related central-line infections”, “lowest risk” to “highest success rate” of prevention, and “skin antiseptic” to the answer: chlorhexidine.
21. Cut to the chase… Watson emerges victorious.
22. Technology marches forward…
23. The arrival of Cognitive Computing.
   – Learn & improve. The core of Watson is a group of over 100 independent algorithms that approximate a solution to the “is this the right answer to the question” problem. Achieving winning (human expert) performance required two hallmarks of cognitive computing systems: a metric to measure improvements to the system (the winners cloud), and a significant ground truth (over 200K Q-A pairs).
   – Assist & augment human cognition. Watson depended primarily on a set of background documents (the corpus). The value of having access to this kind of fact-finding power over a large (and possibly changing) corpus provides a clear augmentation to human abilities.
   – Speed & Scale. Watson used big data, as well as a 3000 node cluster for massive computation, to get answering speeds down into the 2s range.
   – Interact in a natural way. Watson was a significant step forward in natural language understanding, the most basic interface for humans. Say goodbye to your mouse…
24. The arrival of Cognitive Computing – Learn & improve. The core of Watson is a group of over 100 independent algorithms that approximate a solution to the “is this the right answer to the question” problem. Achieving winning (human expert) performance required two hallmarks of cognitive computing systems: a metric to measure improvements to the system (the winners cloud), and a significant ground truth (over 200K Q-A pairs). [Chart: precision vs. % answered]
25. The arrival of Cognitive Computing – Assist & augment human cognition. Watson depended primarily on a set of background documents (the corpus). The value of having access to this kind of fact-finding power over a large (and possibly changing) corpus provides a clear augmentation to human abilities. [Medical example: symptoms, family history, patient history, medications and tests/findings feed diagnosis models over huge volumes of texts, journals, references, DBs, etc., producing candidate diagnoses with confidences – renal failure, UTI, diabetes, influenza, hypokalemia, esophagitis – most confident diagnosis: UTI]
26. The arrival of Cognitive Computing – Speed & Scale. Watson used big data, as well as a 3000 node cluster for massive computation, to get answering speeds down into the 2s range.
27. The arrival of Cognitive Computing – Interact in a natural way. Watson was a significant step forward in natural language understanding, the most basic interface for humans. Say goodbye to your mouse…
29. …and for the Social Web.
   – First and foremost, social web analytics (e.g. recommendations) and Social Computing in general lie clearly in the realm of Cognitive Computing: uncertainty, natural language, human intelligence; inexact solutions that can improve with time and training; problems & solutions need metrics to be solvable.
   – All cognitive computing systems require ground truth data. This data is expensive to collect; crowdsourcing is a key new technology/approach.
   – The user interface is moving closer to people: natural language, speech, gestures. In addition, integrating the collection of training data seamlessly into the interface is a key development.
   – Cognitive computing systems require integration of multiple, disparate data sources: structured, unstructured, semi-structured; curated, crowdsourced.