Inside the Mind of Watson: Cognitive Computing

Presentation given by Chris Welty (IBM Research) at Kno.e.sis. We received permission from Chris Welty to upload this presentation. Event details are at http://j.mp/Welty-at-Knoesis and the associated video is at https://www.youtube.com/watch?v=grDKpicM5y0

  • Automatic open-domain question answering represents a very long-standing challenge in computer science, specifically in the areas of NLP, information retrieval and AI.
  • In order to know we are making progress on scientific problems like open-domain QA, well-defined challenges help demonstrate that we can solve concrete and difficult tasks. As you might know, Jeopardy! is a long-standing, well-regarded and highly challenging television quiz show in the US that demands that human contestants quickly understand and answer richly expressed natural language questions over a staggering array of topics. The Jeopardy! Challenge uniquely provides a palpable, compelling and notable way to drive question answering technology along key dimensions. If you are familiar with the quiz show, it asks an incredibly broad range of questions over a huge variety of topics. In a single round there is a grid of 6 categories, and for each category 5 rows with increasing dollar values. Once a cell is chosen by one of three players, a question, or what is often called a clue, is revealed. Here you see some example questions. Jeopardy! uses complex and often subtle language to describe what is being asked. To win you have to be extraordinarily precise: you must deliver the exact answer, no more and no less. It is not good enough for it to be somewhere in the top 2, 10 or 20 documents; you must know it exactly and get it in first place, otherwise you get no credit, and in fact you lose points. You must demonstrate accurate confidence, that is, you must know what you know: if you buzz in and then get it wrong, you lose the dollar value of the question. And you have to do all this very quickly: deeply analyze huge volumes of content, consider many possible answers, compute your confidence and buzz in, all in just seconds. As we shall see, competing with human champions at this game represents a grand challenge in automatic open-domain question answering.
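The buzz-in tradeoff described in that note can be made concrete with a toy expected-value model. This is a minimal sketch for illustration only, not how Watson or human players actually decide; the 0.5 threshold is simply the break-even point of this simplified payoff.

# Minimal sketch (not from the talk): a simple expected-value model of the
# buzz-in decision. If you answer and are right you gain the clue's dollar
# value; if you answer and are wrong you lose it. Under this simplification,
# buzzing is only worthwhile when estimated confidence exceeds 0.5
# (real strategies also weigh game state and opponents).

def expected_gain(confidence: float, clue_value: int) -> float:
    """Expected dollar change from buzzing in with a given confidence."""
    return confidence * clue_value - (1.0 - confidence) * clue_value

def should_buzz(confidence: float, threshold: float = 0.5) -> bool:
    """Buzz only when the answer's estimated confidence clears a threshold."""
    return confidence > threshold

if __name__ == "__main__":
    for c in (0.3, 0.5, 0.85):
        print(c, should_buzz(c), expected_gain(c, 1000))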
  • Human performance is one of the things that makes the Jeopardy! Challenge so compelling. The best humans are very, very good at this task. In this chart, each dot corresponds to actual historical Jeopardy! games and represents the performance of the winner of those games. We refer to this cluster of dots as the "Winners Cloud". For each dot, the x-axis, along the bottom of the graph, represents the percentage of questions in a game that the winning player got a chance to answer; these were the questions he or she was confident enough and fast enough to ring in, or buzz in, for first. The y-axis, going up along the left of the graph, represents the winning player's precision, that is, the percentage of those answered questions the player got right. Remember, if a player gets a question wrong they lose the dollar value of the clue and their competitors still get a chance to answer, or rebound. But what we humans tend to do really, really well is confidently know what we know; computing an accurate confidence turns out to be a key ability for winning at Jeopardy! Looking at the center of the green cloud, what you see is that, on average, winners are confident enough and fast enough to answer nearly 50% of the questions in a game and achieve somewhere between 85% and 95% precision on those questions. That is, they get 85-95% of the ones they answer right. The red dots represent Ken Jennings's performance. Ken won 74 consecutive games against qualified players. He was confident and fast enough to acquire 60% and even up to 80% of a game's questions from his competitors and still achieve 85% to 95% precision on average. Good Jeopardy! players are remarkable in their breadth, precision, confidence and speed.
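For illustration, here is a minimal sketch of how the two coordinates of a Winners Cloud dot (% answered and precision) could be computed from one game's clue outcomes. The field names are assumptions made for this sketch, not data structures from the talk.

# Minimal sketch (assumed field names): computing the two Winners Cloud
# coordinates for one game from a list of clue outcomes.
# "% answered" is the share of clues the player buzzed in on first;
# "precision" is the share of those attempts answered correctly.

from dataclasses import dataclass

@dataclass
class ClueOutcome:
    buzzed_first: bool   # player rang in first on this clue
    correct: bool        # player's answer was right (meaningful only if buzzed)

def winners_cloud_point(clues: list[ClueOutcome]) -> tuple[float, float]:
    attempted = [c for c in clues if c.buzzed_first]
    pct_answered = len(attempted) / len(clues) if clues else 0.0
    precision = (sum(c.correct for c in attempted) / len(attempted)
                 if attempted else 0.0)
    return pct_answered, precision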
  • Competing with the top human players requires answering hundreds of unseen Jeopardy! questions by identifying the more than 75% you think you know and getting over 85% precision on them; this is a hard task for computer programs. Our open-domain question answering technology, based on state-of-the-art approaches, developed by a modest team over more than 5 years and with some very esteemed company, when turned on this challenge left a huge gap. Nonetheless, at least part of our job at IBM Research is to push the limits of what computers can and cannot do. So, in 2007 we believed it was possible and committed to try to make a huge leap in a relatively short amount of time.
  • Watson, the computer system we developed to play Jeopardy!, is based on the DeepQA software architecture. Here is a look at the DeepQA architecture; this is like looking inside the brain of the Watson system from about 30,000 feet up. Remember, the intended meaning of natural language is ambiguous, tacit and highly contextual. The computer needs to consider many possible meanings, attempting to find the evidence and inference paths that are most confidently supported by the data. So, the primary computational principle supported by the DeepQA architecture is to assume and pursue multiple interpretations of the question, to generate many plausible answers or hypotheses, and to collect and evaluate many different competing evidence paths that might support or refute those hypotheses. Each component in the system adds assumptions about what the question might mean, or what the content means, or what the answer might be, or why it might be correct. DeepQA is implemented as an extensible architecture and was designed at the outset to support interoperability. For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community. Over 100 different algorithms, implemented as UIMA components, were integrated into this architecture to build Watson.

In the first step, Question and Category Analysis, parsing algorithms decompose the question into its grammatical components. Other algorithms identify and tag specific semantic entities like names, places or dates. In particular, the type of thing being asked for, if it is indicated at all, is identified; we call this the LAT, or Lexical Answer Type, for example "fish", "character" or "country". In Query Decomposition, different assumptions are made about whether and how the question might be decomposed into sub-questions. The original question and each identified sub-part follow parallel paths through the system. In Hypothesis Generation, DeepQA performs a variety of very broad searches for each of several interpretations of the question. Note that Watson, to compete on Jeopardy!, is not connected to the internet. These searches are performed over a combination of unstructured data (natural language documents) and structured data (available databases and knowledge bases fed to Watson during training). The goal of this step is to generate possible answers to the question and/or its sub-parts. At this point there is very little confidence in these possible answers, since little intelligence has been applied to understanding the content that might relate to the question; the focus is on generating a broad set of hypotheses, or, for this application, what we call "candidate answers". To implement this step for Watson we integrated and advanced multiple open-source text and KB search components. After candidate generation, DeepQA performs Soft Filtering, where it makes parameterized judgments about which and how many candidate answers are most likely worth investing more computation in, given specific constraints on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and speed, Soft Filtering uses different lightweight algorithms to judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is. In contrast, if this were a hard filter, the candidates falling below the threshold would be eliminated from consideration entirely at this point.

In Hypothesis and Evidence Scoring, the candidate answers are first scored independently of any additional evidence by deeper analysis algorithms. These may, for example, include typing algorithms: algorithms that produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step, for example Country, Agent, Character, City, Slogan, Book, etc. Many of these algorithms may fire, using different resources and techniques to come up with a score. What is the likelihood that "Washington", for example, refers to a general, a capital, a state, a mountain, a father or a founder? For each candidate answer, many pieces of additional evidence are searched for. Each piece of evidence is subjected to more algorithms that deeply analyze the evidentiary passages and score the likelihood that the passage supports or refutes the correctness of the candidate answer. These algorithms may consider variations in grammatical structure, word usage and meaning. In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms fire; they apply methods for inferring a coherent final answer from the constituent elements derived from the question's sub-parts. Finally, arriving at the last step, Final Merging and Ranking, are many possible answers, each paired with many pieces of evidence, and each of these scored by many algorithms to produce hundreds of feature scores, all giving some evidence for the correctness of each candidate answer. Trained models are applied to weigh the relative importance of these feature scores. These models are trained with machine learning methods to predict, based on past performance, how best to combine all these scores to produce a final, single confidence number for each candidate answer and to produce the final ranking of all candidates. The answer with the strongest confidence would be Watson's final answer, and Watson would try to buzz in provided that top answer's confidence was above a certain threshold.

The DeepQA system defers commitments and carries possibilities through the entire process while searching for increasingly broader contextual evidence and more credible inferences to support the most likely candidate answers. All the algorithms used to interpret questions, generate candidate answers, score answers, collect evidence and score evidence are loosely coupled but work holistically by virtue of DeepQA's pervasive machine learning infrastructure. No one component could realize its impact on end-to-end performance without being integrated and trained with the other components, and they are all evolving simultaneously. In fact, what had a 10% impact on some metric one day might, one month later, contribute only 2% to overall performance due to evolving component algorithms and interactions. This is why the system, as it develops, is regularly trained and retrained. DeepQA is a complex system architecture designed to extensibly deal with the challenges of natural language processing applications and to adapt to new domains of knowledge. The Jeopardy! Challenge greatly inspired its design and implementation for the Watson system.
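To make the stage structure above easier to follow, here is a minimal, purely illustrative sketch of a DeepQA-style pipeline skeleton. Every function and class name is a placeholder standing in for the far richer components described in the notes; it is not Watson's implementation.

# Minimal sketch of the pipeline stages described above (question analysis,
# hypothesis generation, soft filtering, evidence scoring, final merging and
# ranking). All names are illustrative placeholders, not Watson's components.

from dataclasses import dataclass, field

@dataclass
class Candidate:
    answer: str
    features: dict[str, float] = field(default_factory=dict)
    confidence: float = 0.0

def analyze_question(clue: str) -> dict:
    # Placeholder: a real system would parse the clue and extract the LAT,
    # entities, dates and relations.
    return {"lat": None, "keywords": clue.lower().split()}

def generate_candidates(analysis: dict) -> list[Candidate]:
    # Placeholder: broad text and knowledge-base searches would go here.
    return [Candidate(answer=kw) for kw in analysis["keywords"][:50]]

def soft_filter(cands: list[Candidate], keep: int = 10) -> list[Candidate]:
    # Keep only a subset of candidates for deeper evidence scoring
    # (placeholder: no real lightweight scoring here).
    return cands[:keep]

def score_evidence(c: Candidate, analysis: dict) -> None:
    # Placeholder feature scorers (type match, passage support, etc.).
    c.features["keyword_overlap"] = float(c.answer in analysis["keywords"])

def merge_and_rank(cands: list[Candidate], weights: dict[str, float]) -> list[Candidate]:
    # Combine weighted feature scores into a confidence and rank candidates.
    for c in cands:
        c.confidence = sum(weights.get(k, 0.0) * v for k, v in c.features.items())
    return sorted(cands, key=lambda c: c.confidence, reverse=True)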
  • We did not approach the Jeopardy! Challenge by trying to anticipate all questions and building databases of answers. In fact, in a random sample of 20,000 Jeopardy! clues we automatically identified the main subject or type being asked about. We found that in 13% of the sampled questions there was no clear indication at all of the type of answer, and the players must rely almost entirely on the context to figure out what sort of answer is required. The remaining 87% is what you see in this graph. It shows what we call a very long tail. There is no small enough set of topics to focus on that covers enough ground. Even focusing on the most frequent few (the head of the tail, to the left) will cover less than 10% of the content. Thousands of topics, from hats to insects to writers to diseases to vegetables, are all equally fair game. And for these thousands of types, thousands of different questions may be asked and then phrased in a huge variety of different ways. So, our primary approach and research interest is not to collect and organize databases. Rather, it is reusable natural language processing (NLP) technology for automatically understanding naturally occurring human-language text. As-is, pre-existing structured knowledge in the form of databases or knowledge bases is used to help bridge meaning and interpret natural language texts. But because of the broad domain and the expressive language used in the questions and in the content, pre-built databases are of very limited use for answering any significant number of questions. The focus, rather, is on natural language understanding.
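As a small illustration of the long-tail measurement described above, the following sketch computes how much of a clue sample the k most frequent lexical answer types would cover. The data and function are hypothetical, not the actual 20,000-clue study.

# Minimal sketch (illustrative, not the actual study): given the lexical
# answer type (LAT) extracted from each clue in a sample, measure what share
# of clues the k most frequent LATs would cover. The point of the slide is
# that even the most frequent few types cover a small fraction of questions.

from collections import Counter

def head_coverage(lats: list[str], k: int) -> float:
    """Fraction of clues covered by the k most frequent answer types."""
    counts = Counter(lats)
    head = sum(count for _, count in counts.most_common(k))
    return head / len(lats) if lats else 0.0

# Example (made-up data): with a heavy-tailed distribution of types,
# head_coverage(sample_lats, 10) stays well below 1.0.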
  • Here we see the same question and the same parse, but on the other side we see that there exists a passage containing the right answer, but with only one keyword in common. The system must consider, in parallel and in detail, a huge amount of content just to get a shot at this evidence, and then must find and weigh the right inferences that will allow it to match and score with an accurate confidence, in this case for example date math, statistical paraphrasing and geospatial reasoning. And it is still not 100% certain. What if, for example, the passage said "considered landing in" rather than "landed in", or what if there was just a preponderance of weaker evidence for another answer? Question answering technology tries to understand what the user is really asking for and to deliver precise and correct responses. But natural language is hard. Meaning can be expressed in so many different ways, and to achieve high levels of precision and confidence you must consider much more information and analyze it much more deeply. What is needed is a radically different approach that explores many different plausible interpretations in parallel and collects and evaluates all sorts of evidence in support or in refutation of those possibilities.

Inside the Mind of Watson: Cognitive Computing (Presentation Transcript)

  • Inside the Mind of Watson: Cognitive Computing. Chris Welty, IBM Research. ibmwatson.com. Do Not Record. Do Not Distribute. © 2011 IBM Corporation
  • What is Watson? An open-domain question-answering machine. Given rich natural language questions over a broad domain of knowledge, it delivers: precise answers (determine what is being asked and give a precise response); accurate confidences (determine the likelihood the answer is correct); consumable justifications (explain why the answer is right); and fast response time (precision and confidence in under 3 seconds), at the level of human experts. It proved its mettle in a televised match, winning a 2-game Jeopardy! match against the all-time winners, viewed by over 50,000,000. © 2011 IBM Corporation
  • What is Jeopardy? Jeopardy! is an American quiz show, running from 1964 to today, and a household name in the U.S. It has an answer-and-question format: contestants are presented with clues in the form of answers and must phrase their responses in question form. The questions are open-domain trivia, and speed is a big factor. Example: Category: General Science. Clue: When hit by electrons, a phosphor gives off electromagnetic energy in this form. Answer: What is light? © 2011 IBM Corporation
  • What is Cognitive Computing? Increasingly, machines are being asked to add their computational power to problems which are not inherently solvable. Traditionally, these problems came from AI: the hardest AI problems are the easiest for human intelligence (vision, speech, natural language); these are not actually associated with "being intelligent"; human intelligence provides solutions, but does not scale. Cognitive computing is founded on four principles. Learn & improve: cognitive computing systems focus on inexact solutions to unsolvable problems that utilize machine learning and improve over time. Often they combine multiple approaches and must integrate them effectively. They must learn from humans, in more and more seamless ways. Assist & augment human cognition: cognitive computing addresses problems that lie squarely in the province of human intelligence, but where we can't handle the volume of information, penetrate the complexity or otherwise extend our reach (physically). Interact in a natural way: cognitive computing provides technologies that support a higher level of human cognition by adapting to human approaches and interfaces... over the next several decades it will incorporate essentially all the ways humans sense and interact. Speed & scale: cognitive computing harnesses the clear advantage machines have over humans in their ability to perform mundane tasks of arbitrary complexity repeatedly, whether it is the scale of the data or the complexity of the task. © 2011 IBM Corporation
  • Examples of Cognitive Computing: web search, image search, event search, social computing, natural language processing. © 2011 IBM Corporation
  • The Jeopardy! Challenge: hard for humans, hard for machines, for different reasons. For people, the challenge is knowing the answer; for machines, the challenge is understanding the question. It demands a broad/open domain, complex language, high precision, accurate confidence and high speed. Example clues: $200: "If you are looking at the wainscoting, you are looking in this direction." (What is down?) $1000: "The first person mentioned by name in 'The Man in the Iron Mask' is this hero of a previous book by the same author." (Who is D'Artagnan?) $600: "In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus." (What is cytoplasm?) $800: "The conspirators against this man were wounded by each other while they stabbed at him." (Who is Julius Caesar?) © 2011 IBM Corporation
  • The Winner's Cloud: what it takes to compete against top human Jeopardy! players. Each dot represents an actual historical human Jeopardy! game; top human players are remarkably good. The chart plots winning human performance, grand champion human performance and a 2007 QA computer system, from more confident to less confident. © 2011 IBM Corporation
  • The Winner's Cloud (continued): Computers? Not so good (the 2007 QA computer system falls well below the winners cloud). In 2007, we committed to making a huge leap! © 2011 IBM Corporation
  • DeepQA: The Technology Behind Watson, an example of a new software paradigm. DeepQA generates and scores many hypotheses using an extensible collection of natural language processing, machine learning and reasoning algorithms. These gather and weigh evidence over both unstructured and structured content to determine the answer with the best confidence. Learned models help combine and weigh the evidence. The pipeline diagram shows: Question; Question & Topic Analysis; Question Decomposition; Hypothesis Generation (primary search and candidate answer generation over answer sources); Hypothesis and Evidence Scoring (evidence retrieval, evidence scoring and answer scoring over evidence sources, using trained models); Synthesis; Final Confidence Merging & Ranking; Answer & Confidence. © 2011 IBM Corporation
  • Example Question. Clue: "In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city." Question analysis extracts keywords (1894, C.W. Post, created, ...), the lexical answer type (Michigan city), Date(1894) and relations such as Create(Post, cereal drink). Primary search runs over related content (structured and unstructured), candidate answer generation produces candidates such as General Foods, 1985, Post Foods, Battle Creek and Grand Rapids, evidence retrieval and evidence scoring assign each candidate a vector of feature scores, and merging & ranking produces the final ranked answers: 1) Battle Creek (0.85), 2) Post Foods (0.20), 3) 1985 (0.05). © 2011 IBM Corporation
  • Hypothesis Scoring. Category: MICHIGAN MANIA. Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city. Answer scorers (TyCor, Temporal, Spatial, Popularity, ...) can be applied depending on different relations or constraints detected in the question. For example, this question's focus with modifiers is "Michigan city"; Watson can detect this as a geospatial relation indicating that the correct answer must be a city spatially located within the state of Michigan. Candidate answers and evidence feature scores (answer scoring + passage scoring):

Candidate Answers     Doc Rank   Pass Rank   TyCor   Geo
General Foods         0          1           0.1     0
Post Foods            2          1           0.1     0
Battle Creek          1          2           0.8     1
Will Keith Kellogg    3                      0.1     0
Grand Rapids                                 0.9     1
1895                  0                      0.0     0

© 2011 IBM Corporation
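A minimal sketch of the kind of spatial answer scorer this slide describes, assuming a tiny hypothetical gazetteer; Watson's actual geospatial reasoning is far more general.

# Minimal sketch (hypothetical gazetteer, not Watson's geospatial scorer):
# a spatial answer scorer that checks whether a candidate answer is a city
# located within the state named in the question's answer-type modifier.

MICHIGAN_CITIES = {"battle creek", "grand rapids", "detroit"}  # assumed toy data

def geo_score(candidate: str, state: str = "Michigan") -> float:
    """Return 1.0 if the candidate is known to be a city in the given state."""
    if state.lower() == "michigan" and candidate.lower() in MICHIGAN_CITIES:
        return 1.0
    return 0.0

# geo_score("Battle Creek") -> 1.0, geo_score("General Foods") -> 0.0,
# in line with the Geo column in the table above.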
  • Passage Scoring. Category: MICHIGAN MANIA. Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city. In Deep Evidence Scoring, Watson retrieves evidence for each candidate answer, then evaluates the evidence using a large number of deep evidence scoring analytics. The evidence for a candidate answer may come from the original document or passage where the candidate answer was generated, or it may come from an evidence retrieval search performed by taking the keyword search query from Step 2, replacing the focus terms with the candidate answer, and retrieving the relevant passages that are found. The passages, or "context", in which the candidate answer occurs are evaluated as evidence to support or refute the candidate answer as the correct answer for the question. The slide shows example evidence passages for the candidates Battle Creek, General Foods and Post Foods, drawn from text about C.W. Post, Postum and the Postum Cereal Company (later General Foods and Post Foods). © 2011 IBM Corporation
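A minimal sketch of the evidence-retrieval query substitution described above: the focus term in the keyword query is replaced with the candidate answer. The keyword list and focus handling are simplified assumptions; no particular search engine API is assumed.

# Minimal sketch (illustrative only): building an evidence-retrieval query by
# substituting a candidate answer for the question's focus term. The resulting
# query would then be sent to a passage search engine.

def evidence_query(clue_keywords: list[str], focus: str, candidate: str) -> list[str]:
    """Replace the focus term in the keyword query with the candidate answer."""
    return [candidate if kw == focus else kw for kw in clue_keywords]

# Example: the focus "this Michigan city" reduced to the placeholder "city".
keywords = ["1894", "C.W. Post", "Postum", "cereal", "city"]
print(evidence_query(keywords, "city", "Battle Creek"))
# ['1894', 'C.W. Post', 'Postum', 'cereal', 'Battle Creek']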
  • Merging and Confidence. Category: MICHIGAN MANIA. Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city. In the final processing step, Watson detects variants of the same answer and merges their feature scores together. Watson then computes the final confidence scores for the candidate answers by applying a series of machine learning models that weight all of the feature scores to produce the final confidence scores. The slide shows candidate answers with evidence feature scores (doc rank, passage rank, TyCor, Geo, LFACS, term match, temporal) fed through the machine learning model application to give the final confidences:

Final Answers     Confidence
Battle Creek      0.946
Post Foods        0.152
1895              0.040
Grand Rapids      0.033
General Foods     0.014

© 2011 IBM Corporation
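A minimal sketch of applying a trained model to turn feature scores into a final confidence, here a logistic combination with made-up weights; Watson's actual models and features are more numerous and are learned from question-answer pairs.

# Minimal sketch (weights are made up, not Watson's trained model): combining a
# candidate's feature scores into a single confidence with a logistic model,
# the kind of "machine learning model application" the slide describes.

import math

# Hypothetical learned weights for a few of the feature names on the slide.
WEIGHTS = {"tycor": 2.5, "geo": 1.8, "pass_rank": -0.3, "term_match": 0.02}
BIAS = -2.0

def confidence(features: dict[str, float]) -> float:
    """Logistic combination of weighted feature scores into a [0, 1] confidence."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

battle_creek = {"tycor": 0.8, "geo": 1.0, "pass_rank": 2, "term_match": 30}
print(round(confidence(battle_creek), 3))  # a well-supported candidate gets a high confidence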
  • "Minimal" DeepQA Pipeline. Category: MICHIGAN MANIA. Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city. The question flows through Question Analysis (LAT: Michigan city), Primary Search (document search results), Hypothesis Generation (candidate answers: General Foods, Battle Creek, Post Foods, Will Keith Kellogg, ...), Hypothesis and Evidence Scoring (evidence features such as TyCor and Geo), and Final Merging & Ranking, producing the final answers with confidences: Battle Creek 0.946, Post Foods 0.152, 1895 0.040. © 2011 IBM Corporation
  • Example Question Analysis. Clue: In 1894 C.W. Post created his warm cereal drink Postum in this Michigan city. Question understanding: Exists(x): City(x) & locatedIn(x, Michigan) & locationOf(e, x) & Creation(e, C.W. Post, Postum) & dateOf(e, 1894). Question frame: Person: C.W. Post; AnswerType: City; U.S. State: Michigan; Date: 1894; Relations: create(C.W. Post, "cereal drink"), locatedIn("city", Michigan). © 2011 IBM Corporation
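For illustration, the question frame on this slide could be captured in a simple data structure like the following; the field names are assumptions made for the sketch, not Watson's internal representation.

# Minimal sketch (illustrative field names): the question frame from the slide
# as a simple data structure that downstream scorers could consume.

from dataclasses import dataclass

@dataclass
class Relation:
    predicate: str
    args: tuple[str, ...]

@dataclass
class QuestionFrame:
    answer_type: str
    entities: dict[str, str]
    date: int | None
    relations: list[Relation]

frame = QuestionFrame(
    answer_type="City",
    entities={"Person": "C.W. Post", "U.S. State": "Michigan"},
    date=1894,
    relations=[
        Relation("create", ("C.W. Post", "cereal drink")),
        Relation("locatedIn", ("city", "Michigan")),
    ],
)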
  • Matching question and passage. Question: "In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India." Logical form: Celebration(e), date(e, 1898), celebrationOf(e, e1), location(e, Portugal), date(e1, dateOf(e) - 400), arrival(e1), location(e1, India), participantOf(e1, ?x). Passage: "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach." Logical form: landing(e2), location(e2, Kappad Beach), date(e2, 1498), participantOf(e2, Vasco). Decoy passage: "In May, Gary arrived in India after he celebrated his anniversary in Portugal." Matching two passages is the basic Watson operation; semantic technology helps match elements of them. © 2011 IBM Corporation
  • Matching Keyword Evidence. Question: "In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India." Passage: "In May, Gary arrived in India after he celebrated his anniversary in Portugal." Keyword matching aligns "celebrated", "In May", "anniversary", "in Portugal", "arrival in"/"arrived in" and "India", pairing "explorer" with "Gary". Keyword evidence suggests "Gary" is the answer, but the system must learn that keyword matching may be weak relative to other types of evidence. © 2011 IBM Corporation
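A minimal sketch of the naive keyword-overlap evidence this slide warns about: the decoy "Gary" passage scores well on overlap, which is exactly why the system must learn to weight keyword matching below deeper evidence. The stopword list and tokenization are simplifying assumptions.

# Minimal sketch (illustrative): a naive keyword-overlap score of the kind the
# slide warns about.

STOPWORDS = {"the", "of", "in", "this", "his", "after", "he"}

def keyword_overlap(question: str, passage: str) -> float:
    """Fraction of the question's content words that also appear in the passage."""
    q = {w.strip(".,").lower() for w in question.split()} - STOPWORDS
    p = {w.strip(".,").lower() for w in passage.split()} - STOPWORDS
    return len(q & p) / len(q) if q else 0.0

question = ("In May 1898 Portugal celebrated the 400th anniversary "
            "of this explorer's arrival in India.")
decoy = "In May, Gary arrived in India after he celebrated his anniversary in Portugal."
print(round(keyword_overlap(question, decoy), 2))  # substantial overlap despite the wrong answer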
  • Matching Deeper Evidence. Question: "In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India." Passage: "On 27th May 1498, Vasco da Gama landed in Kappad Beach." Search far and wide, explore many hypotheses, find and judge evidence with many inference algorithms: temporal reasoning and date math align "May 1898 ... 400th anniversary" with "27th May 1498"; passage paraphrasing aligns "arrival in" with "landed in"; geospatial reasoning over a geo-KB aligns "India" with "Kappad Beach"; and "explorer" is matched with "Vasco da Gama". Stronger evidence can be much harder to find and score, and the evidence is still not 100% certain. © 2011 IBM Corporation
  • TyCor Framework. "Named after the 35th President, in 2010 this facility saw more international air traffic than any other in North America." Problem: do candidate answers match the type in the question? Four steps: EDM (Entity Disambiguation and Matching), PDM (Predicate Disambiguation and Matching), TR (Type Retrieval), TA (Type Alignment). For the candidate "JFK": EDM maps the candidate to the instance DBpedia:John_F_Kennedy_International (0.7); TR maps the instance to the type WN:Airport (1.0); PDM maps the LAT "facility" to the type WN:Facility (0.9); TA compares the LAT type and instance type: Airport is-a Facility (1.0). Match! TyCor score: 0.63. J.W. Murdock et al. (2012), Typing candidate answers using type coercion. © 2011 IBM Corporation
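The example's numbers (0.7, 1.0, 0.9, 1.0 giving 0.63) are consistent with multiplying the four step scores, so this sketch assumes a simple product; see Murdock et al. (2012) for the actual TyCor framework.

# Minimal sketch: combining the four TyCor step scores for the "JFK" example,
# assuming (as the slide's numbers suggest) a simple product of step scores.

def tycor_score(edm: float, tr: float, pdm: float, ta: float) -> float:
    """Combine entity-matching, type-retrieval, predicate-matching and
    type-alignment scores into one type-coercion score."""
    return edm * tr * pdm * ta

print(round(tycor_score(edm=0.7, tr=1.0, pdm=0.9, ta=1.0), 2))  # 0.63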
  • Taking Watson to School. Question: "What antiseptic for the skin has the lowest risk of bacteremia from catheter insertion?" Passage: "For prevention of catheter-related central-line infections, 2% chlorhexidine (CHG) and 70% isopropyl has the highest success rate." The same machinery applies: search far and wide, explore many hypotheses, find and judge evidence with many inference algorithms. Paraphrasing aligns "lowest risk of bacteremia" with "prevention ... highest success rate", "bacteremia from catheter insertion" is matched with "catheter-related central-line infections", and TyCor aligns "skin antiseptic" with the candidate "chlorhexidine". © 2011 IBM Corporation
  • Cut to the chase... Watson emerges victorious. © 2011 IBM Corporation
  • Technology marches forward… © 2011 IBM Corporation
  • The arrival of Cognitive Computing. Learn & improve: the core of Watson is a group of over 100 independent algorithms that approximate a solution to the "is this the right answer to the question" problem. Achieving winning (human expert) performance required two hallmarks of cognitive computing systems: a metric to measure improvements to the system (the winners cloud), and a significant ground truth (over 200K question-answer pairs). Assist & augment human cognition: Watson depended primarily on a set of background documents (the corpus). The value of having access to this kind of fact-finding power over a large (and possibly changing) corpus provides a clear augmentation to human abilities. Speed & scale: Watson used big data, as well as a 3000-node cluster for massive computation, to get answering speeds down into the 2-second range. Interact in a natural way: Watson was a significant step forward in natural language understanding, the most basic interface for humans. Say goodbye to your mouse... © 2011 IBM Corporation
  • The arrival of Cognitive Computing: Learn & improve. The core of Watson is a group of over 100 independent algorithms that approximate a solution to the "is this the right answer to the question" problem. Achieving winning (human expert) performance required two hallmarks of cognitive computing systems: a metric to measure improvements to the system (the winners cloud, shown on the slide as a precision vs. % answered chart, both axes 0% to 100%), and a significant ground truth (over 200K question-answer pairs). © 2011 IBM Corporation
  • The arrival of Cognitive Computing: Assist & augment human cognition. Watson depended primarily on a set of background documents (the corpus). The value of having access to this kind of fact-finding power over a large (and possibly changing) corpus provides a clear augmentation to human abilities. The slide illustrates a medical diagnosis scenario: inputs such as symptoms, family history, patient history, medications, tests/findings and notes/hypotheses are analyzed against huge volumes of texts, journals, references, databases, etc., and diagnosis models produce confidences for candidate diagnoses (renal failure, UTI, diabetes, influenza, hypokalemia, esophagitis); most confident diagnosis: UTI. © 2011 IBM Corporation
  • The arrival of Cognitive Computing: Speed & scale. Watson used big data, as well as a 3000-node cluster for massive computation, to get answering speeds down into the 2-second range. © 2011 IBM Corporation
  • The arrival of Cognitive Computing Interact in a natural way. Watson was a significant step forward in natural language understanding, the most basic interface for humans. Say goodbye to your mouse… © 2011 IBM Corporation
  • ...and for the Social Web. First and foremost, social web analytics (e.g. recommendations) and social computing in general lie clearly in the realm of cognitive computing: uncertainty, natural language and human intelligence; inexact solutions that can improve with time and training; problems and solutions that need metrics to be solvable. All cognitive computing systems require ground truth data; this data is expensive to collect, and crowdsourcing is a key new technology and approach. The user interface is moving closer to people: natural language, speech, gestures; in addition, integrating the collection of training data seamlessly into the interface is a key development. Cognitive computing systems require integration of multiple, disparate data sources: structured, unstructured and semi-structured; curated and crowdsourced. © 2011 IBM Corporation