IBM’s Watson: Is the World’s Trivia Champion the Future of Search? Josh Dreller VP Media Technology & Analytics Fuor Digital @fuordigital
Two Types of Innovation Incremental – improving current projects and advancing them in a linear fashion Grand Challenges – pushing the limits of science Must be Difficult (has to be a challenge) Must be Inspiring Irresistible Vision – “just has to be done”
Previous IBM Grand Challenges
Open Question Answering The way normal humans communicate Natural Language – very ambiguous, but at the heart of human intelligence Last night I shot an elephant in my pajamas. How it got into my pajamas I’ll never know.
The Ultimate Advancement:Computers that can communicate with humans in Natural Language
What Can Search Learn From Watson? The only way to push forward is to take huge leaps and look for self-imposed challenges even if we can’t prove out the business case right now What kind of Grand Challenges could Search create? A non-spammable Search Engine? No need for Search Engine Optimization? ?? 8
The Jeopardy! Challenge: A compelling and notable way to drive and measure the technology of automatic Question Answering along 5 Key Dimensions Broad/Open Domain $200 If you're standing, it's the direction you should look to check out the wainscoting. Complex Language $1000 Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that’s farthest north High Precision $800 In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus Accurate Confidence High Speed
Jeopardy Reaction Not too inspired at first Weren’t interested in a circus sideshow 2009, IBM setup a moch-studio at their New York research facility Sparring matches with ex-Jeopardy winners Eventually saw the potential and thought it was something special
A Very Simple Question for a Computer In (( 12,546,798 * P ) ^ 2) / 34,567.46 = ? Greater than or less than 1? 50/50 Shot = .00885
Real Language is Real Hard Chess A finite, mathematically well-defined search space Limited number of moves and states Grounded in explicit, unambiguous mathematical rules Human Language Ambiguous, contextual and implicit Grounded only in human cognition Seemingly infinitenumber of ways to express the same meaning
The Opposite of Current Computer Language Questions not designed for a computer to answer Slang Crafty questions Shorthand Rhyme Regionalism Anagrams Complex Language!
Structured vs. Unstructured Data Where was Einstein born? Structured Unstructured One day, from among his city views of Ulm, Otto chose a watercolor to send to Albert Einstein as a remembrance of Einstein´s birthplace.
Common Sense Knowledge Base An ontology of classes and individuals Parts and materials of objects Properties of objects (such as color and size) Functions and uses of objects Locations of objects and layouts of locations Locations of actions and events Durations of actions and events Preconditions of actions and events Effects (post conditions) of actions and events Subjects and objects of actions Behaviors of devices Stereotypical situations or scripts Human goals and needs Emotions Plans and strategies Story themes Contexts 16 Can a can CanCan?
What Can Search Learn From Watson? We need to focus on what computers aren’t good at, not what they are good at Keywords and Links are not savvy enough. Natural language is key to a next generation search engine Most of human knowledge is kept in unstructured data sources or based on common sense context 17
The DeepQA Project Dr. David Ferrucci 25-30 full time researchers from many disciplines. 2007-2011 Millions of dollars Post Jeopardy implications
Speed Results Deployed Watson on 2,880 IBM POWER 750 computer cores Went from 2 hours per question on a single CPU to an average ofjust 3 seconds – fast enough to compete with the best.
Example Question Related Content (Structured & Unstructured) Keywords: 1698, comet, paramour, pink, … AnswerType(comet discoverer) Date(1698) Took(discoverer, ship) Called(ship, Paramour Pink) … Primary Search Question Analysis Candidate Answer Generation … Temporal Taxonomic Lexical Spatial [0.58 0 -1.3 … 0.97] Isaac Newton [0.71 1 13.4 … 0.72] Wilhelm Tempel [0.12 0 2.0 … 0.40] HMS Paramour Edmond Halley (0.85) Christiaan Huygens (0.20) Peter Sellers (0.05) [0.84 1 10.6 … 0.21] Christiaan Huygens [0.33 0 6.3 … 0.83] Halley’s Comet [0.21 1 11.1 … 0.92] Edmond Halley Merging & Ranking [0.91 0 -8.2 … 0.61] Pink Panther [0.91 0 -1.7 … 0.60] Peter Sellers Evidence Retrieval … Evidence Scoring IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE
Confidence is Key Watson only rings in if it can reach a statistically significant confidence in time Some questions take longer than others Some questions will be able to answer less confident than others Watson manages risk in betting based on confidence
Embarrassingly Parallel Computing Def: “Little or no effort is required to separate the problem into a number of parallel tasks” Works on many algorithms at once and comes back together with confidence scores. different from distributed computing problems (such as Google’s MapReduce) that require communication between tasks, especially communication of intermediate results.
What Can Search Learn From Watson? We can’t be bound by the constraints of current technology Two fold process of first coming up with answers then vetting them with more evidence Ideas like Parallel Processing will allow us to jump ahead 24
Ken Jennings & Brad Rutter
Top human players are remarkably good. Computers? The Best Human Performance: Our Analysis Reveals the Winner’s Cloud Each dot represents an actual historical human Jeopardy! game Winning Human Performance Grand Champion Human Performance 2007 QA Computer System More Confident Less Confident 27
DeepQA: Incremental Progress in Precision and Confidence 6/2007-11/2010 Now Playing in the Winners Cloud 11/2010 4/2010 10/2009 5/2009 12/2008 Precision 8/2008 5/2008 12/2007 Baseline
What Can Search Learn From Watson? Confidence bar would be a great addition to SERPs We must benchmark what is “good” and then aim higher These things take time (and money) 30
TJ Watson Research CenterYorktown, NYTwo Games: Aired February 14-16, 2011
The End – Humans Win
Financial Industry Generates large amounts of data and growing 70% per year Not just numbers, but all info that would influence the biz landscape (news, articles, blogs, etc) Recent financial crisis shows failures of lack of understanding in interdependencies
DeepQA in Continuous Evidence-Based Diagnostic Analysis Considers and synthesizes a broad range of evidence improving quality, reducing cost Symptoms Diagnosis Models Meds Symp Fam Hist Find Confidence Renal failure Family History Patient History UTI Medications Tests/Findings Diabetes Influenza Notes/Hypotheses hypokalemia esophogitis Most Confident Diagnosis: Diabetes Most Confident Diagnosis: UTI Most Confident Diagnosis: Influenza Most Confident Diagnosis: Diabetes and Esophogitis Huge Volumes of Texts, Journals, References, DBs etc.
What Can Search Learn From Watson? Even the most daunting task can be overcome It’s not company versus company, it’s stretching human knowledge How can search engines help other industries 37
Sources “What is Watson?” presentation by Adam Lally, IBM Research
Jeopardy website with videos: http://www.jeopardy.com/minisites/watson/
NYTimes article: “What Is I.B.M.’s Watson?http://www.nytimes.com/2010/06/20/magazine/20Computer-t.html?_r=2&ref=opinion
Wired magazine:“IBM’s Watson Supercomputer Wins Practice Jeopardy Round”http://www.wired.com/epicenter/2011/01/ibm-watson-jeopardy/# More technical: AI magazine “Building Watson: An Overview of the DeepQA Project”http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf