IBM Watson: How it Works, and What it means for Society beyond winning Jeopardy!

sBD07 presented at IBM Edge 2012 conference.


  1. 1. sBD07 IBM Watson: How it Works and What it Means for Society Beyond Winning Jeopardy! Tony Pearson, IBM Master Inventor and Senior Managing Consultant. #IBMEDGE © 2012 IBM Corporation
  2. 2. In 2011, the IBM Watson computer beat the top-earning human winners on the trivia game show “Jeopardy!” Presenter Tony Pearson, author of "How to Build Your Own Watson Jr. in Your Basement," will explain how the IBM Watson system was put together, how it works, and what text mining and big data analytics mean for society as we apply technology to meet tomorrow's challenges.
  3. 3. Sessions -- Tony Pearson
     • Monday
       – 1:00pm Storing Archive Data for Compliance Challenges
       – 4:15pm IBM Watson: What it Means for Society
     • Tuesday
       – 4:15pm Using Social Media: Birds of a Feather (BOF)
     • Wednesday
       – 9:00am Data Footprint Reduction: IBM Storage options
       – 2:30pm IBM's Storage Strategy in the Smarter Computing era
       – 4:15pm IBM SONAS and the Cloud Storage Taxonomy
     • Thursday
       – 9:00am IBM Watson: What it Means for Society
       – 10:30am Tivoli Storage Productivity Center Overview
       – 5:30pm IBM Edge “Free for All” hosted by Scott Drummond
  4. 4. Today’s Information Challenge
     • Data volume is expanding at an incredible rate
       – Data will grow 800% in the next five years
       – Unstructured data is growing faster than structured data
     • Data is getting more social
       – 20M articles on Wikipedia
       – 30B pieces of Facebook content are shared monthly
       – There are 156M public blogs
     • An estimated 2 billion people are now on the Web, along with a trillion connected objects: cars, appliances, cameras, roadways, pipelines
  5. 5. Dying of Thirst in an Ocean of Data
     • 90% of the world’s data was created in the last two years
     • 80% of the world’s data today is unstructured
     • 20% is the amount of available data that traditional systems leverage
     • 1 in 2 business leaders don’t have access to the data they need
     • 83% of CIOs cited BI and analytics as part of their visionary plan
     • 54% of companies use analytics for competitive advantage
     Source: GigaOM, Software Group, IBM Institute for Business Value
  6. 6. What if an enterprise had all the answers it needs to succeed? Can we design a computing system that rivals a human’s ability to retrieve, analyze and interpret vast amounts of information?
  7. 7. The Jeopardy! Challenge: A compelling and notable way to drive and measure the technology of automatic Question Answering along 5 Key Dimensions
     • The five dimensions
       – 1. Broad Domain
       – 2. Complex Language
       – 3. High Precision
       – 4. Accurate Confidence
       – 5. High Speed
     • Sample clues
       – $200: If you're standing, it's the direction you should look to check out the wainscoting.
       – $600: This actor, Audrey’s husband from 1954 to 1968, directed her as Rima the bird girl in ‘Green Mansions’
       – $1000: The first person mentioned by name in ‘The Man in the Iron Mask’ is this hero of a previous book by the same author.
       – $2000: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus
  8. 8. The Plan: Compete against Humans to Demonstrate Technology
  9. 9. Do you Remember IBM Deep Blue?
     • Chess
       – A finite, mathematically well-defined search space
       – Limited number of moves and states
       – All the symbols are completely grounded in the mathematical rules of the game
     • Human Language
       – Words by themselves convey different meanings
       – Only grounded in human cognition
       – Words navigate, align and communicate an infinite space of intended meaning
       – Computers cannot ground words to human experiences to derive meaning
  10. 10. Why is it so hard for computers to understand humans?
      • Question: Where was Einstein born?
        – Structured data (Physicist / Birth Place): A. Einstein / Ulm; N. Bohr / Copenhagen; M. Curie / Warsaw. Source: Spreadsheet, Database, etc.
        – Unstructured data: “One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein´s birthplace”
      • Question: Welch ran this?
        – Structured data (Person / Organization): L. Gerstner / IBM; J. Welch / GE; W. Gates / Microsoft. Source: Spreadsheet, Database, etc.
        – Unstructured data: “If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE” Source: Jack Welch and the GE Way, Robert Slater
      Source: IBM Research
  11. 11. Informed Decision Making: Search vs. Expert Q&A
      • Search
        – Decision Maker: Has Question
        – Decision Maker: Distills to 2-3 Keywords
        – Search Engine: Finds Documents containing Keywords
        – Search Engine: Delivers Documents based on Popularity
        – Decision Maker: Reads Documents, Finds Matches
        – Decision Maker: Finds & Analyzes Evidence
      • Expert Q&A
        – Decision Maker: Asks NL Question
        – Expert: Understands Question
        – Expert: Produces Possible Answers & Evidence
        – Expert: Analyzes Evidence, Computes Confidence
        – Expert: Delivers Response, Evidence & Confidence
        – Decision Maker: Considers Answer & Evidence
  12. 12. Computers in SciFi Movies… Lots of Lights and Cardboard! IBM Watson… Lots of Work!
  13. 13. Unstructured Information Management Architecture (UIMA)
      To date, UIMA is the only industry standard for content analytics. Developed by IBM, it is now an OASIS standard, with source code for the reference implementation kept by the Apache Software Foundation.
      Processing pipeline:
      • Question
      • Question & Topic Analysis (multiple interpretations)
      • Question Decomposition
      • Hypothesis Generation (100s of sources, 100s of possible answers)
      • Hypothesis and Evidence Scoring (1000’s of pieces of evidence; 100,000’s of scores from many simultaneous text analysis algorithms)
      • Synthesis
      • Final Confidence Merging & Ranking
      • Answer & Confidence
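The pipeline stages named on this slide can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the three-sentence corpus, the stopword list, the two scorers and the averaging step are hypothetical stand-ins, not Watson's actual algorithms or the UIMA API.

```python
from dataclasses import dataclass, field

# Hypothetical toy corpus standing in for Watson's information sources.
CORPUS = {
    "Ulm": "Albert Einstein was born in Ulm in 1879.",
    "Copenhagen": "Niels Bohr was born in Copenhagen.",
    "Warsaw": "Marie Curie was born in Warsaw.",
}

STOPWORDS = {"where", "was", "the", "in", "a", "of", "is"}


@dataclass
class Hypothesis:
    answer: str
    scores: list = field(default_factory=list)
    confidence: float = 0.0


def analyze_question(question):
    """Question analysis: reduce the question to lowercase content terms."""
    words = question.lower().replace("?", "").split()
    return {w for w in words if w not in STOPWORDS}


def generate_hypotheses(corpus):
    """Hypothesis generation: treat every corpus key as a candidate answer."""
    return [Hypothesis(answer) for answer in corpus]


def score_evidence(hyp, terms, corpus):
    """Evidence scoring: two toy scorers; a real system would run many more."""
    passage = set(corpus[hyp.answer].lower().replace(".", "").split())
    overlap = len(terms & passage) / max(len(terms), 1)
    mentions_birth = 1.0 if "born" in terms and "born" in passage else 0.0
    hyp.scores = [overlap, mentions_birth]


def merge_and_rank(hyps):
    """Final merging and ranking: average the scores, sort by confidence."""
    for h in hyps:
        h.confidence = sum(h.scores) / len(h.scores)
    return sorted(hyps, key=lambda h: h.confidence, reverse=True)


def answer(question, corpus=CORPUS):
    terms = analyze_question(question)
    hyps = generate_hypotheses(corpus)
    for h in hyps:
        score_evidence(h, terms, corpus)
    return merge_and_rank(hyps)


if __name__ == "__main__":
    for h in answer("Where was Einstein born?"):
        print(f"{h.answer}: {h.confidence:.2f}")
```

The point of the sketch is the shape of the flow: every candidate is kept and scored independently, and a confidence value is attached only at the final merge, which is what lets a system decline to answer when the best candidate is weak.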
  14. 14. Answering Precision, circa 2006. Each dot represents an actual historical human Jeopardy! game. [Chart regions: Winning Human Performance; Grand Champion Human Performance; Computers, circa 2006]
  15. 15. DeepQA: Progress in Answering Precision. IBM Watson: Playing in the Winners Cloud. [Chart series: Baseline 12/06; v0.1 12/07; v0.2 05/08; v0.3 08/08; v0.4 12/08; v0.5 05/09; v0.6 10/09; v0.7 04/10; v0.8 11/10]
  16. 16. Precision / Confidence & Speed
      • Deep Analytics – Combining many analytics in a novel architecture, we achieved very high levels of Precision and Confidence over a huge variety of content.
      • Speed – By optimizing Watson’s computation for Jeopardy! on 2,880 POWER7 processing cores, we went from 2 hours per question on a single CPU to an average of just 3 seconds.
      • Results – In 55 real-time sparring games against former Tournament of Champions players in 2010, Watson put on a very competitive performance in all games, placing 1st in 71% of them!
  17. 17. IBM Watson vs. Wolfram Alpha
  18. 18. Real-Time Game Configuration
      • Jeopardy! Game Control System: Clue Grid; Human Player 1; Human Player 2; Clues, Scores & Other Game Data
      • Watson (Insulated and Self-Contained)
        – Watson’s Game Controller: receives Clue & Category; Strategy; Decisions to Buzz and Bet; Text-to-Speech
        – Watson’s QA Engine: returns Answers & Confidences; 2,880 IBM POWER7 Compute Cores; 15 TB of Memory
      • Analysis of natural language content equivalent to 1 Million Books
  19. 19. IBM Watson Components
      • 10 Frames of Hardware
        – 88 Compute nodes
        – 2 Storage nodes
        – 4 SAS disk shelves
        – 10GbE Network
        – 80 TeraFLOPS
      • Software
        – SLES 11 Linux
        – Clustered NFS
        – GPFS file system
        – Hadoop, UIMA-AS
        – 700 KLOC Java
        – 300 KLOC C++
      • Watson’s QA Engine: 2,880 IBM POWER7 Compute Cores, 15 TB of DRAM memory, IBM Power750 servers (POWER7); returns Answers & Confidences
      • Watson’s Game Controller: strategies for betting, buzzing in, clue selection & exchanging info with Jeopardy computers; Windows 7 Lenovo desktop; receives Clue & Category, makes Decisions to Buzz and Bet
      • Avatar: voice synthesis (Text-to-Speech); Apple Mac notebook
  20. 20. Storage in IBM Watson
      • 16 TB of DRAM memory
        – Sources: Encyclopedias; Dictionaries; Books and Web pages; Quotations; Wire News
        – Knowledge Bases
        – Lexical Database of English
      • 2 IBM Power750 servers
        – Clustered NFS
        – IBM GPFS file system
      • 4 IBM EXP 12S disk shelves
        – 48 x 450GB SAS drives
        – 10.4 TB of RAID-1 space
        – 1 TB of unstructured data
      • Test and Training Data
        – J! Archive
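The disk figures on this slide can be sanity-checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and plain two-way mirroring for RAID-1; the `spares` parameter is a hypothetical knob, since the slide does not say how the 10.4 TB figure was derived.

```python
DRIVES = 48      # from the slide: 48 x 450GB SAS drives
DRIVE_GB = 450


def raid1_usable_tb(drives, drive_gb, spares=0):
    """Usable decimal TB when the non-spare drives are mirrored in pairs."""
    pairs = (drives - spares) // 2
    return pairs * drive_gb / 1000


raw_tb = DRIVES * DRIVE_GB / 1000
mirrored_tb = raid1_usable_tb(DRIVES, DRIVE_GB)
with_two_spares_tb = raid1_usable_tb(DRIVES, DRIVE_GB, spares=2)

if __name__ == "__main__":
    print(f"raw capacity:          {raw_tb} TB")
    print(f"RAID-1, no spares:     {mirrored_tb} TB")
    print(f"RAID-1, 2 hot spares:  {with_two_spares_tb} TB")
```

Mirroring halves the 21.6 TB of raw capacity to 10.8 TB, slightly above the 10.4 TB quoted. Holding two drives back as spares gives 10.35 TB, which rounds to the slide's number, but that is only one configuration that happens to fit; formatting overhead could account for the gap as well.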
  21. 21. On February 14-16, 2011, IBM Watson changed history, introducing a system that rivaled a human’s ability to answer questions posed in natural language with speed, accuracy and confidence. Watson Wins!
      • Largest Jeopardy! in 5 years
      • 34.5M Jeopardy! Viewers
      • 1.3B+ Impressions
      • Over 10,000 Media Stories
      • 11,000 attend watch events
      • 2.5M+ Video Views
      • 12,582 Twitter tweets
      • 25,763 Facebook Fans
  22. 22. Reflections on IBM Watson
  23. 23. How to Build your Own Watson Jr. In your Basement (eight steps)
      • 1. Acquire Hardware
      • 2. Establish Networking
      • 3. Install Linux and Middleware
      • 4. Download information sources
      • Server 1: Presentation Server. This will be your avatar interface to ask questions and display answers
      • Server 2: Business Logic Server. This will be your compute node for analytics
      • Server 3: File / Database Server. This will be your information source repository
  24. 24. How to Build your Own Watson Jr. In your Basement (eight steps)
      • 5. Query Panel – Parsing the Question
      • 6. Implement Hadoop, UIMA with Java, C++ and XML
        – This is the part we call in the industry a “Small Matter of Programming” (SMOP)
      • 7. UIMA-AS Parallel Processing (Element / # of Cores / Response)
        – Single core / 1 / 2 hours
        – 3 Quad-Core servers / 12 / 10 minutes
        – 1 Frame (Watson) / 288 / 30 seconds
        – IBM Watson / 2880 / 3 seconds
      • 8. Testing and Training
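The core counts and response times in step 7 imply a speedup and a parallel efficiency for each configuration. The sketch below only does that arithmetic on the slide's own numbers, with times converted to seconds; the function and field names are made up for illustration.

```python
# (element, cores, response time in seconds), taken from the slide's table.
ROWS = [
    ("Single core", 1, 2 * 60 * 60),
    ("3 Quad-Core servers", 12, 10 * 60),
    ("1 Frame (Watson)", 288, 30),
    ("IBM Watson", 2880, 3),
]


def scaling(rows):
    """Return (element, cores, speedup, efficiency) relative to the first row."""
    baseline = rows[0][2]
    result = []
    for element, cores, seconds in rows:
        speedup = baseline / seconds   # how many times faster than one core
        efficiency = speedup / cores   # fraction of ideal linear scaling
        result.append((element, cores, speedup, efficiency))
    return result


if __name__ == "__main__":
    for element, cores, speedup, efficiency in scaling(ROWS):
        print(f"{element:22} {cores:5} cores  {speedup:7.0f}x  {efficiency:.0%}")
```

On these figures the 12-core setup scales linearly (12x), while one frame and the full system both reach about 83% of ideal (240x on 288 cores, 2400x on 2880 cores). The slide's times are round numbers, so these ratios are indicative rather than measured.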
  25. 25. One Year Later – IBM Watson by the numbers
      • 15 Million viewers watched the rerun of IBM Watson’s triumph on Jeopardy!
      • 386 Universities are collaborating with IBM on Watson and Analytics
      • 5 Members of U.S. Congress have competed against Watson
      • 77 Thousand downloads of “How to build your own Watson Jr. in your basement”
  26. 26. IBM Watson brings together a set of transformational technologies to drive optimized outcomes
      • Understands Natural Language of human speech
      • Generates and evaluates hypotheses for better outcomes
      • Adapts and Learns from user selections and responses
      • …built on a massively parallel probabilistic evidence-based architecture
  27. 27. Watson was built for Jeopardy! Can be enhanced for future applications
      • Early 2011: Watson's current capabilities were constrained for Jeopardy! requirements...
        – English only
        – A single questioner per system instance
        – 3-second response time
        – Static content
        – Unstructured text
        – Requires training data: history of questions and answers
      • Future: ...but future Watson enhancements are possible with further development...
        – Multiple, varied users
        – More dynamic content updates
        – More/varied training data
        – Varied response times
        – Additional languages
  28. 28. IBM PureSystems. IBM Watson: 10 Frames. IBM PureFlex: 4 Frames. 60% Reduced Floor Space!
  29. 29. Transforming how Business Thinks, Acts, and Operates
      • Healthcare: Diagnostic/treatment assistance, evidence-based insights, collaborative medicine
      • Financial Services: Investment and retirement planning, institutional trading and decision support
      • Contact Center: Call center and tech support services, enterprise knowledge management, consumer insight
      • Government: Public safety, improved information sharing, security
      IBM Watson has the capabilities to address grand business and societal challenges
  30. 30. Healthcare Industry is beset with complex information challenges
      • Medical information is doubling every 5 years, much of which is unstructured
      • 81% of physicians report spending 5 hours or less per month reading medical journals
      • “Medicine has become too complex (and only) about 20 percent of the knowledge clinicians use today is evidence-based.” -- Steven Shapiro, Chief Medical and Scientific Officer, UPMC
      Source: International Journal of Circumpolar Health, Institute for Medicine
  31. 31. Why is Watson Technology ideal for Healthcare?
      • Interprets and understands natural language questions: What condition has red eye, pain, inflammation, blurred vision, floating spots and sensitivity to light?
      • Analyzes large volumes of unstructured data: Physician Notes, Medical Journals, Pathology results, Clinical Trials, Wikipedia, etc.
      • Quantifies degrees of confidence in potential answers: Uveitis 91%, Iritis 48%, Keratitis 29%
      • Supports iterative dialogue to refine results: Family History, Physical Exam, Current Medications, etc.
      • Adapts and learns to improve results over time: New Clinical Recommendations, New Drugs, Approved use of Drugs, etc.
      Source: IBM Research, MI, SCIP, BCG analysis
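The "degrees of confidence" idea on this slide can be sketched as ranking candidates and committing to an answer only above a threshold. The confidence values are the ones shown on the slide; the 0.80 threshold and the wording of the fallback message are assumptions chosen for illustration, not figures from IBM.

```python
# Candidate answers and confidences as shown on the slide.
CANDIDATES = {"Uveitis": 0.91, "Iritis": 0.48, "Keratitis": 0.29}


def rank(candidates):
    """Order candidate answers from most to least confident."""
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)


def respond(candidates, threshold=0.80):
    """Commit to the top answer only if it clears the (assumed) threshold."""
    ranked = rank(candidates)
    best, confidence = ranked[0]
    if confidence >= threshold:
        return f"Most likely: {best} ({confidence:.0%})"
    listing = ", ".join(f"{name} {conf:.0%}" for name, conf in ranked)
    return f"Confidence too low, more evidence needed: {listing}"


if __name__ == "__main__":
    print(respond(CANDIDATES))
    # Without the strongest candidate, no answer clears the threshold.
    print(respond({"Iritis": 0.48, "Keratitis": 0.29}))
```

The low-confidence branch is where the slide's "iterative dialogue" would come in: rather than guessing, the system asks for more input such as family history or current medications, then rescores.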
  32. 32. IBM Watson for Healthcare Medical Professionals and Patients • Improve quality of care • Reduce errors • Engage patients Watson-enabled Solutions from major healthcare solution providers • Improve audit trails • Private data • Improve efficiency Customized Watson • Custom algorithms Solution Appliance • Better utilize skills • Custom applications • Advance Watson for Healthcare Service Evidenced-based Patient ... Differential ... Second ... Customized Care Workup Diagnosis Opinion Solution • Foster a healthcare • Right content, Watson Engine and Scoring and analytics right time Confidence Models Evidence Annotators ecosystem • Best-practices to Data Relationship Training and Data Data Algorithms • Capture value point of care • Capture value Healthcare Healthcare ... Benefits ... Benefits Public Public Publishers ... Publishers Providers Providers Providers Providers domain domain 32#IBMEDGE © 2012 IBM Corporation
  33. 33. Watson @ WellPoint
      • WellPoint & IBM agreement
      • WellPoint will develop and launch Watson-based solutions
        – Help improve patient care
        – Provide delivery of up-to-date, evidence-based healthcare
      • IBM will develop the base Watson healthcare technology
      • Targeted implementation:
        – Physician Helper: will be a consultant with the ability to analyze various inputs to help humans make a decision
        – Load Watson with medical data to analyze and identify conditions, ranging from heart disease to cancer to diabetes. Medical histories, test results, possible drug interactions and treatment options will be evaluated too.
        – Working with select physician groups in clinical spots
  34. 34. Seton Healthcare
      • First client to utilize IBM Content and Predictive Analytics for Healthcare
        – Combines IBM's Watson technology with an industry solutions offering
      • Extract relevant clinical information from vast amounts of patient data
        – Better analyze the past
        – Understand the present
        – Predict future outcomes
      • Seton to focus on
        – Determining root causes of hospital re-admissions
        – Ways to decrease preventable multiple hospital visits
      • Facts…
        – One in five patients suffers from preventable re-admissions
        – Represents $17.4 billion of the current $102.6 billion Medicare budget *
        – Beginning in 2012, hospitals are penalized for high re-admission rates with reductions in Medicare discharge payments
      * According to the New England Journal of Medicine
  35. 35. IBM Content and Predictive Analytics for Healthcare:
      • First of a kind solution for healthcare and available to help healthcare organizations with transformation opportunities
        – Integrates structured and unstructured data
        – Applies predictive root cause analysis, natural language processing (like IBM Watson), and built-in medical terminology support
        – Identifies trends, patterns and deviations revealing clinical and operational insights
      • Pairs Content Analytics and SPSS Modeler Professional with solution and healthcare industry specific medical terminology support and services
        – Synergistic solution to IBM Watson
        – Healthcare solution that pairs natural language processing with predictive root cause analysis
  36. 36. Financial services firms are beset with complex information challenges
      • Reuters publishes the equivalent of 9000 pages of financial news every day [1]
      • Five new research documents come out of Wall Street every minute [1]
      • Asset managers receive up to 1,000 e-mails daily [1]
      Sources: 1 - 2 - IBM client experience with ForEx traders 3 - Derived from NYSE data
  37. 37. Financial Services – Role-based use case examples
      • Institutional Investment Advisor: Utilize more information, more effectively and efficiently, to support investment and credit decisions
      • Retail Financial Advisor: Provide investment advice directly to retail investors and/or support advisors for high net worth clients
      • Risk Management Officer: Leverage superior capabilities for fraud detection and/or Compliance support
  38. 38. Citigroup
      • Announced plans to explore how Watson fits into the realm of digital banking
      • Citigroup will examine…
        – Watson's ability to “Help analyze customer needs”
        – Processing vast amounts of up-to-the-minute financial, economic, product and client data
        – Providing rapid, personalized banking solutions
        – How Watson's deep-content analytics, natural language processing and evidence-based learning fit with how the company interacts with customers to advance digital banking
      • Watson's financial assistance will be provided as a “Cloud-based service"
  39. 39. Need to rethink what it will take to get ahead tomorrow
      • Traditional IT
        – Structured data (local)
        – Deterministic Applications
        – Search Oriented
        – Query Results
        – Machine Language
      • Emerging IT
        – Structured & unstructured (global)
        – Probabilistic Applications
        – Discovery Oriented
        – Big Data Insights
        – Natural Language
  40. 40. Summary
      • IBM Watson is...
        – A reasoning system that processes questions and answers in Natural Language, across structured and unstructured data sources, using Deep Analytics that learns for optimal results
      • IBM Watson is not…
        – An advanced search engine, database retrieval system, our new computer overlord, or the beginning of Skynet
      • IBM Watson works by...
        – Analyzing the question, generating hypotheses, evaluating evidence and presenting results scored by confidence level
      • Which means that…
        – Society now has a new tool to help tackle some of the biggest information challenges in healthcare, financial services, etc.
  41. 41. Thank You! Session: sBD07. Presenter: Tony Pearson. Intel, the Intel logo, Xeon and Xeon Inside are trademarks or registered trademarks of Intel Corporation in the U.S. and/or other countries.
  42. 42. Learn more at: (Tweet hashtag #ibmwatson) See Watson in action at an IBM Lab, Briefing Center or Analytics Solution Center.
  43. 43. About the Speaker
      Mr. Tony Pearson, Master Inventor, Senior Managing Consultant, IBM System Storage™. 9000 S. Rita Road, Bldg 9070 Mail 9070, Tucson, AZ 85744. +1 520-799-4309 (Office).
      Tony Pearson is a Master Inventor and Senior Managing Consultant for the IBM System Storage™ product line. Tony joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, Tivoli storage software products, and topics related to Cloud Computing. He interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products.
      Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine, and the #1 most read IBM blog on IBM’s developerWorks. The blog has been published in a series of books, Inside System Storage: Volume I through IV.
      Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware and software products. Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.