Watson and Open Source Tools
 


Presented at BJUG, 5/8/2012 by Ivan Portilla

IBM Watson is a reasoning system with a question-and-answer front end that processes natural language drawn from both structured and unstructured data. Watson also incorporates analytics from which the system learns to derive answer confidence and scoring. We will discuss the Watson system and some of its key foundations that came from the open source Apache Software Foundation. We will share lessons learned from using open source technologies in Watson, including UIMA, Derby, Hadoop and Tomcat. We will explain how the primary (shallow) search was built with Apache Lucene and how the team followed Agile best practices in its software development efforts.

    Watson and Open Source Tools: Presentation Transcript

    • Watson & Open Source Software. Ivan Portilla, IT Architect. 5/8/12. portilla@gmail.com. Sunday, May 20, 12
    • "If I have seen further it is by standing on the shoulders of giants." Isaac Newton, letter to Robert Hooke, February 5, 1675
    • Objectives. By the end of this session, you should be able to:
        • Describe the main characteristics of the Watson QA system.
        • Identify the key open source software used in Watson.
        • Recognize examples of Agile development best practices.
    • Disclaimers
    • Disclaimer 1
        • This presentation represents the view of the author and does not represent the view of IBM.
        • All opinions expressed in this presentation are strictly those of the speaker, and do NOT represent those of IBM, IBM management, or anyone else.
        • IBM and IBM (logo) are trademarks or registered trademarks of International Business Machines Corporation in the United States and/or other countries.
    • Disclaimer 2: I (we) do not work for the Watson team.
    • Let's Play Jeopardy
        • BEFORE & AFTER: The Jerry Maguire star who automatically maintains your vehicle's speed.
        • COMMON BONDS: Trout, loose change in your pocket, and compliments.
        • Diplomatic Relations: Of the four countries in the world that the United States does not have diplomatic relations with, the one that's farthest north.
        • Geography: Chile shares its longest land border with this country.
    • Natural Language Processing: Understanding natural language is hard!
    • Watson education: http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717
    • A Brief History of Watson
        • Deep Blue ended in 1997
        • Looking for a new research challenge
        • 2004, IBM Research manager Charles Lickel
        • Ken Jennings
        • Started in 2005
        • David Ferrucci
        • DeepQA in 2007
        • Won Jeopardy match, Feb 2011
        • http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717
    • [Chart: precision by system version: Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010, 11/2010.] http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717
    • What is Watson?
        • Understands natural language.
        • Generates & evaluates hypotheses for better outcomes.
        • Adapts & learns from user selections and responses.
        • http://www.ibm.com/innovation/us/watson/
    • Watson metrics
        • Development team: 25 people
        • Project duration: 4 years
        • Hardware: 90 IBM Power-750 servers; 2880 Power7 cores @ 80+ TFLOPS; 20 TB disk, 16 TB RAM (memory); 10 Gbps network
        • http://na11.apachecon.com/talks/19932
        • http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717
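As a quick sanity check on the Watson metrics slide, the per-server and per-core figures can be derived with simple arithmetic. This is only a sketch: it assumes decimal units (1 TB = 1000 GB), treats the "80+ TFLOPS" figure as exactly 80 (so the last value is a lower bound), and the variable names are illustrative.

```python
# Figures taken from the "Watson metrics" slide.
servers = 90
cores = 2880
ram_tb = 16
tflops = 80  # the slide says "80+", so this is a lower bound

# Derived values (assumption: 1 TB = 1000 GB, 1 TFLOPS = 1000 GFLOPS).
cores_per_server = cores // servers
ram_gb_per_server = ram_tb * 1000 / servers
gflops_per_core = tflops * 1000 / cores

print(f"{cores_per_server} cores per server")
print(f"{ram_gb_per_server:.0f} GB RAM per server")
print(f"{gflops_per_core:.1f} GFLOPS per core (at least)")
```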
    • Open Source Software
    • OSS - Linux: http://video.linux.com/videos/linuxcon-vancouver-day-2-1/
    • Open Source too: http://ocw.mit.edu/index.htm, http://ti.arc.nasa.gov/opensource/
    • How does it work?
    • Learning To Rank (Basic Architecture). [Diagram labels: Keywords, Hypothesis, Evidence, Scoring.] Watson by R. Yates
    • Watson Architecture: http://en.wikipedia.org/wiki/Watson_(computer)
    • Who is the 44th President of the United States? (Watson by R. Yates)
    • Who is the 44th President of the United States?
        • Focus: can be replaced by the correct answer to make a true statement
        • Keywords
        • Lexical Answer Type → Person
    • Who is the 44th President of the United States? Keywords: 44th President United States. [Pipeline: Question → Question Analysis → Hypothesis Generation.] Watson by R. Yates
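The question-analysis step on these slides (focus, keywords, answer type) can be illustrated with a toy sketch. This is not Watson's actual analysis, which the slides do not detail: the stop-word list, the answer-type table, and the function name `analyze_question` are all hypothetical and chosen to fit this one example.

```python
import re

# Hypothetical stop-word list and answer-type table, chosen for this example only.
STOP_WORDS = {"who", "what", "is", "the", "of", "a", "an", "in"}
ANSWER_TYPES = {"who": "Person", "where": "Place", "when": "Date"}

def analyze_question(question):
    """Return the focus word, the expected answer type, and the keywords."""
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    # Treat a leading question word as the focus: the part an answer replaces.
    focus = tokens[0] if tokens and tokens[0].lower() in ANSWER_TYPES else None
    answer_type = ANSWER_TYPES[focus.lower()] if focus else None
    keywords = [t for t in tokens if t.lower() not in STOP_WORDS]
    return {"focus": focus, "answer_type": answer_type, "keywords": keywords}

result = analyze_question("Who is the 44th President of the United States?")
print(result)
```

For the slide's question, the sketch yields the same keywords the slide lists and the answer type Person.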
    • Who is the 44th President of the United States? Primary Search: Barack Obama, George W. Bush, Harvard Law School, Illinois. [Pipeline: Question → Question Analysis → Hypothesis Generation.] Watson by R. Yates
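A rough idea of how a primary search turns keywords into candidate answers can be sketched with a tiny inverted index. The talk attributes Watson's primary search to Lucene and Indri; the code below is a plain keyword-overlap stand-in and does not use or imitate either library's API or scoring. The corpus, the titles-as-candidates convention, and all function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document titles that contain it."""
    index = defaultdict(set)
    for title, text in docs.items():
        for term in tokenize(text):
            index[term].add(title)
    return index

def primary_search(index, keywords):
    """Rank titles by how many query keywords they match (ties: alphabetical)."""
    scores = defaultdict(int)
    for keyword in keywords:
        for title in index.get(keyword.lower(), ()):
            scores[title] += 1
    return sorted(scores, key=lambda title: (-scores[title], title))

# Hypothetical toy corpus; each document title doubles as a candidate answer.
docs = {
    "Barack Obama": "Barack Obama is the 44th President of the United States",
    "George W. Bush": "George W. Bush served as the 43rd President of the United States",
    "Harvard Law School": "Harvard Law School is a law school in the United States",
    "Illinois": "Illinois is a state in the United States",
    "Lake Michigan": "Lake Michigan is one of the Great Lakes",
}

candidates = primary_search(build_index(docs), ["44th", "President", "United", "States"])
print(candidates)
```

The point of the sketch is that a shallow search deliberately over-generates: wrong candidates such as Illinois survive this stage and are filtered later by scoring.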
    • Who is the 44th President of the United States? [Diagram: the question paired with each candidate (Barack Obama, George W. Bush, Harvard Law School, Illinois) feeds Answer Scoring. Pipeline: Question → Question Analysis → Hypothesis Generation → Scoring.]
    • Who is the 44th President of the United States? [Diagram labels: Answer Scoring, Contextual, → Person.]
        • Is Barack Obama a Person? .90
        • Is George W. Bush a Person? .90
        • Is Harvard Law School a Person? .10
        • Is Illinois a Person? .15
        • Watson by R. Yates
    • Who is the 44th President of the United States? Contextual Answer Scoring:
        • Barack Obama is the 44th President of the United States
        • George W. Bush is the 44th President of the United States
        • Harvard Law School is the 44th President of the United States
        • Illinois is the 44th President of the United States
        • Unstructured Information Management Applications - UIMA
    • Who is the 44th President of the United States? Evidence passages:
        • Barack Hussein Obama II (/bəˈrɑːk huːˈseɪn oʊˈbɑːmə/; born August 4, 1961) is the 44th and current President of the United States.
        • George Walker Bush (born July 6, 1946) is an American politician who served as the 43rd President of the United States from 2001 to 2009 and the 46th Governor of Texas from 1995 to 2000.
        • Scores: Barack Obama .95, George W. Bush .80, Harvard Law School .05, Illinois .10
        • Unstructured Information Management Applications - UIMA. Watson by R. Yates
    • Who is the 44th President of the United States? Candidate answer: answer scoring, evidence retrieval & scoring, confidence (from trained models):
        • Barack Obama: 0.90, 0.90, .95
        • George W. Bush: 0.90, 0.80, .65
        • Harvard Law School: 0.10, 0.05, .05
        • Illinois: 0.15, 0.10, .10
        • Unstructured Information Management Applications - UIMA. Watson by R. Yates
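The last slide in this walkthrough says trained models merge the per-candidate scores into a final confidence. One common way to do that kind of merging is a logistic model over the feature scores, sketched below. This is an assumption for illustration, not the method the slides describe: the weights and bias are made up rather than learned, so the resulting numbers do not reproduce the slide's confidence column. Only the input scores come from the slide.

```python
import math

# Hypothetical weights and bias; a real system would learn these from training data.
WEIGHTS = {"answer_type": 3.0, "evidence": 3.0}
BIAS = -3.0

def confidence(features):
    """Combine feature scores into a 0..1 confidence with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Answer-scoring and evidence-scoring values as listed on the slide.
candidates = {
    "Barack Obama": {"answer_type": 0.90, "evidence": 0.90},
    "George W. Bush": {"answer_type": 0.90, "evidence": 0.80},
    "Harvard Law School": {"answer_type": 0.10, "evidence": 0.05},
    "Illinois": {"answer_type": 0.15, "evidence": 0.10},
}

# Rank candidates from most to least confident.
ranked = sorted(candidates, key=lambda name: confidence(candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {confidence(candidates[name]):.2f}")
```

Even with invented weights, the ranking matches the order of the slide's confidence column, which is the property a ranking model is trained to get right.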
    • DeepQA: Massively Parallel Probabilistic Evidence-Based Architecture. Learned models help combine and weigh the evidence. [Diagram: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence. Hypothesis Generation uses Primary Search and Candidate Answer Generation over Answer Sources; scoring uses Answer Scoring, Evidence Retrieval, and Evidence Scoring over Evidence Sources; Models feed the final merging step.] ApacheCon 2011, "Watson, a Reasoning System: based on Apache Inside!", David Boloker
    • OSS in Watson 1.0
    • OSS in Watson
        • UIMA, UIMA-AS
        • Hadoop, MapReduce
        • Lucene, Indri
        • ApacheCon 2011, "Watson, a Reasoning System: based on Apache Inside!", David Boloker
    • UIMA: http://uima.apache.org/
    • UIMA-Asynchronous Scaleout: UIMA AS provides more flexible and powerful scale-out capability. Innovate 2011, "How Does It Work? The Architecture of Watson", Grady Booch
    • Think Hadoop: a framework for storing & processing big data.
        • Up to 4,000 machines
        • Up to 20 PB
        • High reliability done in software: automated failover for data & computation
        • Implemented in Java
        • http://hadoop.apache.org/mapreduce/
    • MapReduce: http://hadoop.apache.org/mapreduce/
    • UIMA pipelines in Hadoop: multiple threads, each running a UIMA pipeline. [Diagram: input from the Hadoop Distributed File System in ~5000 "splits" → ~50-100 "mappers" with ~400-800 threads → shuffle/sort → reducers → output to the Hadoop Distributed File System.] http://blogs.apache.org/foundation/entry/apache_innovation_bolsters_ibm_s
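The layout on the "UIMA pipelines in Hadoop" slide (input splits, mappers that each run several pipeline threads, a shuffle/sort, then reducers) can be mimicked in a few lines of plain Python. This is a single-process sketch, not Hadoop or UIMA code: `pipeline` is a hypothetical stand-in that just emits token counts, the mappers run one after another rather than on separate machines, and all names are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def pipeline(document):
    """Stand-in for a UIMA pipeline: emit (token, 1) pairs for one document."""
    return [(token.lower(), 1) for token in document.split()]

def mapper(split, threads=4):
    """Process one input split with several threads, each running the pipeline."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(pipeline, split)
        return [pair for pairs in results for pair in pairs]

def shuffle(pairs):
    """Group the mapped values by key, as the shuffle/sort phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

def run_job(splits):
    """Run every mapper (sequentially here), then shuffle and reduce."""
    mapped = [pair for split in splits for pair in mapper(split)]
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())

# Two hypothetical input splits, each a list of documents.
splits = [["Watson uses UIMA", "UIMA runs in Hadoop"], ["Hadoop runs mappers"]]
counts = run_job(splits)
print(counts)
```

Running many pipeline threads inside each mapper, as the slide describes, keeps the number of mapper processes modest while still using all available cores.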
    • Architecture: http://lucene.apache.org
    • Indri & Lemur: Indri is a text search engine developed at UMass & CMU. Indri is part of the Lemur project. http://lemurproject.org/indri/
    • Watson answers in 2-6 seconds. [Diagram: Question → multiple interpretations → 100s of sources → 100s of possible answers → 1000s of pieces of evidence → 100,000s of scores from many simultaneous text analysis algorithms. Pipeline: Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence.] ApacheCon 2011, "Watson, a Reasoning System: based on Apache Inside!", David Boloker. © 2011 IBM Corporation
    • Other OSS in Watson: https://www.ibm.com/developerworks/mydeveloperworks/blogs/InsideSystemStorage/entry/ibm_watson_how_to_build_your_own_watson_jr_in_your_basement7?lang=en
    • J-Archive data: http://www.j-archive.com/showgame.php?game_id=3577
    • Development Process
        • War room setting with continuous collaboration.
        • Weekly integration.
        • Results driven with E2E regression testing.
        • About 8,000 experiments
        • 10 GB of test data per week.
        • Agile development
        • Innovate 2011, "How Does It Work? The Architecture of Watson", Grady Booch
    • Other OSS: http://manning.com/, http://www.apache.org
    • Take Away: OSS is powerful and scalable enough for the Watson team. What about your project?
    • Resources
        • IBM Journal of Research and Development: http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717
        • IBM Watson: http://www.ibm.com/innovation/us/watson/ and http://www.research.ibm.com/deepqa/index.shtml
        • Nova: http://www.pbs.org/wgbh/nova/tech/smartest-machine-on-earth.html
    • Review of Objectives. Now that you have completed this session, you are able to:
        • Describe the main characteristics of the Watson QA system.
        • Identify the key open source tools used in Watson.
        • Recognize examples of Agile development best practices.
    • …any final questions?