IBM Watson & Open Source Software - LinuxCon 2012

Presented at LinuxCon 2012 - San Diego, CA


  1. Watson & Open Source Software. Ivan Portilla, IT Architect. 8/29/12. portilla@gmail.com
  2. "If I have seen further it is by standing on the shoulders of giants." Isaac Newton, Letter to Robert Hooke, February 5, 1675
  3. Objectives. By the end of this session, you should be able to: ✓ Describe the main characteristics of the Watson QA system. ✓ Identify the key open source SW used in Watson. ✓ Recognize examples of Agile development best practices.
  4. Disclaimers
  5. Disclaimer 1. ✓ This presentation represents the view of the author and does not represent the view of IBM. ✓ All opinions expressed in this presentation are strictly those of the speaker, and do NOT represent those of IBM, IBM management, or anyone else. ✓ IBM and IBM (logo) are trademarks or registered trademarks of International Business Machines Corporation in the United States and/or other countries.
  6. Disclaimer 2. I (We) do not work for the Watson team.
  7. Jeopardy! The game
  8. Game View
  9. Let’s Play Jeopardy. BEFORE & AFTER: The Jerry Maguire star who automatically maintains your vehicle’s speed. COMMON BONDS: trout, loose change in your pocket, and compliments. Diplomatic Relations: Of the four countries in the world that the United States does not have diplomatic relations with, the one that’s farthest north. Geography: Chile shares its longest land border with this country.
  10. Natural Language Processing. Understanding natural language is hard!
  11. Watson education
  12. A Brief History of Watson
  13. A Brief History of Watson • Deep Blue ended in 1997 • Looking for a new research challenge • 2004, IBM Research manager Charles Lickel • Ken Jennings • Started in 2005 • David Ferrucci • DeepQA in 2007 • Won Jeopardy! match, Feb 2011
  14. [Chart: Precision (0% to 100%) vs. % Answered (0% to 100%), with one curve each for Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010, and 11/2010]
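The chart on slide 14 plots precision against the percentage of questions answered. A minimal sketch of how one point on such a curve could be computed, assuming the system answers only its most confident questions; the function name and the sample data are hypothetical and are not taken from the talk or from Watson:

```python
from typing import List, Tuple


def precision_at_answered(results: List[Tuple[float, bool]], fraction: float) -> float:
    """Precision over the top `fraction` of questions, ranked by confidence.

    `results` holds (confidence, was_correct) pairs, one per question.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Answer the most confident questions first.
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n = max(1, round(len(ranked) * fraction))
    answered = ranked[:n]
    return sum(1 for _, correct in answered if correct) / n


# Hypothetical (confidence, was_correct) pairs, not real Watson data.
sample = [(0.95, True), (0.90, True), (0.70, False), (0.60, True), (0.30, False)]

for pct in (0.4, 1.0):
    print(f"{pct:.0%} answered -> precision {precision_at_answered(sample, pct):.2f}")
```

Sweeping `fraction` from 0 to 1 traces out one curve; a system with better confidence estimates keeps precision high further to the right.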
  15. What is Watson? ✓ Understands natural language. ✓ Generates & evaluates hypotheses for better outcomes. ✓ Adapts & learns from user selections and responses. http://www.ibm.com/innovation/us/watson/
  16. Watson metrics. Development team: 25 people. Project duration: 4 years. Software: 1,000,000 SLOC (700K Java, 300K C++), ~130 components. Hardware: 90 IBM Power-750 servers; 2880 Power7 cores @ 80+ TFLOPS; 20 TB disk, 16 TB RAM (memory); 10 Gbps network. http://na11.apachecon.com/talks/19932
  17. Open Source Software
  18. OSS - Linux http://video.linux.com/videos/linuxcon-vancouver-day-2-1/
  19. Open Source too http://ocw.mit.edu/index.htm http://ti.arc.nasa.gov/opensource/
  20. How does it work?
  21. Learning To Rank (Basic Architecture)
  22. Watson Architecture http://en.wikipedia.org/wiki/Watson_(computer)
  23. Who is the 44th President of the United States?
  24. Who is the 44th President of the United States? Stage: Question & Topic Analysis. (Watson by R. Yates)
  25. Who is the 44th President of the United States? Stage: Question & Topic Analysis, which extracts the Lexical Answer Type (→ Person), the Focus*, and Keywords. * Can be replaced by the correct answer to make a true statement. (Watson by R. Yates)
  26. Who is the 44th President of the United States?
  27. Who is the 44th President of the United States?
  28. Who is the 44th President of the United States? Stage: Primary Search. (Watson by R. Yates)
  29. Who is the 44th President of the United States? Stage: Primary Search. Candidates: Barack Obama; George W. Bush; Harvard Law School; Illinois. (Watson by R. Yates)
  30. Who is the 44th President of the United States? The question is paired with each candidate: Barack Obama; George W. Bush; Harvard Law School; Illinois. (Watson by R. Yates)
  31. Who is the 44th President of the United States? The question is paired with each candidate: Barack Obama; George W. Bush; Harvard Law School; Illinois. Who is the 44th President of the United States? → Person. Is Barack Obama a Person? .90; Is George W. Bush a Person? .90; Is Harvard Law School a Person? .10; Is Illinois a Person? .15. (Watson by R. Yates)
  32. Who is the 44th President of the United States? Candidate statements: Barack Obama is the 44th President of the United States; George W. Bush is the 44th President of the United States; Harvard Law School is the 44th President of the United States; Illinois is the 44th President of the United States. (Watson by R. Yates)
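Slide 32 turns each candidate into a statement by substituting it for the question's focus. A minimal sketch of that substitution, assuming the focus is a single word at the start of the question; the helper `to_statement` is hypothetical and far simpler than real question analysis:

```python
def to_statement(question: str, focus: str, candidate: str) -> str:
    """Replace the leading focus word with a candidate answer."""
    if not question.startswith(focus):
        raise ValueError("this sketch only handles a focus at the start of the question")
    # Drop the focus and the trailing question mark, keep the rest.
    body = question[len(focus):].rstrip("?").strip()
    return f"{candidate} {body}"


question = "Who is the 44th President of the United States?"
candidates = ["Barack Obama", "George W. Bush", "Harvard Law School", "Illinois"]
statements = [to_statement(question, "Who", c) for c in candidates]

for s in statements:
    print(s)
```

Each resulting statement can then be checked against retrieved passages, which is what the evidence scoring slides go on to show.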
  33. Who is the 44th President of the United States? Evidence passages: "Barack Hussein Obama II (/bəˈrɑːk huːˈseɪn oʊˈbɑːmə/; born August 4, 1961) is the 44th and current President of the United States." "George Walker Bush (born July 6, 1946) is an American politician who served as the 43rd President of the United States from 2001 to 2009 and the 46th Governor of Texas from 1995 to 2000." Candidate statements: Barack Obama is the 44th President of the United States; George W. Bush is the 44th President of the United States; Harvard Law School is the 44th President of the United States; Illinois is the 44th President of the United States. Scores: Barack Obama .95; George W. Bush .80; Harvard Law School .05; Illinois .10. (Watson by R. Yates)
  34. Who is the 44th President of the United States? Stage: Evidence Retrieval. (Watson by R. Yates)
      Candidate Answer      Answer Scoring   Evidence retrieval & scoring   Confidence
      Barack Obama          0.90             0.90                           .95
      George W. Bush        0.90             0.80                           .65
      Harvard Law School    0.10             0.05                           .05
      Illinois              0.15             0.10                           .10
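Slide 34 shows two scores per candidate being merged into one confidence value. A minimal sketch of that ranking step, using the answer and evidence scores from the slide and an equal-weight average; the weights and the function `rank` are assumptions for illustration. Watson's own confidence values come from learned models, so this sketch does not reproduce the slide's confidence column, only its ordering:

```python
from typing import Dict, List, Tuple

# Per-candidate scores copied from slide 34: (answer scoring, evidence scoring).
SCORES: Dict[str, Tuple[float, float]] = {
    "Barack Obama": (0.90, 0.90),
    "George W. Bush": (0.90, 0.80),
    "Harvard Law School": (0.10, 0.05),
    "Illinois": (0.15, 0.10),
}


def rank(
    scores: Dict[str, Tuple[float, float]],
    w_answer: float = 0.5,    # hypothetical weight, not a learned value
    w_evidence: float = 0.5,  # hypothetical weight, not a learned value
) -> List[Tuple[str, float]]:
    """Combine the two scores with a weighted average and sort best-first."""
    combined = {
        name: w_answer * answer + w_evidence * evidence
        for name, (answer, evidence) in scores.items()
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


for name, score in rank(SCORES):
    print(f"{name}: {score:.3f}")
```

In a learned setup the fixed weights would be replaced by a model trained on questions with known answers, which is the role the DeepQA slide assigns to its learned models.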
  35. DeepQA: Massively Parallel Probabilistic Evidence-Based Architecture. Learned models help combine and weigh the evidence. [Diagram: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search and Candidate Answer Generation over Answer Sources) → Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval, and Evidence Scoring over Evidence Sources) → Synthesis → Final Confidence Merging & Ranking (Models) → Answer & Confidence] (ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker)
  36. OSS in Watson 1.0
  37. OSS in Watson ✓ UIMA, UIMA-AS ✓ Hadoop, MapReduce ✓ Lucene, Indri (ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker)
  38. UIMA http://uima.apache.org/
  39. UIMA-Asynchronous Scaleout. UIMA AS provides more flexible and powerful scale-out capability. (Innovate 2011, How Does It Work? The Architecture of Watson. Grady Booch)
  40. Think Hadoop. A framework for storing & processing big data: ✓ 4,000 machines ✓ 20 PB. High reliability done in software: ✓ Automated failover for data & computation ✓ Implemented in Java. http://hadoop.apache.org/mapreduce/
  41. MapReduce http://hadoop.apache.org/mapreduce/
  42. UIMA pipelines in Hadoop. Multiple threads, each running a UIMA pipeline. [Diagram: Input from the Hadoop Distributed File System (~5000 "splits") → ~50-100 "mappers" (~400-800 threads) → Shuffle/Sort → Reducers → Output to the Hadoop Distributed File System] http://blogs.apache.org/foundation/entry/apache_innovation_bolsters_ibm_s
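The layout on slide 42 (input splits handed to mappers, each mapper running several pipeline threads, then shuffle/sort and reduce) can be sketched in plain Python. This is an illustration of the pattern only: it uses neither the Hadoop nor the UIMA API, and `pipeline`, `mapper`, `shuffle`, and `reducer` are hypothetical stand-ins, with token counting in place of real text analysis:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def pipeline(document: str) -> List[Tuple[str, int]]:
    """Stand-in for an analysis pipeline: emit (token, 1) pairs."""
    return [(token.lower(), 1) for token in document.split()]


def mapper(split: List[str], threads: int = 4) -> List[Tuple[str, int]]:
    """Process one input split with several threads, each running the pipeline."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(pipeline, split)
        return [pair for pairs in results for pair in pairs]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key and sort the keys, as the shuffle/sort phase does."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reducer(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Two hypothetical input splits, each a list of documents.
splits = [["Watson uses UIMA", "Watson uses Hadoop"], ["Hadoop runs UIMA pipelines"]]
mapped = [pair for split in splits for pair in mapper(split)]
counts = reducer(shuffle(mapped))
print(counts)
```

Running threads inside each mapper, as the slide describes, lets one mapper process share loaded resources across many documents instead of paying the startup cost per thread.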
  43. Architecture http://lucene.apache.org
  44. Indri & Lemur. Indri is a text search engine developed at UMass & CMU. Indri is part of the Lemur project. [Diagram components: runquery, QueryEnvironment, LocalServer, NetworkServerProxy, NetworkServerStub, indrid. Example query: #combine(#2(george bush).title)] http://lemurproject.org/indri/
  45. Watson answers in 2-6 seconds. [Diagram labels: Question; Multiple Interpretations; 100s sources; 100s Possible Answers; 1000’s of Pieces of Evidence; 100,000’s scores from many simultaneous Text Analysis Algorithms. Stages: Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence] (ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker) © 2011 IBM Corporation
  46. Other OSS in Watson https://www.ibm.com/developerworks/mydeveloperworks/blogs/InsideSystemStorage/entry/ibm_watson_how_to_build_your_own_watson_jr_in_your_basement7?lang=en
  47. J-Archive data http://www.j-archive.com/showgame.php?game_id=3577
  48. Development Process ✓ War room setting with continuous collaboration. ✓ Weekly integration. ✓ Results driven with E2E regression testing. ✓ About 8,000 experiments. ✓ 10 GB of test data per week. ✓ Agile development. (Innovate 2011, How Does It Work? The Architecture of Watson. Grady Booch)
  49. Related Materials http://www.apache.org http://manning.com/ www.caltech.edu http://oreilly.com/
  50. Take Away. OSS is powerful and scalable enough for the Watson team. What about your project?
  51. Resources. IBM Journal of Research and Development: http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717 ; IBM Watson: http://www.ibm.com/innovation/us/watson/ and http://www.research.ibm.com/deepqa/index.shtml ; Nova: http://www.pbs.org/wgbh/nova/tech/smartest-machine-on-earth.html
  52. Review of Objectives. Now that you have completed this session, you are able to: ✓ Describe the main characteristics of the Watson QA system. ✓ Identify the key open source tools used in Watson. ✓ Recognize examples of Agile development best practices.
  53. …any final questions?
