Watson & Open Source Software




Ivan Portilla
IT Architect
8/29/12
portilla@gmail.com
If I have seen further it is by standing on the shoulders of
giants.




  Isaac Newton, Letter to Robert Hooke, February 5, 1675
Objectives
    By the end of this session, you should be able to:
    ✓ Describe the main characteristics of the Watson QA system.
    ✓ Identify the key open source software used in Watson.
    ✓ Recognize examples of Agile development best practices.



Disclaimers
Disclaimer 1
ü  This presentation represents the view of the author and
    does not represent the view of IBM.
ü  All opinions expressed in this presentation are strictly of
    the speaker, and do NOT represent those of IBM, IBM
    management, or anyone else.
ü  IBM and IBM (logo) are trademarks or registered
    trademarks of International Business Machines
    Corporation in the United States and/or other countries.
Disclaimer 2

I (We) do not work for the Watson team.
Jeopardy! The game
Game View
Let’s Play Jeopardy
BEFORE & AFTER: The Jerry Maguire star who
automatically maintains your vehicle’s speed.

COMMON BONDS: trout, loose change in your pocket, and
compliments.

DIPLOMATIC RELATIONS: Of the four countries in the world
that the United States does not have diplomatic relations
with, the one that’s farthest north.


GEOGRAPHY: Chile shares its longest land border with this
country
Natural Language Processing

Understanding natural language is hard!

Watson education
A Brief History of Watson
§  Deep Blue ended in 1997
§  Looking for a new research challenge
§  2004: IBM Research manager Charles Lickel
      •  Inspired by Ken Jennings' Jeopardy! winning streak
§  Started in 2005
      •  David Ferrucci, principal investigator
§  DeepQA in 2007
§  Won the Jeopardy! match, Feb 2011
[Figure: DeepQA progress over time, Precision (y-axis, 0-100%) vs. % Answered (x-axis, 0-100%); curves for Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010 and 11/2010]
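The chart plots precision against the fraction of questions answered: the system answers only the questions it is most confident about, so answering fewer questions should raise precision. A minimal sketch of how such a curve can be computed, assuming each question yields one (confidence, is_correct) pair; the sample data is hypothetical and is not taken from the chart.

```python
from typing import List, Tuple

# Each result is (confidence, is_correct) for one question.
Result = Tuple[float, bool]

def precision_at_coverage(results: List[Result], coverage: float) -> float:
    """Precision over the `coverage` fraction of questions answered,
    taking the questions with the highest confidence first."""
    if not results or not 0.0 < coverage <= 1.0:
        raise ValueError("need at least one result and 0 < coverage <= 1")
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    answered = ranked[:max(1, round(coverage * len(ranked)))]
    return sum(1 for _, correct in answered if correct) / len(answered)

def precision_curve(results: List[Result], steps: int = 10) -> List[Tuple[float, float]]:
    """(coverage, precision) points, one per step along the % Answered axis."""
    return [(k / steps, precision_at_coverage(results, k / steps))
            for k in range(1, steps + 1)]

# Hypothetical results for ten questions (illustrative only).
sample = [(0.95, True), (0.90, True), (0.85, True), (0.70, False), (0.65, True),
          (0.50, True), (0.40, False), (0.30, False), (0.20, True), (0.10, False)]

for coverage, precision in precision_curve(sample, steps=5):
    print(f"{coverage:.0%} answered -> {precision:.0%} precision")
```

A curve that stays high further to the right means the confidence estimates separate right from wrong answers well, which is what the later dates on the chart show.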
What is Watson?

   ✓ Understands natural language.
   ✓ Generates & evaluates hypotheses for better outcomes.
   ✓ Adapts & learns from user selections and responses.

   http://www.ibm.com/innovation/us/watson/
Watson metrics

   Development Team: 25 people
   Project Duration: 4 years

   Software: 1,000,000 SLOC
      700K Java, 300K C++
      ~130 components

   Hardware: 90 IBM Power 750 servers
      2,880 POWER7 cores, 80+ TFLOPS
      20 TB disk, 16 TB RAM
      10 Gbps network

   http://na11.apachecon.com/talks/19932
Open Source Software
OSS - Linux

   http://video.linux.com/videos/linuxcon-vancouver-day-2-1/
Open Source too

   http://ocw.mit.edu/index.htm
   http://ti.arc.nasa.gov/opensource/
How does it work?
Learning To Rank (Basic Architecture)
Watson Architecture

   http://en.wikipedia.org/wiki/Watson_(computer)
Who is the 44th President of the United States?

Question & Topic Analysis

Watson by R. Yates
Who is the 44th President of the United States?

'Who' is the '44th' 'President' of the 'United States'?

Question & Topic Analysis
   Lexical Answer Type: → Person
   Focus*
   Keywords

   * The focus can be replaced by the correct answer to make a true statement.
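The analysis step can be sketched with a few crude rules. This is a minimal illustration, assuming a hypothetical table from question word to answer type and a small stopword list; it only handles wh-questions like this one, whereas Jeopardy! clues usually put the focus elsewhere ("this country"), and DeepQA's real question analysis is far more sophisticated.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical lookup from question word to lexical answer type (LAT).
WH_TYPES = {"who": "Person", "where": "Place", "when": "Date"}
STOPWORDS = {"is", "was", "are", "the", "of", "a", "an", "in"}

@dataclass
class QuestionAnalysis:
    focus: str
    lexical_answer_type: str
    keywords: List[str] = field(default_factory=list)

def analyze(question: str) -> QuestionAnalysis:
    # Assume the focus is the leading question word, as in the slide's example.
    focus, _, rest = question.strip().partition(" ")
    lat = WH_TYPES.get(focus.lower(), "Unknown")
    # Keep multi-word capitalized names such as "United States" together.
    phrases = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+|[A-Za-z0-9]+", rest)
    keywords = [p for p in phrases if p.lower() not in STOPWORDS]
    return QuestionAnalysis(focus, lat, keywords)

print(analyze("Who is the 44th President of the United States?"))
# QuestionAnalysis(focus='Who', lexical_answer_type='Person', keywords=['44th', 'President', 'United States'])
```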
  
Who is the 44th President of the United States?

Primary Search
Who is the 44th President of the United States?

Primary Search
   Barack Obama
   George W. Bush
   Harvard Law School
   Illinois
Who is the 44th President of the United States?

Each candidate answer is paired with the question:
   Barack Obama
   George W. Bush
   Harvard Law School
   Illinois
Who is the 44th President of the United States?

Candidate answers: Barack Obama, George W. Bush, Harvard Law School, Illinois

Lexical Answer Type → Person

   Is Barack Obama a Person? .90
   Is George W. Bush a Person? .90
   Is Harvard Law School a Person? .10
   Is Illinois a Person? .15
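The "Is X a Person?" step scores each candidate against the lexical answer type. A minimal sketch, assuming a hypothetical in-memory table of type evidence; the Person probabilities are copied from the slide, while a real system would draw this evidence from many sources. Candidates are scored, not discarded, so a low type score can still be outweighed by other evidence later.

```python
from typing import Dict, List

# Toy type evidence (hypothetical structure): entity -> {answer type: probability}.
# The Person values are the ones shown on the slide.
TYPE_EVIDENCE: Dict[str, Dict[str, float]] = {
    "Barack Obama": {"Person": 0.90},
    "George W. Bush": {"Person": 0.90},
    "Harvard Law School": {"Person": 0.10},
    "Illinois": {"Person": 0.15},
}

def type_score(candidate: str, answer_type: str, default: float = 0.0) -> float:
    """Probability that `candidate` is an instance of `answer_type`."""
    return TYPE_EVIDENCE.get(candidate, {}).get(answer_type, default)

def score_candidates(candidates: List[str], answer_type: str) -> Dict[str, float]:
    # Every candidate is kept and scored; nothing is filtered out here.
    return {c: type_score(c, answer_type) for c in candidates}

print(score_candidates(["Barack Obama", "Illinois", "Springfield"], "Person"))
# {'Barack Obama': 0.9, 'Illinois': 0.15, 'Springfield': 0.0}
```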
  
Who is the 44th President of the United States?

   Barack Obama is the 44th President of the United States
   George W. Bush is the 44th President of the United States
   Harvard Law School is the 44th President of the United States
   Illinois is the 44th President of the United States
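These statements follow the rule from the question-analysis slide: replacing the focus with a candidate answer should yield a true statement for the correct answer. A minimal sketch of that substitution, assuming the focus is the first word of the question (the helper name `to_hypothesis` is hypothetical):

```python
def to_hypothesis(question: str, focus: str, candidate: str) -> str:
    """Replace the question's focus with a candidate answer to form a statement."""
    statement = question.strip().rstrip("?")
    if not statement.startswith(focus):
        raise ValueError("this sketch only handles a focus at the start of the question")
    return candidate + statement[len(focus):]

question = "Who is the 44th President of the United States?"
for candidate in ["Barack Obama", "George W. Bush", "Harvard Law School", "Illinois"]:
    print(to_hypothesis(question, "Who", candidate))
```

Each statement then becomes a query for supporting or refuting evidence.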
  
Who is the 44th President of the United States?

   Barack Hussein Obama II (/bəˈrɑːk huːˈseɪn oʊˈbɑːmə/; born August 4, 1961) is the
   44th and current President of the United States.

   George Walker Bush (born July 6, 1946) is an American politician who served as the
   43rd President of the United States from 2001 to 2009 and the 46th Governor of
   Texas from 1995 to 2000.

   Barack Obama is the 44th President of the United States
   George W. Bush is the 44th President of the United States
   Harvard Law School is the 44th President of the United States
   Illinois is the 44th President of the United States

   Barack Obama .95
   George W. Bush .80
   Harvard Law School .05
   Illinois .10
Who is the 44th President of the United States?

   Candidate Answer     Answer Scoring   Evidence Retrieval & Scoring   Confidence
   Barack Obama         0.90             0.90                           .95
   George W. Bush       0.90             0.80                           .65
   Harvard Law School   0.10             0.05                           .05
   Illinois             0.15             0.10                           .10
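The Confidence column comes from a learned model that combines the per-candidate scores. One common choice for such a model is logistic regression; the sketch below assumes that choice with made-up weights, so it ranks the candidates in a plausible order but does not reproduce the slide's confidence values.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical learned parameters. A real system would fit these on questions
# with known answers; these values are invented for illustration.
WEIGHTS = {"answer_score": 3.0, "evidence_score": 5.0}
BIAS = -4.0

def confidence(features: Dict[str, float]) -> float:
    """Logistic regression: weighted sum of feature scores squashed into (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    scored = [(name, confidence(f)) for name, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Feature values copied from the slide's Answer Scoring and Evidence columns.
table = {
    "Barack Obama": {"answer_score": 0.90, "evidence_score": 0.90},
    "George W. Bush": {"answer_score": 0.90, "evidence_score": 0.80},
    "Harvard Law School": {"answer_score": 0.10, "evidence_score": 0.05},
    "Illinois": {"answer_score": 0.15, "evidence_score": 0.10},
}

for name, conf in rank(table):
    print(f"{name:20s} {conf:.2f}")
```

Learning the weights, rather than hand-tuning them, is what lets new scoring components be added and weighed automatically.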
  
DeepQA
Massively Parallel Probabilistic Evidence-Based Architecture

[Figure: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search and Candidate Answer Generation over Answer Sources) → Hypothesis and Evidence Scoring (Answer Scoring; Evidence Retrieval & Scoring over Evidence Sources) → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence. Hypothesis generation and scoring repeat across many parallel branches; learned models help combine and weigh the evidence.]

ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker
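Because many parallel branches generate candidates, the same answer can arrive under different surface forms, and the Final Confidence Merging & Ranking stage has to collapse them. A minimal sketch of the merging part, assuming a hypothetical alias table and a "keep the best score" rule; a real system would need a much richer source of name variants and might combine scores differently.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical alias table mapping normalized surface forms to one answer.
ALIASES = {"obama": "Barack Obama", "barack hussein obama ii": "Barack Obama"}

def canonical(answer: str) -> str:
    key = re.sub(r"[^a-z0-9 ]", "", answer.lower()).strip()
    return ALIASES.get(key, answer)

def merge(candidates: List[Tuple[str, float]]) -> Dict[str, float]:
    """Collapse equivalent answers, keeping each group's best score."""
    merged: Dict[str, float] = {}
    for answer, score in candidates:
        name = canonical(answer)
        merged[name] = max(score, merged.get(name, 0.0))
    return merged

print(merge([("Barack Obama", 0.7), ("Obama", 0.9),
             ("Barack Hussein Obama II", 0.6), ("Illinois", 0.1)]))
# {'Barack Obama': 0.9, 'Illinois': 0.1}
```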
  
OSS in Watson 1.0
OSS in Watson

   ✓ UIMA, UIMA-AS
   ✓ Hadoop, MapReduce
   ✓ Lucene, Indri

   ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker
UIMA

   http://uima.apache.org/
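In UIMA, independent analysis components (annotators) read and add annotations on a shared Common Analysis Structure (CAS), so later components can build on earlier ones. The sketch below mimics that idea in plain Python; it is not the UIMA API (in the Java framework, annotators typically extend classes such as `JCasAnnotator_ImplBase` and are assembled through XML descriptors), and the annotator names and types here are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Annotation:
    type: str
    begin: int
    end: int

@dataclass
class CAS:
    """Toy stand-in for UIMA's Common Analysis Structure: text plus annotations."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def select(self, type_name: str) -> List[Annotation]:
        return [a for a in self.annotations if a.type == type_name]

    def covered_text(self, a: Annotation) -> str:
        return self.text[a.begin:a.end]

Annotator = Callable[[CAS], None]

def token_annotator(cas: CAS) -> None:
    for m in re.finditer(r"\w+", cas.text):
        cas.annotations.append(Annotation("Token", m.start(), m.end()))

def number_annotator(cas: CAS) -> None:
    # Reads the Token annotations written by the previous annotator.
    for tok in cas.select("Token"):
        if cas.covered_text(tok)[0].isdigit():
            cas.annotations.append(Annotation("Number", tok.begin, tok.end))

def run_pipeline(text: str, annotators: List[Annotator]) -> CAS:
    cas = CAS(text)
    for annotate in annotators:
        annotate(cas)
    return cas

cas = run_pipeline("Who is the 44th President?", [token_annotator, number_annotator])
print([cas.covered_text(a) for a in cas.select("Number")])  # ['44th']
```

Keeping annotators independent and communicating only through the CAS is what lets a framework like UIMA distribute them across threads and machines.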
  
           	
  
UIMA Asynchronous Scaleout (UIMA AS)

   UIMA AS provides a more flexible and powerful scale-out capability.

   Innovate 2011, How Does It Work? The Architecture of Watson. Grady Booch
Think Hadoop

   A framework for storing & processing big data.
      ✓ 4,000 machines
      ✓ 20 PB

   High reliability done in software:
      ✓ Automated failover for data & computation
      ✓ Implemented in Java

   http://hadoop.apache.org/mapreduce/
MapReduce

   http://hadoop.apache.org/mapreduce/
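The MapReduce model splits work into a map function that emits key/value pairs, a shuffle that groups values by key, and a reduce function per key. A single-process sketch of the classic word count follows; in Hadoop itself the map and reduce steps would be Java `Mapper` and `Reducer` classes, and the framework would run the shuffle across machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

def map_phase(doc: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    return [(word, 1) for word in doc.lower().split()]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)

def map_reduce(docs: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = map_reduce(["Watson uses Hadoop", "Hadoop uses Java"])
print(counts)  # {'watson': 1, 'uses': 2, 'hadoop': 2, 'java': 1}
```

Because each map call and each reduce call is independent, the framework is free to run them on whichever machines hold the data.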
  
                    	
  
UIMA pipelines in Hadoop

[Figure: input in the Hadoop Distributed File System is divided into ~5000 "splits" and processed by ~50-100 "mappers"; each mapper runs multiple threads, each running a UIMA pipeline (~400-800 threads in total). Mapper output passes through Shuffle/Sort to reducers, which write the output back to the Hadoop Distributed File System.]

   http://blogs.apache.org/foundation/entry/apache_innovation_bolsters_ibm_s
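Inside each mapper, the figure shows several threads, each with its own UIMA pipeline instance. A minimal sketch of that pattern, assuming a hypothetical `make_pipeline` factory in place of a real UIMA pipeline: a thread-pool initializer builds one pipeline per worker thread, and the mapper's split is fanned out across the pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Pipeline = Callable[[str], Dict[str, object]]

def make_pipeline() -> Pipeline:
    """Hypothetical stand-in for building one (expensive) pipeline instance."""
    def pipeline(doc: str) -> Dict[str, object]:
        return {"doc": doc, "tokens": len(doc.split())}
    return pipeline

_local = threading.local()

def _init_thread() -> None:
    # Runs once per worker thread, so each thread owns its own pipeline.
    _local.pipeline = make_pipeline()

def _process(doc: str) -> Dict[str, object]:
    return _local.pipeline(doc)

def run_mapper(split: List[str], threads: int = 8) -> List[Dict[str, object]]:
    """Process one input split with a pool of threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=threads, initializer=_init_thread) as pool:
        return list(pool.map(_process, split))

results = run_mapper(["Who is the 44th President", "of the United States"], threads=2)
print([r["tokens"] for r in results])  # [5, 4]
```

Giving each thread a private pipeline avoids sharing non-thread-safe state. Note that this Python version only mirrors the structure: CPU-bound pure-Python work would need processes to run in parallel, whereas the Java threads in the original setup can use multiple cores directly.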
             	
  
Architecture

   http://lucene.apache.org
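Lucene is built around an inverted index: for every term, the list of documents containing it, which lets a query touch only the matching documents. A minimal sketch of that structure with a plain TF-IDF ranking; the scoring formula here is a simple assumption for illustration and is not Lucene's exact similarity function.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

class InvertedIndex:
    """term -> {doc_id: term frequency}."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[int, int]] = defaultdict(dict)
        self.docs: List[str] = []

    def add(self, text: str) -> int:
        doc_id = len(self.docs)
        self.docs.append(text)
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf
        return doc_id

    def search(self, query: str, k: int = 3) -> List[Tuple[int, float]]:
        """Rank documents by a plain TF-IDF sum over the query terms."""
        n = len(self.docs)
        scores: Dict[int, float] = defaultdict(float)
        for term in tokenize(query):
            matches = self.postings.get(term, {})
            if not matches:
                continue
            idf = math.log(1 + n / len(matches))  # rarer terms weigh more
            for doc_id, tf in matches.items():
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

index = InvertedIndex()
for text in ["Barack Obama is the 44th President of the United States",
             "George W. Bush was the 43rd President of the United States",
             "Harvard Law School is in Cambridge"]:
    index.add(text)
print(index.search("44th President"))
```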
  
    	
  
Indri & Lemur

Indri is a text search engine developed at UMass & CMU. Indri is part of the Lemur project.

[Figure: Indri query architecture. A QueryEnvironment issues the query #combine(#2(george bush).title) to a LocalServer and, through NetworkServerProxy → NetworkServerStub connections, to remote indrid servers, each backed by a LocalServer.]

   http://lemurproject.org/indri/
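In the example query, `#2(george bush)` is, as we read the Indri query language documentation, an ordered-window operator: the terms must appear in order with at most one other term between them, and `.title` restricts the match to the title field; `#combine` with a single argument simply passes that belief through. A simplified sketch of the matching rule only (no scoring), with hypothetical helper names:

```python
from typing import List

def _match_from(tokens: List[str], terms: List[str], pos: int, n: int) -> bool:
    """terms[0] is already matched at tokens[pos]; try to place the rest."""
    if len(terms) == 1:
        return True
    # The next term may skip at most n - 1 tokens: positions pos+1 .. pos+n.
    for nxt in range(pos + 1, min(len(tokens), pos + n + 1)):
        if tokens[nxt] == terms[1] and _match_from(tokens, terms[1:], nxt, n):
            return True
    return False

def ordered_window(tokens: List[str], terms: List[str], n: int) -> bool:
    """Approximation of an Indri #N ordered-window match over one field."""
    return any(tok == terms[0] and _match_from(tokens, terms, i, n)
               for i, tok in enumerate(tokens))

def title_matches(doc: dict, terms: List[str], n: int) -> bool:
    # Mimics the ".title" field restriction: only the title is searched.
    return ordered_window(doc.get("title", "").lower().split(), terms, n)

doc = {"title": "George W. Bush", "body": "43rd President of the United States"}
print(title_matches(doc, ["george", "bush"], 2))  # True
```

With a window of 2, "George W. Bush" matches while "George Herbert Walker Bush" and "Bush George" do not; `#1` would require the exact phrase.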
  
                                             	
  
Watson answers in 2-6 seconds

[Figure: the DeepQA pipeline (Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence), annotated with its scale: multiple interpretations, 100s of sources, 100s of possible answers, 1000's of pieces of evidence, and 100,000's of scores from many simultaneous text analysis algorithms.]

   ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker

   © 2011 IBM Corporation
Other OSS in Watson

   https://www.ibm.com/developerworks/mydeveloperworks/blogs/InsideSystemStorage/entry/ibm_watson_how_to_build_your_own_watson_jr_in_your_basement7?lang=en
J-Archive data

   http://www.j-archive.com/showgame.php?game_id=3577
Development Process

   ✓ War room setting with continuous collaboration.
   ✓ Weekly integration.
   ✓ Results-driven, with end-to-end (E2E) regression testing.

   ✓ About 8,000 experiments
   ✓ 10 GB of test data per week
   ✓ Agile development

   Innovate 2011, How Does It Work? The Architecture of Watson. Grady Booch
Related Materials

   http://www.apache.org
   http://manning.com/
   www.caltech.edu
   http://oreilly.com/
Take Away

OSS is powerful and scalable enough for the Watson team. What about your project?
Resources

IBM Journal of Research and Development
   http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717

IBM Watson
   http://www.ibm.com/innovation/us/watson/
   http://www.research.ibm.com/deepqa/index.shtml

Nova
   http://www.pbs.org/wgbh/nova/tech/smartest-machine-on-earth.html
Review of Objectives
     Now that you have completed this session, you are able to:

     ✓  Describe the main characteristics of the Watson QA system.
     ✓  Identify the key open source tools used in Watson.
     ✓  Recognize examples of Agile development best practices.
…any final questions?

 

Recently uploaded (20)

OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 

IBM Watson & Open Source Software - LinuxCon 2012

  • 1. Watson & Open Source Software Ivan Portilla IT Architect 8/29/12 portilla@gmail.com
  • 2. If I have seen further it is by standing on the shoulders of giants. Isaac Newton, Letter to Robert Hooke, February 5, 1675
  • 3. Objectives. By the end of this session, you should be able to: ✓ Describe the main characteristics of the Watson QA system. ✓ Identify the key open source software used in Watson. ✓ Recognize examples of Agile development best practices.
  • 5. Disclaimer 1: ✓ This presentation represents the view of the author and does not represent the view of IBM. ✓ All opinions expressed in this presentation are strictly those of the speaker and do NOT represent those of IBM, IBM management, or anyone else. ✓ IBM and the IBM logo are trademarks or registered trademarks of International Business Machines Corporation in the United States and/or other countries.
  • 6. Disclaimer 2: I (We) do not work for the Watson team.
  • 9. Let’s Play Jeopardy! BEFORE & AFTER: The Jerry Maguire star who automatically maintains your vehicle’s speed. COMMON BONDS: Trout, loose change in your pocket, and compliments. DIPLOMATIC RELATIONS: Of the four countries in the world that the United States does not have diplomatic relations with, the one that’s farthest north. GEOGRAPHY: Chile shares its longest land border with this country.
  • 10. Natural Language Processing: Understanding natural language is hard!
  • 12. A Brief History of Watson
  • 13. A Brief History of Watson: Deep Blue ended in 1997; looking for a new research challenge; 2004, IBM Research manager Charles Lickel (Ken Jennings); started in 2005 (David Ferrucci); DeepQA in 2007; won the Jeopardy! match, Feb 2011.
  • 14. [Chart: Precision vs. % Answered (both axes 0% to 100%), tracking DeepQA's progress from a Baseline through 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010, and 11/2010.]
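The chart plots precision against the fraction of questions answered. One plausible way to compute such a curve is sketched below in Python, assuming each question yields a single (confidence, was_correct) pair and the system answers only its most confident questions first. The function name and demo data are hypothetical and are not taken from Watson.

```python
from typing import List, Tuple


def precision_at_coverage(results: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (fraction answered, precision) points for a confidence sweep.

    results holds one (confidence, was_correct) pair per question. A system
    that answers only when confident answers its highest-confidence questions
    first, so we sort by confidence and lower the cut-off one question at a time.
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    curve: List[Tuple[float, float]] = []
    correct = 0
    for k, (_, was_correct) in enumerate(ranked, start=1):
        correct += int(was_correct)
        curve.append((k / len(ranked), correct / k))
    return curve


if __name__ == "__main__":
    # Hypothetical results for four questions; not Watson data.
    demo = [(0.30, True), (0.95, True), (0.60, False), (0.80, True)]
    for answered, precision in precision_at_coverage(demo):
        print(f"{answered:.0%} answered -> {precision:.0%} precision")
```

With the demo data, precision stays at 100% for the two most confident answers and then drops once the wrong 0.60-confidence answer is included, which is the same trade-off the chart's curves show.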
  • 15. What is Watson? ✓ Understands natural language. ✓ Generates and evaluates hypotheses for better outcomes. ✓ Adapts and learns from user selections and responses. http://www.ibm.com/innovation/us/watson/
  • 16. Watson metrics. Development team: 25 people. Project duration: 4 years. Software: 1,000,000 SLOC (700K Java, 300K C++), ~130 components. Hardware: 90 IBM Power 750 servers; 2,880 Power7 cores @ 80+ TFLOPS; 20 TB disk, 16 TB RAM (memory); 10 Gbps network. http://na11.apachecon.com/talks/19932
  • 18. OSS: Linux http://video.linux.com/videos/linuxcon-vancouver-day-2-1/
  • 19. Open Source too: http://ocw.mit.edu/index.htm http://ti.arc.nasa.gov/opensource/
  • 20. How does it work?
  • 21. Learning To Rank (Basic Architecture)
  • 22. Watson Architecture http://en.wikipedia.org/wiki/Watson_(computer)
  • 23. Who is the 44th President of the United States?
  • 24. Who is the 44th President of the United States? Question & Topic Analysis (Watson by R. Yates)
  • 25. Who is the 44th President of the United States? Question & Topic Analysis: 'Who' is the '44th' 'President' of the 'United States'? Focus*: Who. Keywords: 44th, President, United States. Lexical Answer Type: Person. *The focus can be replaced by the correct answer to make a true statement. (Watson by R. Yates)
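The question-analysis slide splits the question into a focus, keywords, and a lexical answer type (LAT). A toy, rule-based Python sketch of that split follows. The wh-word table, stopword list, and function names are hypothetical; a real analyzer would rely on a parser and would keep "United States" together as one entity instead of two tokens.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical wh-word -> lexical answer type (LAT) table, for illustration only.
WH_TO_LAT = {"who": "Person", "where": "Location", "when": "Date"}
STOPWORDS = {"who", "what", "where", "when", "is", "was", "the", "of", "a", "an"}


@dataclass
class QuestionAnalysis:
    focus: str
    lat: str
    keywords: List[str] = field(default_factory=list)


def analyze(question: str) -> QuestionAnalysis:
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    # Toy assumption: the leading wh-word is the focus, i.e. the part a correct
    # answer could replace to turn the question into a true statement.
    focus = tokens[0]
    lat = WH_TO_LAT.get(focus.lower(), "Thing")
    # Everything that is not a stopword is kept as a search keyword.
    keywords = [t for t in tokens if t.lower() not in STOPWORDS]
    return QuestionAnalysis(focus, lat, keywords)
```

For the slide's question, `analyze` returns focus `"Who"`, LAT `"Person"`, and keywords `["44th", "President", "United", "States"]`.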
  • 26. Who is the 44th President of the United States?
  • 27. Who is the 44th President of the United States?
  • 28. Who is the 44th President of the United States? Primary Search (Watson by R. Yates)
  • 29. Who is the 44th President of the United States? Primary Search candidate answers: Barack Obama, George W. Bush, Harvard Law School, Illinois (Watson by R. Yates)
  • 30. Who is the 44th President of the United States? Each candidate answer is paired with the question: Barack Obama; George W. Bush; Harvard Law School; Illinois (Watson by R. Yates)
  • 31. Who is the 44th President of the United States? → Person. Candidates: Barack Obama, George W. Bush, Harvard Law School, Illinois. Is Barack Obama a Person? .90; Is George W. Bush a Person? .90; Is Harvard Law School a Person? .10; Is Illinois a Person? .15 (Watson by R. Yates)
  • 32. Who is the 44th President of the United States? Hypotheses: Barack Obama is the 44th President of the United States; George W. Bush is the 44th President of the United States; Harvard Law School is the 44th President of the United States; Illinois is the 44th President of the United States. (Watson by R. Yates)
  • 33. Who is the 44th President of the United States? Evidence passages: "Barack Hussein Obama II (/bəˈrɑːk huːˈseɪn oʊˈbɑːmə/; born August 4, 1961) is the 44th and current President of the United States." "George Walker Bush (born July 6, 1946) is an American politician who served as the 43rd President of the United States from 2001 to 2009 and the 46th Governor of Texas from 1995 to 2000." Hypotheses: Barack Obama is the 44th President of the United States; George W. Bush is the 44th President of the United States; Harvard Law School is the 44th President of the United States; Illinois is the 44th President of the United States. Evidence scores: Barack Obama .95, George W. Bush .80, Harvard Law School .05, Illinois .10. (Watson by R. Yates)
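One of the simplest evidence scorers checks how many question keywords a candidate's supporting passage contains. This Python sketch uses a hypothetical function name, and Watson combined many far richer scorers. It shows why the George W. Bush passage earns less support than the Barack Obama one: it mentions the 43rd President, not the 44th.

```python
import re
from typing import Sequence


def passage_support(keywords: Sequence[str], passage: str) -> float:
    """Fraction of question keywords that occur in a candidate's supporting passage."""
    words = {w.lower() for w in re.findall(r"[A-Za-z0-9]+", passage)}
    wanted = [k.lower() for k in keywords]
    if not wanted:
        return 0.0
    # Each keyword found in the passage counts once; the result lies in [0, 1].
    return sum(k in words for k in wanted) / len(wanted)
```

With keywords `["44th", "President", "United", "States"]`, the Obama passage from the slide scores 1.0 and the Bush passage scores 0.75.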
  • 34. Who is the 44th President of the United States? Candidate answer: answer scoring / evidence retrieval & scoring / confidence. Barack Obama: 0.90 / 0.90 / .95; George W. Bush: 0.90 / 0.80 / .65; Harvard Law School: 0.10 / 0.05 / .05; Illinois: 0.15 / 0.10 / .10. (Watson by R. Yates)
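The final-confidence step can be illustrated with a logistic-regression style combiner, a common way to merge feature scores into a probability. This Python sketch uses the answer-scoring and evidence-scoring values from the table slide as features. The weights and bias are invented for illustration and are not Watson's learned model, so the confidences differ from the slide's even though the ranking matches.

```python
import math
from typing import Dict, List, Sequence, Tuple


def confidence(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic merge of per-answer feature scores into a value in (0, 1)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates: Dict[str, List[float]], weights: Sequence[float],
         bias: float) -> List[Tuple[str, float]]:
    scored = [(name, confidence(f, weights, bias)) for name, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Features copied from the slide: [answer scoring, evidence scoring].
CANDIDATES = {
    "Barack Obama": [0.90, 0.90],
    "George W. Bush": [0.90, 0.80],
    "Harvard Law School": [0.10, 0.05],
    "Illinois": [0.15, 0.10],
}
# Hypothetical parameters; a real system would learn these from training questions.
WEIGHTS = [4.0, 4.0]
BIAS = -4.0
```

`rank(CANDIDATES, WEIGHTS, BIAS)` orders the candidates Barack Obama, George W. Bush, Illinois, Harvard Law School.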
  • 35. DeepQA: a massively parallel, probabilistic, evidence-based architecture. Pipeline: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search, Candidate Answer Generation) → Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval & Scoring) → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence. Answer sources feed primary search, evidence sources feed evidence retrieval, and learned models help combine and weigh the evidence. (ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker)
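The DeepQA slide stresses that many hypotheses are scored independently and in parallel before learned models merge the scores. A minimal Python sketch of that fan-out and merge step follows. It assumes each scorer is a plain function of (question, candidate), and `combine` stands in for the learned model. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def score_hypotheses(
    question: str,
    candidates: Sequence[str],
    scorers: Sequence[Scorer],
    combine: Callable[[List[float]], float],
    workers: int = 8,
) -> List[Tuple[str, float]]:
    """Score every (question, candidate) hypothesis in parallel, then rank."""

    def score_one(candidate: str) -> Tuple[str, float]:
        # Each hypothesis gets a feature vector: one value per independent scorer.
        features = [scorer(question, candidate) for scorer in scorers]
        return candidate, combine(features)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score_one, candidates))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Python threads only illustrate the structure here. Watson spread this work across thousands of cores on many servers, which in Python would call for processes or a cluster framework.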
  • 37. OSS in Watson: ✓ UIMA, UIMA-AS ✓ Hadoop, MapReduce ✓ Lucene, Indri (ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker)
  • 38. UIMA http://uima.apache.org/
  • 39. UIMA Asynchronous Scaleout (UIMA-AS): UIMA AS provides more flexible and powerful scale-out capability than UIMA's earlier Collection Processing Manager (CPM). (Innovate 2011, How Does It Work? The Architecture of Watson. Grady Booch)
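In UIMA, a chain of annotators each reads a shared Common Analysis Structure (CAS) and adds typed, offset-based annotations to it. Real UIMA annotators are Java or C++ classes (for example, subclasses of `JCasAnnotator_ImplBase`). The Python below is only a hypothetical analogy of that data flow; none of its names come from the UIMA API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    begin: int
    end: int
    kind: str


@dataclass
class Cas:
    """Toy stand-in for UIMA's CAS: the document text plus stand-off annotations."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, a: Annotation) -> str:
        return self.text[a.begin:a.end]


def token_annotator(cas: Cas) -> None:
    for m in re.finditer(r"\w+", cas.text):
        cas.annotations.append(Annotation(m.start(), m.end(), "Token"))


def ordinal_annotator(cas: Cas) -> None:
    for m in re.finditer(r"\b\d+(?:st|nd|rd|th)\b", cas.text):
        cas.annotations.append(Annotation(m.start(), m.end(), "Ordinal"))


def run_pipeline(text: str, annotators: List[Callable[[Cas], None]]) -> Cas:
    # Annotators run in order; later ones could read what earlier ones added.
    cas = Cas(text)
    for annotate in annotators:
        annotate(cas)
    return cas
```

UIMA-AS then scales such pipelines out by deploying annotators as services that exchange CASes asynchronously, which this single-process sketch does not attempt.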
  • 40. Think Hadoop: A framework for storing and processing big data. ✓ 4,000 machines ✓ 20 PB. High reliability done in software: ✓ Automated failover for data and computation ✓ Implemented in Java http://hadoop.apache.org/mapreduce/
  • 41. MapReduce http://hadoop.apache.org/mapreduce/
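The MapReduce model has three phases: map emits key/value pairs, shuffle/sort groups the pairs by key, and reduce folds each group into a result. This single-process Python simulation of the classic word count shows the data flow only. It is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(doc: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group values by key, sorted by key as Hadoop's shuffle/sort would deliver them.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)


def map_reduce(docs: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

In Hadoop, the map and reduce calls run on many machines over data stored in HDFS, and the framework handles the shuffle and failover.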
  • 42. UIMA pipelines in Hadoop: Input from the Hadoop Distributed File System (~5,000 "splits") → ~50 to 100 mappers, each running multiple threads, with each thread running a UIMA pipeline (~400 to 800 threads in total) → Shuffle/Sort → Reducers → Output to the Hadoop Distributed File System. http://blogs.apache.org/foundation/entry/apache_innovation_bolsters_ibm_s
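The slide describes each mapper running several threads, each with its own UIMA pipeline. A hypothetical Python sketch of one such mapper follows. It gives every worker thread a lazily built, thread-local pipeline, assuming pipelines are costly to create and not safe to share. `make_pipeline` is a placeholder that just counts tokens.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

_local = threading.local()


def make_pipeline() -> Callable[[str], int]:
    """Hypothetical stand-in for building a UIMA pipeline; here it just counts tokens."""
    return lambda doc: len(doc.split())


def analyze_doc(doc: str) -> int:
    # Build one pipeline per thread on first use, then reuse it for later documents.
    if not hasattr(_local, "pipeline"):
        _local.pipeline = make_pipeline()
    return _local.pipeline(doc)


def mapper(split: List[str], threads: int = 8) -> List[int]:
    """Process one input split with a pool of threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(analyze_doc, split))
```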
  • 43. Architecture http://lucene.apache.org
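Lucene is built around an inverted index that maps each term to the documents containing it. This toy Python version illustrates the idea with simple TF-IDF scoring. Lucene's own scoring is more elaborate (classic TF-IDF in older releases, BM25 by default since Lucene 6), and the class and method names here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Dict[str, int]] = defaultdict(dict)  # term -> {doc: tf}
        self.n_docs = 0

    def add(self, doc_id: str, text: str) -> None:
        self.n_docs += 1
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str) -> List[Tuple[str, float]]:
        scores: Dict[str, float] = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term)
            if not docs:
                continue
            # Rarer terms get a larger inverse document frequency weight.
            idf = math.log(1 + self.n_docs / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

Indexing three short documents and searching for "44th President" ranks the document mentioning both terms above the one that only mentions "President".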
  • 44. Indri & Lemur: Indri is a text search engine developed at UMass and CMU, and is part of the Lemur project. Diagram: the runquery client uses a QueryEnvironment, which queries a LocalServer directly and remote indrid servers through NetworkServerProxy objects; each indrid answers through a NetworkServerStub backed by its own LocalServer. Example query: #combine(#2(george bush).title) http://lemurproject.org/indri/
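In the slide's example query, `#2(george bush)` is an Indri ordered-window operator: the terms must appear in order with at most N-1 tokens between consecutive terms (so `#1` is an exact phrase). `.title` restricts the match to the title field, and `#combine` merges the resulting beliefs. The Python sketch below counts ordered-window matches under that reading. It counts one match per starting position of the first term, which may differ from how Indri itself counts overlapping matches, and the function names are hypothetical.

```python
from typing import Sequence


def _extends(tokens: Sequence[str], terms: Sequence[str], j: int, pos: int, n: int) -> bool:
    """True if terms[j:] can be matched in order, each within n positions of the previous."""
    if j == len(terms):
        return True
    # The next term may sit at most n positions after pos (n - 1 tokens in between).
    for nxt in range(pos + 1, min(pos + 1 + n, len(tokens))):
        if tokens[nxt] == terms[j] and _extends(tokens, terms, j + 1, nxt, n):
            return True
    return False


def ordered_window_count(tokens: Sequence[str], terms: Sequence[str], n: int) -> int:
    """Count matches of an Indri-style ordered window #n(t1 t2 ...)."""
    if not terms:
        return 0
    return sum(
        1
        for start, token in enumerate(tokens)
        if token == terms[0] and _extends(tokens, terms, 1, start, n)
    )
```

The search backtracks rather than always taking the nearest next term, because the nearest occurrence can leave a later term out of reach. On the tokens `george w bush and george bush`, `#2(george bush)` matches twice and `#1(george bush)` matches once.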
  • 45. Watson answers in 2 to 6 seconds. Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence. Scale along the pipeline: multiple interpretations, 100s of sources, 100s of possible answers, 1000s of pieces of evidence, and 100,000s of scores from many simultaneous text analysis algorithms. (ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker) © 2011 IBM Corporation
  • 46. Other OSS in Watson https://www.ibm.com/developerworks/mydeveloperworks/blogs/InsideSystemStorage/entry/ibm_watson_how_to_build_your_own_watson_jr_in_your_basement7?lang=en
  • 47. J-Archive data http://www.j-archive.com/showgame.php?game_id=3577
  • 48. Development Process: ✓ War-room setting with continuous collaboration. ✓ Weekly integration. ✓ Results-driven, with end-to-end (E2E) regression testing. ✓ About 8,000 experiments. ✓ 10 GB of test data per week. ✓ Agile development. (Innovate 2011, How Does It Work? The Architecture of Watson. Grady Booch)
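Weekly integration backed by end-to-end regression testing implies a check that blocks changes which lower accuracy on a fixed question set. A minimal Python sketch of such a gate follows. It assumes a gold set of question/answer pairs and a stored baseline accuracy, both hypothetical, as are the function and exception names.

```python
from typing import Callable, Dict, Optional


class RegressionError(Exception):
    """Raised when end-to-end accuracy falls below the accepted baseline."""


def regression_gate(
    answer: Callable[[str], Optional[str]],
    gold: Dict[str, str],
    baseline_accuracy: float,
    tolerance: float = 0.01,
) -> float:
    """Run the whole question set end to end and fail if accuracy regresses."""
    correct = sum(answer(q) == expected for q, expected in gold.items())
    accuracy = correct / len(gold)
    # A small tolerance keeps run-to-run noise from blocking integration.
    if accuracy < baseline_accuracy - tolerance:
        raise RegressionError(
            f"accuracy {accuracy:.3f} fell below baseline {baseline_accuracy:.3f}"
        )
    return accuracy
```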
  • 49. Related Materials: http://www.apache.org http://manning.com/ www.caltech.edu http://oreilly.com/
  • 50. Takeaway: OSS is powerful and scalable enough for the Watson team. What about your project?
  • 51. Resources: IBM Journal of Research and Development http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717; IBM Watson http://www.ibm.com/innovation/us/watson/ http://www.research.ibm.com/deepqa/index.shtml; Nova http://www.pbs.org/wgbh/nova/tech/smartest-machine-on-earth.html
  • 52. Review of Objectives. Now that you have completed this session, you are able to: ✓ Describe the main characteristics of the Watson QA system. ✓ Identify the key open source tools used in Watson. ✓ Recognize examples of Agile development best practices.
  • 53. …any final questions?