Watson & Open Source
   Software




     Ivan Portilla
     IT Architect
     5/8/12
     portilla@gmail.com


Sunday, May 20, 12        1
If I have seen further it is by standing on the
        shoulders of giants.




            Isaac Newton, Letter to Robert Hooke, February 5, 1675


Objectives
        By the end of this session, you
           should be able to:
        ✓ Describe the main characteristics
           of the Watson QA system.
        ✓ Identify the key open source SW
           used in Watson.
        ✓ Recognize examples of Agile
           development best practices.



Disclaimers




Disclaimer 1
        ✓ This presentation represents the view of the
           author and does not represent the view of IBM.
        ✓ All opinions expressed in this presentation are
           strictly those of the speaker, and do NOT represent
           those of IBM, IBM management, or anyone else.
        ✓ IBM and IBM (logo) are trademarks or registered
           trademarks of International Business Machines
           Corporation in the United States and/or other
           countries.




Disclaimer 2
                I (We) do not work for the Watson team.

Let’s Play Jeopardy




      BEFORE & AFTER: The Jerry Maguire star who
      automatically maintains your vehicle’s speed.

      COMMON BONDS: trout, loose change in your pocket,
      and compliments.

      Diplomatic Relations: Of the four countries in the
      world that the United States does not have diplomatic
      relations with, the one that’s farthest north.

      Geography: Chile shares its longest land border with
      this country



Natural Language Processing

                 Understanding natural language is hard!

Watson education

                 http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717

A Brief History of Watson




          • Deep Blue ended in 1997
          • Looking for a new research challenge
          • 2004: IBM Research manager Charles Lickel
               • Ken Jennings
          • Started in 2005
               • David Ferrucci
          • DeepQA in 2007
          • Won Jeopardy match, Feb 2011

                     http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717

[Figure: Precision of successive system versions: Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010, 11/2010.]

             http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717

What is Watson?
                                 ✓ Understands natural language.
                                 ✓ Generates & evaluates hypotheses for better outcomes.
                                 ✓ Adapts & learns from user selections and responses.

                                 http://www.ibm.com/innovation/us/watson/

Watson metrics
                                      Development Team: 25 people
                                      Project Duration:   4 years

                                      Hardware: 90 IBM Power-750 servers
                                      2880 Power7 cores @ 80+ TFLOPS
                                      20 TB Disk, 16 TB RAM (memory)
                                      10 Gbps network

                                       http://na11.apachecon.com/talks/19932

            http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717

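As a quick sanity check on the hardware figures in the Watson metrics slide, the per-server numbers can be derived with simple arithmetic. This is only a sketch: the even split of cores and RAM across the 90 servers is an assumption, not something the slide states.

```python
# Figures taken from the "Watson metrics" slide.
SERVERS = 90
TOTAL_CORES = 2880
TOTAL_RAM_TB = 16

# Assumption: cores and RAM are spread evenly across the servers.
cores_per_server = TOTAL_CORES // SERVERS
ram_per_server_gb = TOTAL_RAM_TB * 1024 / SERVERS

print(f"{cores_per_server} cores per server")
print(f"{ram_per_server_gb:.0f} GB RAM per server (approx.)")
```
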
Open Source Software




OSS - Linux

                     http://video.linux.com/videos/linuxcon-vancouver-day-2-1/

Open Source too

         http://ocw.mit.edu/index.htm   http://ti.arc.nasa.gov/opensource/

How does it work?




Learning To Rank (Basic Architecture)

[Diagram: Keywords; Hypothesis Evidence Scoring]

                                             Watson by R.Yates

Watson Architecture

                        http://en.wikipedia.org/wiki/Watson_(computer)

Who is the 44th President of the
              United States?




Who is the 44th President of the United States?

   'Who' is the '44th' 'President' of the 'United States'?

   Lexical Answer Type: → Person
   Focus: can be replaced by the correct answer to make a true statement
   Keywords

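The question analysis on this slide (focus, lexical answer type, keywords) can be sketched roughly as follows. This is a toy illustration, not Watson's actual analysis: the stopword list, the `FOCUS_TO_TYPE` table, and the `analyze_question` function are all hypothetical names made up for this example.

```python
import re

# Hypothetical lookup tables for illustration only; not taken from Watson.
STOPWORDS = {"who", "what", "is", "the", "of", "a", "an", "this"}
FOCUS_TO_TYPE = {"who": "Person", "where": "Place", "when": "Date"}

def analyze_question(question: str) -> dict:
    """Split a question into focus, lexical answer type, and keywords."""
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    # Treat a leading question word as the focus (a simplifying assumption).
    focus = tokens[0] if tokens and tokens[0].lower() in FOCUS_TO_TYPE else None
    answer_type = FOCUS_TO_TYPE.get(focus.lower()) if focus else None
    keywords = [t for t in tokens if t.lower() not in STOPWORDS]
    return {"focus": focus, "answer_type": answer_type, "keywords": keywords}

result = analyze_question("Who is the 44th President of the United States?")
print(result)
```

The keywords that come out of this step are what the later search stage would use.
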
Who is the 44th President of the United States?

   Keywords:
   44th President United States

[Pipeline diagram: Question → Question Analysis → Hypothesis Generation]

                                             Watson by R.Yates

Who is the 44th President of the United States?

   Candidates returned by Primary Search:
   Barack Obama
   George W. Bush
   Harvard Law School
   Illinois

[Pipeline diagram: Question → Question Analysis → Hypothesis Generation (Primary Search)]

                                              Watson by R.Yates

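A primary search step of this kind can be approximated with a tiny inverted index. The sketch below is plain Python rather than Lucene or Indri, and the four-document `CORPUS` is invented for illustration; it only shows how keywords turn into a ranked list of candidate titles.

```python
from collections import defaultdict

# Hypothetical mini-corpus: document title -> text.
CORPUS = {
    "Barack Obama": "Barack Obama is the 44th President of the United States",
    "George W. Bush": "George W. Bush served as the 43rd President of the United States",
    "Illinois": "Illinois is a state in the United States",
    "Jupiter": "Jupiter is the largest planet",
}

def build_index(corpus):
    """Map each lowercase token to the set of titles whose text contains it."""
    index = defaultdict(set)
    for title, text in corpus.items():
        for token in text.lower().split():
            index[token].add(title)
    return index

def primary_search(index, keywords):
    """Rank titles by how many keywords they match (ties broken by title)."""
    hits = defaultdict(int)
    for keyword in keywords:
        for title in index.get(keyword.lower(), ()):
            hits[title] += 1
    return sorted(hits, key=lambda t: (-hits[t], t))

INDEX = build_index(CORPUS)
candidates = primary_search(INDEX, ["44th", "President", "United", "States"])
print(candidates)
```

Note that a search like this returns anything that shares keywords with the question, which is why wrong candidates appear next to the right one and have to be scored afterwards.
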
Who is the 44th President of the United States?

 Each candidate is paired with the question and sent to Answer Scoring:
   Barack Obama
   George W. Bush
   Harvard Law School
   Illinois

 Contextual Answer Scoring (→ Person):
   Is Barack Obama a Person?          .90
   Is George W. Bush a Person?        .90
   Is Harvard Law School a Person?    .10
   Is Illinois a Person?              .15

[Pipeline diagram: Question → Question Analysis → Hypothesis Generation → Scoring]

                                                                         Watson by R.Yates

Who is the 44th President of the United States?

         Barack Obama is the 44th President of the United States
         George W. Bush is the 44th President of the United States
         Harvard Law School is the 44th President of the United States
         Illinois is the 44th President of the United States
         → Contextual Answer Scoring

[Pipeline diagram: Question → Question Analysis → Hypothesis Generation → Scoring]

                     Unstructured Information Management Architecture - UIMA

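Turning each candidate into a statement, as on this slide, amounts to replacing the focus of the question with the candidate. A minimal sketch, assuming the focus is the single word "Who" (the `to_statement` helper is a hypothetical name, not part of any Watson API):

```python
def to_statement(question: str, focus: str, candidate: str) -> str:
    """Replace the focus with a candidate answer and drop the question mark."""
    statement = question.replace(focus, candidate, 1)
    return statement.rstrip("?").strip()

QUESTION = "Who is the 44th President of the United States?"
CANDIDATES = ["Barack Obama", "George W. Bush", "Harvard Law School", "Illinois"]

statements = [to_statement(QUESTION, "Who", c) for c in CANDIDATES]
for statement in statements:
    print(statement)
```

Each resulting statement is a hypothesis that can be checked against evidence, true or not.
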
Who is the 44th President of the United States?

 Evidence passages:
   Barack Hussein Obama II (/bəˈrɑːk huːˈseɪn oʊˈbɑːmə/; born August 4, 1961)
   is the 44th and current President of the United States.

   George Walker Bush (born July 6, 1946) is an American politician who
   served as the 43rd President of the United States from 2001 to 2009 and
   the 46th Governor of Texas from 1995 to 2000.

 Hypotheses:
   Barack Obama is the 44th President of the United States
   George W. Bush is the 44th President of the United States
   Harvard Law School is the 44th President of the United States
   Illinois is the 44th President of the United States

 Scoring:
   Barack Obama          .95
   George W. Bush        .80
   Harvard Law School    .05
   Illinois              .10

[Pipeline diagram: Question → Question Analysis → Hypothesis Generation → Scoring]

                     Unstructured Information Management Architecture - UIMA

                                                                    Watson by R.Yates

Who is the 44th President of the United States?

       Candidate Answer      Answer Scoring   Evidence Retrieval & Scoring   Confidence
       Barack Obama          0.90             0.90                           0.95
       George W. Bush        0.90             0.80                           0.65
       Harvard Law School    0.10             0.05                           0.05
       Illinois              0.15             0.10                           0.10

[Pipeline diagram: Question → Question Analysis → Hypothesis Generation → Scoring → Evidence Retrieval → Trained Models]

                     Unstructured Information Management Architecture - UIMA

                                                                    Watson by R.Yates

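One common way to merge several scores into a single confidence is a logistic model over the individual scores. The sketch below uses the answer and evidence scores from this slide, but the weights and bias are hypothetical values picked by hand, so the confidences it prints will not match the slide's Confidence column. It is meant only to show the shape of the "Trained Models" step, where a real system would learn the weights from data.

```python
import math

# Scores copied from the slide: (answer scoring, evidence retrieval & scoring).
SCORES = {
    "Barack Obama": (0.90, 0.90),
    "George W. Bush": (0.90, 0.80),
    "Harvard Law School": (0.10, 0.05),
    "Illinois": (0.15, 0.10),
}

# Hypothetical weights; a trained model would learn these from labeled data.
W_ANSWER, W_EVIDENCE, BIAS = 3.0, 4.0, -3.5

def confidence(answer_score: float, evidence_score: float) -> float:
    """Combine two scores into a value in (0, 1) with a logistic function."""
    z = W_ANSWER * answer_score + W_EVIDENCE * evidence_score + BIAS
    return 1.0 / (1.0 + math.exp(-z))

# Rank candidates from most to least confident.
ranked = sorted(
    ((candidate, confidence(*scores)) for candidate, scores in SCORES.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for candidate, conf in ranked:
    print(f"{candidate}: {conf:.2f}")
```
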
DeepQA
Massively Parallel Probabilistic Evidence-Based Architecture

[Architecture diagram. Stages: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation) → Hypothesis and Evidence Scoring (Answer Scoring; Evidence Retrieval & Scoring over Evidence Sources) → Synthesis (Balance & Combine) → Final Confidence Merging & Ranking → Answer & Confidence. Hypothesis Generation and Hypothesis and Evidence Scoring are repeated (...). Learned Models help combine and weigh the Evidence.]

          ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker

OSS in Watson 1.0




OSS in Watson

   ✓ UIMA, UIMA-AS
   ✓ Hadoop, MapReduce
   ✓ Lucene, Indri

         ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker

UIMA




                            http://uima.apache.org/

UIMA-Asynchronous Scaleout
                 UIMA AS provides a more flexible and powerful scaleout
                 capability than the Collection Processing Manager (CPM) it replaces.

         Innovate 2011, How Does It Work? The Architecture of Watson. Grady Booch

Think Hadoop

       A framework for storing & processing big data.
       ✓ Up to 4,000 machines
       ✓ Up to 20 PB

       High reliability done in software:
       ✓ Automated failover for data & computation
       ✓ Implemented in Java

                            http://hadoop.apache.org/mapreduce/

MapReduce

                     http://hadoop.apache.org/mapreduce/

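For readers new to the MapReduce model, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) using the classic word count. It is plain Python, not the Hadoop API, and the three sample documents are made up for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input documents.
documents = ["watson uses hadoop", "hadoop runs map reduce", "watson uses uima"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

In a real cluster the map calls run on different machines and the shuffle moves data over the network; the logic of each phase stays this simple.
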
UIMA pipelines in Hadoop

[Diagram: Input from the Hadoop Distributed File System is divided into ~5000 "splits" and handed to ~50-100 "mappers". Each Mapper runs multiple threads, each running a UIMA pipeline (~400-800 threads). Mapper output goes through Shuffle/Sort to the Reducers, whose Output is written back to the Hadoop Distributed File System.]

           http://blogs.apache.org/foundation/entry/apache_innovation_bolsters_ibm_s

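The pattern in this diagram, a mapper that fans its split out to several threads with each thread running one analysis pipeline, can be sketched as below. Everything here is a stand-in: `run_pipeline` is a hypothetical placeholder for a UIMA pipeline, the splits are made-up strings, and the default of 8 threads per mapper is only an inference from the slide's figures (~400-800 threads over ~50-100 mappers).

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(document: str) -> dict:
    """Hypothetical stand-in for a UIMA pipeline: tokenize and count tokens."""
    tokens = document.split()
    return {"text": document, "token_count": len(tokens)}

def mapper(split, threads_per_mapper=8):
    """Process one input split, running the pipeline on several threads."""
    with ThreadPoolExecutor(max_workers=threads_per_mapper) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(run_pipeline, split))

def reducer(mapped):
    """Combine the per-document results from all mappers."""
    return sum(r["token_count"] for results in mapped for r in results)

# Hypothetical input splits (lists of documents).
splits = [
    ["watson plays jeopardy", "hadoop stores data"],
    ["uima analyzes text"],
]

mapped = [mapper(split) for split in splits]
total_tokens = reducer(mapped)
print(total_tokens)
```
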
Architecture




                        http://lucene.apache.org

Indri & Lemur
      Indri is a text search engine developed at UMass & CMU. Indri is part of the
      Lemur project.

                                                   http://lemurproject.org/indri/

Watson answers in 2-6 seconds

[Diagram: a Question yields Multiple Interpretations, 100s of sources, 100s of Possible Answers, 1000's of Pieces of Evidence, and 100,000's of scores from many simultaneous Text Analysis Algorithms. Stages: Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence.]

            ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker

            © 2011 IBM Corporation

Other OSS in Watson

                     https://www.ibm.com/developerworks/mydeveloperworks/blogs/InsideSystemStorage/entry/ibm_watson_how_to_build_your_own_watson_jr_in_your_basement7?lang=en

J-Archive data

                     http://www.j-archive.com/showgame.php?game_id=3577

Development Process

        ✓ War room setting with continuous collaboration.
        ✓ Weekly integration.
        ✓ Results driven with E2E regression testing.

        ✓ About 8,000 experiments
        ✓ 10 GB of test data per week
        ✓ Agile development

          Innovate 2011, How Does It Work? The Architecture of Watson. Grady Booch

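"Results driven with E2E regression testing" can be pictured as running the whole system over a fixed set of question/answer pairs and failing the build when accuracy drops. The sketch below is an illustration of that idea only: the test cases, the `toy_system` function, and the 0.75 baseline are all hypothetical and are not taken from the Watson team's process.

```python
def accuracy(answer_fn, cases):
    """Fraction of questions for which answer_fn returns the expected answer."""
    correct = sum(
        1
        for question, expected in cases
        if answer_fn(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(cases)

def regression_check(answer_fn, cases, baseline):
    """Return (passed, score); passed is False when accuracy is below baseline."""
    score = accuracy(answer_fn, cases)
    return score >= baseline, score

# Hypothetical question/answer pairs.
CASES = [
    ("Who is the 44th President of the United States?", "Barack Obama"),
    ("Chile shares its longest land border with this country", "Argentina"),
]

def toy_system(question: str) -> str:
    """Deliberately weak stand-in for the system under test."""
    return "Barack Obama" if "President" in question else "Bolivia"

passed, score = regression_check(toy_system, CASES, baseline=0.75)
print(passed, score)
```

Running a check like this after each weekly integration gives a single number to compare against the previous build.
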
Other OSS

              http://manning.com/   http://www.apache.org

Take Away
       OSS is powerful and scalable enough
       for the Watson team. What about
       your project?

Resources
        IBM Journal of Research and Development
        http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717
        IBM Watson
        http://www.ibm.com/innovation/us/watson/
        http://www.research.ibm.com/deepqa/index.shtml

        Nova
        http://www.pbs.org/wgbh/nova/tech/smartest-machine-on-earth.html

Review of Objectives
         Now that you have completed this session, you are able to:

         ✓ Describe the main characteristics of the Watson QA system.
         ✓ Identify the key open source tools used in Watson.
         ✓ Recognize examples of Agile development best practices.

…any final questions?
