IBM Watson & Open Source Software - LinuxCon 2012
1. Watson & Open Source Software
Ivan Portilla
IT Architect
8/29/12
portilla@gmail.com
2. If I have seen further it is by standing on the shoulders of giants.
Isaac Newton, Letter to Robert Hooke, February 5, 1675
3. Objectives
By the end of this session, you should be able to:
✓ Describe the main characteristics of the Watson QA system.
✓ Identify the key open source software used in Watson.
✓ Recognize examples of Agile development best practices.
5. Disclaimer 1
✓ This presentation represents the view of the author and does not represent the view of IBM.
✓ All opinions expressed in this presentation are strictly those of the speaker, and do NOT represent those of IBM, IBM management, or anyone else.
✓ IBM and IBM (logo) are trademarks or registered trademarks of International Business Machines Corporation in the United States and/or other countries.
6. Disclaimer 2
I (We) do not work for the Watson team.
9. Let’s Play Jeopardy
BEFORE & AFTER: The Jerry Maguire star who
automatically maintains your vehicle’s speed.
COMMON BONDS: trout, loose change in your pocket, and
compliments.
Diplomatic Relations: Of the four countries in the world
that the United States does not have diplomatic relations
with, the one that’s farthest north.
Geography: Chile shares its longest land border with this
country
13. A Brief History of Watson
▪ Deep Blue ended in 1997
▪ Looking for a new research challenge
▪ 2004: IBM Research manager Charles Lickel, Ken Jennings
▪ Started in 2005
• David Ferrucci
• DeepQA in 2007
• Won Jeopardy match, Feb 2011
15. What is Watson?
✓ Understands natural language.
✓ Generates & evaluates hypotheses for better outcomes.
✓ Adapts & learns from user selections and responses.
http://www.ibm.com/innovation/us/watson/
16. Watson metrics
Development Team: 25 people
Project Duration: 4 years
Software: 1,000,000 SLOC (700K Java, 300K C++), ~130 components
Hardware: 90 IBM Power-750 servers
  2880 Power7 cores @ 80+ TFLOPS
  20 TB disk, 16 TB RAM
  10 Gbps network
http://na11.apachecon.com/talks/19932
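A quick back-of-the-envelope check of the figures above, as a small Python sketch. The per-server values are derived here, not quoted from the slide, and the RAM figure assumes decimal terabytes (1 TB = 1000 GB).

```python
# Figures taken from the "Watson metrics" slide.
servers = 90
cores = 2880
ram_tb = 16
sloc_java, sloc_cpp = 700_000, 300_000

# 2880 is evenly divisible by 90, so integer division is exact here.
cores_per_server = cores // servers

# Assumption: decimal units, 1 TB = 1000 GB.
ram_gb_per_server = ram_tb * 1000 / servers

# Share of the code base written in Java.
java_share = sloc_java / (sloc_java + sloc_cpp)

print(f"{cores_per_server} cores per server")
print(f"about {ram_gb_per_server:.0f} GB RAM per server")
print(f"{java_share:.0%} of SLOC in Java")
```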
23. Who is the 44th President of the United States?
24. Who is the 44th President of the United States?
Question & Topic Analysis
Watson by R. Yates
25. Who is the 44th President of the United States?
Question & Topic Analysis
Keywords: 'Who' is the '44th' 'President' of the 'United States'?
Focus* (* can be replaced by the correct answer to make a true statement)
Lexical Answer Type → Person
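The question analysis step above can be approximated with a toy sketch. This is not Watson's analysis code: the stopword list and the question-word-to-type table are hypothetical stand-ins, and the sketch assumes the focus is simply the leading question word.

```python
import re

# Hypothetical, tiny stand-ins for real linguistic resources.
STOPWORDS = {"is", "the", "of", "a", "an"}
ANSWER_TYPES = {"who": "Person", "where": "Place", "when": "Date"}

def analyze(question: str) -> dict:
    """Return the focus, lexical answer type, and keywords of a question."""
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    # Assumption: the focus is the first token when it is a known question word.
    focus = tokens[0] if tokens and tokens[0].lower() in ANSWER_TYPES else None
    answer_type = ANSWER_TYPES[focus.lower()] if focus else None
    keywords = [t for t in tokens if t.lower() not in STOPWORDS]
    return {"focus": focus, "answer_type": answer_type, "keywords": keywords}

result = analyze("Who is the 44th President of the United States?")
print(result)
```

Unlike the slide, this sketch splits 'United States' into two keywords; recognizing multi-word terms would need a phrase dictionary or a parser.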
26. Who is the 44th President of the United States?
27. Who is the 44th President of the United States?
28. Who is the 44th President of the United States?
Primary Search
29. Who is the 44th President of the United States?
Primary Search
Barack Obama
George W. Bush
Harvard Law School
Illinois
30. Who is the 44th President of the United States?
Who is the 44th President of the United States?
Barack Obama
Who is the 44th President of the United States?
George W. Bush
Who is the 44th President of the United States?
Harvard Law School
Who is the 44th President of the United States?
Illinois
31. Who is the 44th President of the United States?
Who is the 44th President of the United States?
Barack Obama
Who is the 44th President of the United States?
George W. Bush
Who is the 44th President of the United States?
Harvard Law School
Who is the 44th President of the United States?
Illinois
Who is the 44th President of the United States? → Person
Is Barack Obama a Person? .90
Is George W. Bush a Person? .90
Is Harvard Law School a Person? .10
Is Illinois a Person? .15
32. Who is the 44th President of the United States?
Barack Obama is the 44th President of the United States
George W. Bush is the 44th President of the United States
Harvard Law School is the 44th President of the United States
Illinois is the 44th President of the United States
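The hypotheses on this slide are the question with each candidate substituted for the focus. A minimal sketch of that substitution, assuming the focus is the word "Who" (the function name is made up for illustration):

```python
def make_hypothesis(question: str, focus: str, candidate: str) -> str:
    """Replace the first occurrence of the focus with a candidate answer."""
    statement = question.rstrip("?").strip()
    return statement.replace(focus, candidate, 1)

question = "Who is the 44th President of the United States?"
candidates = ["Barack Obama", "George W. Bush", "Harvard Law School", "Illinois"]

hypotheses = [make_hypothesis(question, "Who", c) for c in candidates]
for h in hypotheses:
    print(h)
```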
33. Who is the 44th President of the United States?
Barack Hussein Obama II (/bəˈrɑːk huːˈseɪn oʊˈbɑːmə/; born August 4, 1961) is the 44th and current President of the United States.
George Walker Bush (born July 6, 1946) is an American
politician who served as the 43rd President of the United
States from 2001 to 2009 and the 46th Governor of Texas
from 1995 to 2000.
Barack Obama is the 44th President of the United States
George W. Bush is the 44th President of the United States
Harvard Law School is the 44th President of the United States
Illinois is the 44th President of the United States
Barack Obama .95
George W. Bush .80
Harvard Law School .05
Illinois .10
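To give a feel for evidence scoring, here is a deliberately crude sketch that scores a hypothesis by the fraction of its words found in a retrieved passage. Token overlap is a swapped-in stand-in, not one of Watson's actual scorers, and it will not reproduce the scores on the slide.

```python
import re

def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(hypothesis: str, passage: str) -> float:
    """Fraction of hypothesis tokens that also occur in the passage."""
    h = tokens(hypothesis)
    if not h:
        return 0.0
    return len(h & tokens(passage)) / len(h)

passage = ("Barack Hussein Obama II (born August 4, 1961) is the 44th and "
           "current President of the United States.")

obama = overlap_score("Barack Obama is the 44th President of the United States", passage)
illinois = overlap_score("Illinois is the 44th President of the United States", passage)
print(obama, illinois)
```

The wrong candidate still scores 0.875 here because most hypothesis words come from the question itself, which is one reason a single shallow scorer is not enough and many scorers have to be combined.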
34. Who is the 44th President of the United States?
Candidate Answer      Answer Scoring   Evidence Retrieval & Scoring   Confidence
Barack Obama          0.90             0.90                           0.95
George W. Bush        0.90             0.80                           0.65
Harvard Law School    0.10             0.05                           0.05
Illinois              0.15             0.10                           0.10
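One common way to merge several scores into a single confidence is a logistic combination. The sketch below uses the two score columns from the table above with made-up weights; it illustrates the idea only and does not reproduce the Confidence column, since the real weights come from learned models.

```python
import math

# Per-candidate scores copied from the table above:
# (answer scoring, evidence retrieval & scoring).
SCORES = {
    "Barack Obama": (0.90, 0.90),
    "George W. Bush": (0.90, 0.80),
    "Harvard Law School": (0.10, 0.05),
    "Illinois": (0.15, 0.10),
}

# Hypothetical weights; a real system would learn these from training data.
W_ANSWER, W_EVIDENCE, BIAS = 2.0, 4.0, -3.0

def confidence(answer_score: float, evidence_score: float) -> float:
    """Logistic combination of the two scores into a value between 0 and 1."""
    z = W_ANSWER * answer_score + W_EVIDENCE * evidence_score + BIAS
    return 1.0 / (1.0 + math.exp(-z))

# Rank candidates from most to least confident.
ranked = sorted(
    ((name, confidence(a, e)) for name, (a, e) in SCORES.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, conf in ranked:
    print(f"{name}: {conf:.2f}")
```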
35. DeepQA
Massively Parallel Probabilistic Evidence-Based Architecture
Learned Models help combine and weigh the Evidence
[Architecture diagram: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation) → Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval over Evidence Sources, Evidence Scoring) → Synthesis → Final Confidence Merging & Ranking (using learned Models) → Answer & Confidence]
ApacheCon 2011, Watson, a Reasoning System: based on Apache Inside!, David Boloker
39. UIMA-Asynchronous Scaleout
UIMA AS provides more flexible and powerful scale-out capability.
Innovate 2011, How Does It Work? The Architecture of Watson. Grady Booch
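As a loose analogy for scale-out, the sketch below scores several candidates concurrently with Python's standard thread pool. It is not the UIMA AS API (UIMA is a Java framework); the lookup-table scorer is a placeholder that reuses the type scores from the earlier slide.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder scores taken from the "Is X a Person?" slide.
TYPE_SCORES = {
    "Barack Obama": 0.90,
    "George W. Bush": 0.90,
    "Harvard Law School": 0.10,
    "Illinois": 0.15,
}

def score_candidate(candidate: str) -> tuple:
    """Stand-in for an expensive scorer; here just a table lookup."""
    return candidate, TYPE_SCORES.get(candidate, 0.0)

candidates = list(TYPE_SCORES)

# Each candidate is scored independently, so the work can be spread over workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(score_candidate, candidates))

print(results)
```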
40. Think Hadoop
A framework for storing & processing big data.
✓ 4,000 machines
✓ 20 PB
High reliability done in software:
✓ Automated failover for data & computation
✓ Implemented in Java
http://hadoop.apache.org/mapreduce/
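The MapReduce programming model behind Hadoop can be illustrated in a few lines of plain Python. This is a single-process sketch of the map, shuffle, and reduce phases for a word count; it does not use the Hadoop API, and the sample documents are made up.

```python
from collections import defaultdict

def map_phase(doc: str) -> list:
    """Emit a (word, 1) pair for every word in a document."""
    return [(word.lower(), 1) for word in doc.split()]

def shuffle(pairs: list) -> dict:
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: dict) -> dict:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input documents.
docs = ["Watson plays Jeopardy", "Watson uses Hadoop", "Hadoop stores big data"]

pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```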
44. Indri & Lemur
Indri is a text search engine developed at UMass & CMU.
Indri is part of the Lemur project.
[Architecture diagram. Components shown: runquery, QueryEnvironment, LocalServer, NetworkServerProxy, indrid, NetworkServerStub. Example query: #combine(#2(george bush).title)]
http://lemurproject.org/indri/
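The example query on this slide nests two operators: as I understand the Indri query language, #2(...) is an ordered window (the terms in order and close together), .title restricts matching to the title field, and #combine merges the beliefs of its arguments. The helper functions below are hypothetical; they only assemble the query string and do not call Indri.

```python
def ordered_window(width: int, terms: list, field: str = None) -> str:
    """Build an ordered-window expression such as #2(george bush).title."""
    expr = f"#{width}({' '.join(terms)})"
    return f"{expr}.{field}" if field else expr

def combine(*parts: str) -> str:
    """Wrap one or more sub-expressions in #combine(...)."""
    return f"#combine({' '.join(parts)})"

query = combine(ordered_window(2, ["george", "bush"], field="title"))
print(query)
```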
46. Other OSS in Watson
https://www.ibm.com/developerworks/mydeveloperworks/blogs/InsideSystemStorage/entry/ibm_watson_how_to_build_your_own_watson_jr_in_your_basement7?lang=en
47. J-Archive data
http://www.j-archive.com/showgame.php?game_id=3577
48. Development Process
✓ War room setting with continuous collaboration.
✓ Weekly integration.
✓ Results driven with E2E regression testing.
✓ About 8,000 experiments.
✓ 10 GB of test data per week.
✓ Agile development.
Innovate 2011, How Does It Work? The Architecture of Watson. Grady Booch
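End-to-end regression testing for a QA system can be as simple as re-running a fixed question set and comparing accuracy with the previous baseline. The sketch below is an illustration under assumptions: the gold answers, predictions, and baseline value are made up, and the function names are not from the Watson project.

```python
def accuracy(predictions: dict, gold: dict) -> float:
    """Fraction of gold questions answered correctly (case-insensitive match)."""
    correct = sum(
        1 for question, answer in gold.items()
        if predictions.get(question, "").strip().lower() == answer.strip().lower()
    )
    return correct / len(gold)

def passes_regression(new_acc: float, baseline_acc: float, tolerance: float = 0.0) -> bool:
    """A run passes if accuracy has not dropped below the baseline."""
    return new_acc + tolerance >= baseline_acc

# Hypothetical gold set built from two questions used earlier in the deck.
gold = {
    "Who is the 44th President of the United States?": "Barack Obama",
    "Chile shares its longest land border with this country": "Argentina",
}
# Hypothetical system output for this run.
predictions = {
    "Who is the 44th President of the United States?": "Barack Obama",
    "Chile shares its longest land border with this country": "Peru",
}

acc = accuracy(predictions, gold)
print(acc, passes_regression(acc, baseline_acc=0.5))
```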
49. Related Materials
http://www.apache.org
http://manning.com/
www.caltech.edu
http://oreilly.com/
50. Take Away
OSS is powerful and scalable enough for the Watson team. What about your project?
51. Resources
IBM Journal of Research and Development
http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717
IBM Watson
http://www.ibm.com/innovation/us/watson/
http://www.research.ibm.com/deepqa/index.shtml
Nova
http://www.pbs.org/wgbh/nova/tech/smartest-machine-on-earth.html
52. Review of Objectives
Now that you have completed this session, you are able to:
✓ Describe the main characteristics of the Watson QA system.
✓ Identify the key open source tools used in Watson.
✓ Recognize examples of Agile development best practices.