IBM WATSON
CONCEPT INSIGHTS
Building a Cognitive App
Kory Becker 2016
WHAT IS WATSON?
 2008
 Able to compete with Jeopardy contestants.
 2010
 Capable of defeating human Jeopardy contestants on a regular basis
 2011
 First-place Jeopardy winner, defeating champion Ken Jennings
 Present
 2nd-year medical student equivalency
 Preparing to take the U.S. Medical Board Exam
 Watson API available to developers
WHAT IS WATSON, REALLY?
• Natural language processing
• Machine learning
• Used for analyzing large amounts of unstructured data
• Accessible via a collection of web APIs
WATSON SERVICES
• Concept Expansion
• Concept Insights
• Dialog
• Natural Language Classifier
• Personality Insights
• Relationship Extraction
https://goo.gl/mNmiS3
NATURAL LANGUAGE PROCESSING
• Convert text into a numerical representation
• Find commonalities within data
  • Clustering
• Make predictions from data
  • Classification
  • Category, Popularity, Sentiment, Relationships
BAG OF WORDS MODEL
Cats like to chase mice.
Dogs like to eat big bones.
Corpus
CREATE A DICTIONARY
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
Dogs like to eat big bones.
Corpus
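
The dictionary step on this slide can be sketched in a few lines. This is an illustrative Python sketch, not the author's code: the one-word stop-word list and the helper names `tokenize` and `build_dictionary` are assumptions made for the example, and no stemming is applied, so the terms stay exactly as they appear on the slide.

```python
import re

# Assumed stop-word list for this example; real lists are much longer.
STOP_WORDS = {"to"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def build_dictionary(corpus):
    """Map each unique term to an index, in order of first appearance."""
    dictionary = {}
    for document in corpus:
        for term in tokenize(document):
            if term not in dictionary:
                dictionary[term] = len(dictionary)
    return dictionary

corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
dictionary = build_dictionary(corpus)
for term, index in dictionary.items():
    print(index, "-", term)
```

Assigning indices in order of first appearance is a choice made here so that the result lines up with the slide; an alphabetical ordering would work equally well.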
DIGITIZE TEXT
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Corpus                         Vector
Cats like to chase mice.       1 1 1 1 0 0 0 0
Dogs like to eat big bones.    0 1 0 0 1 1 1 1
Vector Length = 8
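
The encoding on this slide amounts to a presence check per dictionary term. A minimal Python sketch follows, assuming the eight-term dictionary from the slide is already available as a list; the `encode` helper is a hypothetical name, not part of any Watson API.

```python
import re

# Dictionary terms in slide order (index 0 through 7).
DICTIONARY = ["cats", "like", "chase", "mice", "dogs", "eat", "big", "bones"]

def encode(document, dictionary=DICTIONARY):
    """Return a binary vector: 1 if the dictionary term occurs in the document."""
    tokens = set(re.findall(r"[a-z]+", document.lower()))
    return [1 if term in tokens else 0 for term in dictionary]

for doc in ["Cats like to chase mice.", "Dogs like to eat big bones."]:
    print(doc, encode(doc))
```

Because the vector is built by walking the dictionary rather than the document, every document yields a vector of the same length, and words outside the dictionary are simply ignored.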
CLASSIFY DOCUMENTS (EATING)
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Corpus                         Vector             Eating
Cats like to chase mice.       1 1 1 1 0 0 0 0    0
Dogs like to eat big bones.    0 1 0 0 1 1 1 1    1
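
Training on these two labeled vectors could look like the sketch below. It uses logistic regression fitted by plain batch gradient descent, written with the Python standard library only; the learning rate, epoch count, and function names are assumptions chosen for illustration, and the deck does not say which model or settings the author used.

```python
import math

# Binary vectors and "eating" labels from the slide.
X = [[1, 1, 1, 1, 0, 0, 0, 0],   # Cats like to chase mice.
     [0, 1, 0, 0, 1, 1, 1, 1]]   # Dogs like to eat big bones.
y = [0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(features, weights, bias):
    """Predicted probability that the document is about eating."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(X, y, learning_rate=0.5, epochs=200):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            error = score(features, weights, bias) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias

def predict(features, weights, bias):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if score(features, weights, bias) >= 0.5 else 0

weights, bias = train(X, y)
print([predict(features, weights, bias) for features in X])
```

With only two training documents the model separates them trivially, so this shows the mechanics of fitting rather than a meaningful classifier.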
PREDICT ON NEW DATA
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Corpus                         Vector             Eating
Cats like to chase mice.       1 1 1 1 0 0 0 0    0
Dogs like to eat big bones.    0 1 0 0 1 1 1 1    1
Bats eat bugs.                 0 0 0 0 0 1 0 0    ?
PREDICT ON NEW DATA
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Corpus                         Vector             Eating
Cats like to chase mice.       1 1 1 1 0 0 0 0    0
Dogs like to eat big bones.    0 1 0 0 1 1 1 1    1
Bats eat bugs.                 0 0 0 0 0 1 0 0    ?
PREDICT ON NEW DATA
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Corpus                         Vector             Eating
Cats like to chase mice.       1 1 1 1 0 0 0 0    0
Dogs like to eat big bones.    0 1 0 0 1 1 1 1    1
Bats eat bugs.                 0 0 0 0 0 1 0 0    1
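
One way to see why the new document lands in class 1 is to count the dictionary terms it shares with each training document. The Python sketch below swaps in a 1-nearest-neighbor rule with dot-product similarity, which is a simpler stand-in for whatever model is actually trained; `shared_terms` and `predict_nearest` are hypothetical helper names.

```python
# Training vectors with their "eating" labels, as on the slide.
TRAIN = [
    ([1, 1, 1, 1, 0, 0, 0, 0], 0),  # Cats like to chase mice.
    ([0, 1, 0, 0, 1, 1, 1, 1], 1),  # Dogs like to eat big bones.
]

def shared_terms(a, b):
    """Count dictionary terms present in both binary vectors (dot product)."""
    return sum(x * y for x, y in zip(a, b))

def predict_nearest(vector, train=TRAIN):
    """Return the label of the training document sharing the most terms."""
    _, label = max(train, key=lambda pair: shared_terms(vector, pair[0]))
    return label

bats = [0, 0, 0, 0, 0, 1, 0, 0]  # Bats eat bugs.
print(predict_nearest(bats))
```

The only set bit in the new vector is the one for "eat", which it shares with the second training document and not with the first, so the second document's label is returned.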
DOES IT REALLY WORK?
> data
[1] "Cats like to chase mice." "Dogs like to eat big bones."
> train
  big bone cat chase dog eat like mice y
1   0    0   1     1   0   0    1    1 0
2   1    1   0     0   1   1    1    0 1
> predict(fit, newdata = train)
[1] 0 1
> data2
[1] "Bats eat bugs."
> test
  big bone cat chase dog eat like mice
1   0    0   0     0   0   1    0    0
> predict(fit, newdata = test)
[1] 1
Document Term Matrix
100% Accuracy Training
Test Case
Success!
Source code:
https://goo.gl/UxjPBs
DEMO 1
NATURAL LANGUAGE PROCESSING
• Text analysis for:
  • Entity Extraction
  • Sentiment Analysis
  • Keywords and Concepts
  • Taxonomy
  • More
http://www.alchemyapi.com/products/demo/alchemylanguage
DEMO 2
CONCEPT INSIGHTS
• Discovering concept insights within AP content, which might not be found using traditional keyword search
http://concept.herokuapp.com

Editor's Notes

  1. Introduction: My name is Kory Becker. I'm a Software Architect at The Associated Press. As part of the project “Providing Story Context Through AP Archive”, I wanted to investigate using the IBM Watson API to discover new concepts and related stories in AP content. The idea is to combine these related stories into a timeline for the original story, which could offer users a historical perspective on a topic.
  2. What is Watson? So, first, what is Watson? “Watson” is an artificial intelligence technology built by IBM. As you might recall, Watson became famous in February 2011 for winning first place on the game show Jeopardy. Watson won the $1 million first prize (the winnings were donated to charity). More importantly, Watson defeated the Jeopardy champion Ken Jennings, who previously had the longest unbeaten run, at 74 games! IBM researchers’ first take on building a machine that could win Jeopardy? “They initially said no, it's a silly project to work on, it's too gimmicky, it's not a real computer science test, and we probably can't do it anyway”. But it worked. You can see, in the timeline above, how Watson progressed. Next to Watson, the closest technology most people are familiar with would be Apple’s Siri. However, an important distinction is that Siri is more of a search lookup interface: it uses voice recognition to issue queries against back-end providers. Watson, on the other hand, uses cognitive computing and forms relationships among data in order to answer a question. Siri may even partner with Watson in the future. Most recently, IBM has released Watson as a series of targeted services for developers to use in their own projects. This is what we’re going to take a look at today.
  3. What is Watson, Really? OK, so the game-show-winning Watson is pretty fascinating, but what does it actually do? Watson, at its core, is a natural language processing tool. It uses a variety of machine learning techniques to analyze text and generate insights from large amounts of unstructured data. The Watson of today is available as a series of web APIs. Each API offers a different machine learning service for processing unstructured data.
  4. Watson Services: Some examples of Watson services include Concept Expansion, Concept Insights (which we’ll take a look at in just a minute), Dialog (a very interesting take on Watson powering a chat bot), Personality Insights, and a lot more. You can see the full list of Watson services at the URL above, https://goo.gl/mNmiS3, and even try out their online demos.
  5. Natural Language Processing: While the internal mechanics of Watson are probably complicated and perhaps even proprietary, the underlying principles of natural language processing are well understood. The most basic form of natural language processing is to convert text into a numerical representation, so that each document becomes a same-sized array of numbers. With this, you can apply machine learning algorithms, such as clustering and classification. This allows you to build insights into a set of documents, determining characteristics like category, popularity, sentiment, and relationships.
  6. Bag of Words Model: To get an idea of the basic principles that Watson might use when processing text, let’s take a look at a quick example. Here are two documents: “Cats like to chase mice.” and “Dogs like to eat big bones.” We’re going to try to categorize these documents as being about “eating” or not. To do this, we’ll build a bag-of-words model and then apply a classification algorithm. Now, the first thing to note is that the two documents are of different lengths. If you think about it, documents will almost always be of different lengths. This is fine, because after we digitize the corpus, you’ll see that the resulting data fits neatly within same-sized vectors.
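  7. Create a Dictionary: So, the first step is to create a dictionary from our corpus. First, we preprocess the corpus by lowercasing the text, stripping punctuation, and removing stop words such as “to”. (Stemming, which reduces a word like “cats” to “cat”, is a separate, optional step; the dictionary on this slide keeps the unstemmed terms.) Next, we find each unique term and add it to our dictionary. You can see the resulting list on the right side of this slide. Our dictionary contains 8 terms.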
  8. Digitize Text: With our dictionary created, we can now digitize the documents. Since our dictionary has 8 terms, each document will be encoded into a vector of length 8. This ensures that all documents end up with the same length, which makes them easier to process with machine learning algorithms. Let’s look at the first document. We’ll take the first term in the dictionary and see if it exists in the first document. The term is “cats”, which does indeed exist in the first document. Therefore, we’ll set a 1 as the first bit. The next term is “like”. Again, it exists in the first document, so we’ll set a 1 as the next bit. This repeats until we reach the term “dogs”. This does not exist in the first document, so we set a 0. Once we have run through all terms in the dictionary, we end up with a vector of length 8 for the first document. We repeat the same steps for the second document, going through each term in the dictionary and checking if it exists in the document.
  9. 9. Classify Documents (Eating) Once the data is digitized, we can label the documents with regard to “eating”. Since the first document is about chasing mice, maybe playing, we’ll assign it a 0. It doesn’t really have to do with eating. The second document is clearly about eating, so we’ll assign it a 1. At this point, we can train a model on the labeled data using logistic regression, a neural network, a support vector machine, etc.
  10. 10. Predict on New Data Once our model has finished training, we can try predicting on new data to see if it’s classified correctly. Here you can see we have a new document, “Bats eat bugs.” Our machine learning algorithm has never seen this document before. We want to try to categorize it as being about “eating” or not. We’ll first digitize the document, just like we did with our training corpus. In this case, only one term, “eat”, is found in the dictionary. The words “bats” and “bugs” are not in the dictionary, so they contribute nothing to the vector.
  11. 11. Predict on New Data The machine learning algorithm is probably going to find a relationship with this particular bit, highlighted in red on the slide. This bit corresponds to the term “eat”, which is found in the training document that was classified as 1 for the category “eating”. Based on this similarity, our model is probably going to predict our new document as … ?
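To make the train-then-predict idea concrete, here is a minimal Python sketch. It assumes logistic regression fitted by plain batch gradient descent with no intercept term; the learning rate, epoch count, and function names are illustrative choices, not taken from the slides.

```python
import math

# Training vectors and labels from the slides (1 = about "eating").
X_train = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # "Cats like to chase mice."
    [0, 1, 0, 0, 1, 1, 1, 1],  # "Dogs like to eat big bones."
]
y_train = [0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Probability that vector x belongs to the "eating" class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def train(X, y, learning_rate=0.5, epochs=200):
    """Fit logistic-regression weights with batch gradient descent (no intercept)."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, label in zip(X, y):
            error = predict_proba(weights, x) - label
            for j, xj in enumerate(x):
                gradient[j] += error * xj
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights

weights = train(X_train, y_train)
bats = [0, 0, 0, 0, 0, 1, 0, 0]  # "Bats eat bugs."
probability = predict_proba(weights, bats)
print(round(probability, 3), 1 if probability > 0.5 else 0)
```

The weight for “eat” only ever receives updates from the document labeled 1, so it ends up positive, and the new document scores above 0.5 for “eating”. With only two training examples this illustrates the mechanics rather than a model anyone should rely on.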
  12. 12. Predict on New Data So this is the general idea behind natural language processing. Now, we didn’t have to classify just on “eating”. We could have just as easily classified based upon sentiment. In fact, this is a common method for performing sentiment analysis with machine learning. (A non-machine-learning alternative for sentiment analysis is the AFINN word-list approach.) This was a very basic example of natural language processing. In a real-world case, you could have tens of thousands of documents, perhaps with multiple classifications. There are also various ways to encode the corpus, such as the count of each term within the document, tf-idf, and more. This gives us a general background on some of the methods that might be used behind Watson.
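As one illustration of an alternative encoding, the sketch below computes a simple tf-idf weighting in Python. It assumes the unsmoothed variant, weight = term count × log(N / document frequency); libraries often use smoothed or normalized variants, and the stop-word list and function names here are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Assumption: the same one-word stop-word list as in the example corpus.
STOP_WORDS = {"to"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stop-words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tf_idf(corpus):
    """Return one {term: weight} dict per document, using count * log(N / df)."""
    tokenized = [tokenize(document) for document in corpus]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term]) for term, count in counts.items()}
        )
    return weights

corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
scores = tf_idf(corpus)
print(scores[0])
```

Under this weighting, “like” appears in both documents and so receives a weight of 0, while terms unique to one document, such as “cats” or “eat”, receive a positive weight. That is the intuition behind tf-idf: terms shared by every document tell a classifier little.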
  13. 13. Does it Really Work? Here is an actual example in R. The code takes the original sentences from this example and builds a document-term matrix. Notice how the 1’s and 0’s align perfectly with what we’ve seen in the previous slides. The order of the terms is a little different, but otherwise the values are the same. The ‘y’ column is the classification (eating). We train on the data using a generalized linear model, which reaches 100% accuracy on the training data. With only 2 training cases, that is not difficult to achieve. You can see the results of training when we call “predict”: it outputs the same ‘y’ values as the training data. We then digitize our test sentence, which the model has never seen before, and call “predict” on it. It outputs a 1, which is correct, since this sentence is indeed about “eating”. There is a link to the source code in this slide, for anyone who is curious and wants to try running it.
  14. 14. Demo 1 – Natural Language Processing Let’s take a look at one of the Watson service demos. We’ll start with the natural language processing service, from the AlchemyLanguage API. This service offers some pretty interesting features, especially for AP content. Demo http://www.alchemyapi.com/products/demo/alchemylanguage combined with URLs from http://bigstory.ap.org
  15. 15. Demo 2 – Concept Insights Another Watson service is the Concept Insights API. This service allows us to discover key concepts in a body of text, including concepts that might not be found using traditional keyword search, which makes it a good fit for news articles. Demo http://concept.herokuapp.com