IBM WATSON
CONCEPT INSIGHTS
Building a Cognitive App
Kory Becker 2016
WHAT IS WATSON?
 2008
 Able to compete with Jeopardy contestants.
 2010
 Capable of defeating human Jeopardy contestants on a regular basis
 2011
 First-place Jeopardy winner, defeating champion Ken Jennings
 Present
 2nd-year medical student equivalency
 Preparing to take the U.S. Medical Board Exam
 Watson API available to developers
WHAT IS WATSON, REALLY?
• Natural language processing
• Machine learning
• Used for analyzing large amounts of unstructured data
• Accessible via a collection of web APIs
WATSON SERVICES
• Concept Expansion
• Concept Insights
• Dialog
• Natural Language Classifier
• Personality Insights
• Relationship Extraction
https://goo.gl/mNmiS3
NATURAL LANGUAGE PROCESSING
• Convert text into a numerical representation
• Find commonalities within data
• Clustering
• Make predictions from data
• Classification
• Category, Popularity, Sentiment, Relationships
BAG OF WORDS MODEL
Cats like to chase mice.
Dogs like to eat big bones.
Corpus
CREATE A DICTIONARY
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
Dogs like to eat big bones.
Corpus
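The dictionary step on this slide can be sketched in a few lines of Python. This is an illustrative sketch rather than the presenter's code (the deck's own example is written in R): the helper names `tokenize` and `build_dictionary` and the one-word `STOP_WORDS` set are assumptions made for this example, and no stemming is applied so that the terms match the slide.

```python
import re

# Assumed stop-word list for this sketch; the slide only shows "to" being dropped.
STOP_WORDS = {"to"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def build_dictionary(corpus):
    """Map each unique term to an index, in order of first appearance."""
    dictionary = {}
    for document in corpus:
        for term in tokenize(document):
            if term not in dictionary:
                dictionary[term] = len(dictionary)
    return dictionary


corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
dictionary = build_dictionary(corpus)

# Print the dictionary in the same "index - term" layout as the slide.
for term, index in dictionary.items():
    print(index, "-", term)
```

Assigning indices in order of first appearance is a choice made here to reproduce the slide's numbering; a real pipeline might sort terms alphabetically instead, as the R output later in the deck does.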
DIGITIZE TEXT
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
1 1 1 1 0 0 0 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1
Corpus
Vector Length = 8
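The encoding on this slide can be reproduced with a short Python sketch. It assumes the 8-term dictionary shown above is already available as an ordered list; `DICTIONARY`, `tokenize`, and `digitize` are hypothetical names chosen for this example, not part of any Watson API.

```python
import re

# Dictionary terms in the order shown on the slide (index 0 through 7).
DICTIONARY = ["cats", "like", "chase", "mice", "dogs", "eat", "big", "bones"]
# Assumed stop-word list; the slide only shows "to" being dropped.
STOP_WORDS = {"to"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def digitize(document, dictionary):
    """Return one bit per dictionary term: 1 if the term occurs in the document, else 0."""
    terms = set(tokenize(document))
    return [1 if term in terms else 0 for term in dictionary]


corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
vectors = [digitize(document, DICTIONARY) for document in corpus]

for document, vector in zip(corpus, vectors):
    print(document, vector)
```

Every vector has the same length as the dictionary, regardless of how long the original sentence is, which is what lets a learning algorithm treat the documents uniformly.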
CLASSIFY DOCUMENTS (EATING)
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
1 1 1 1 0 0 0 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1
Corpus
0
1
PREDICT ON NEW DATA
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
1 1 1 1 0 0 0 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1
Bats eat bugs.
0 0 0 0 0 1 0 0
Corpus
0
1
?
PREDICT ON NEW DATA
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
1 1 1 1 0 0 0 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1
Bats eat bugs.
0 0 0 0 0 1 0 0
Corpus
0
1
?
PREDICT ON NEW DATA
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
1 1 1 1 0 0 0 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1
Bats eat bugs.
0 0 0 0 0 1 0 0
Corpus
0
1
1
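The train-then-predict step shown on these slides can be sketched with a tiny logistic regression written in plain Python. This is a minimal sketch under stated assumptions, not the presenter's implementation (the deck uses a generalized linear model in R): the batch gradient descent loop, the learning rate, the epoch count, and the 0.5 decision threshold are all choices made for this example.

```python
import math

# Bit vectors and labels taken from the slides (label 1 means "about eating").
X_train = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # Cats like to chase mice.
    [0, 1, 0, 0, 1, 1, 1, 1],  # Dogs like to eat big bones.
]
y_train = [0, 1]
x_new = [0, 0, 0, 0, 0, 1, 0, 0]  # Bats eat bugs.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def probability(features, weights, bias):
    """Model's estimated probability that the document belongs to class 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(X, y, learning_rate=0.5, epochs=1000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        # Prediction error for every training document under the current model.
        errors = [probability(row, weights, bias) - label for row, label in zip(X, y)]
        for j in range(len(weights)):
            gradient = sum(error * row[j] for error, row in zip(errors, X))
            weights[j] -= learning_rate * gradient
        bias -= learning_rate * sum(errors)
    return weights, bias


def predict(features, weights, bias):
    return 1 if probability(features, weights, bias) >= 0.5 else 0


weights, bias = train(X_train, y_train)
print([predict(row, weights, bias) for row in X_train])
print(predict(x_new, weights, bias))
```

The only bit set in the new document is "eat", which appears solely in the training document labeled 1, so the learned weight for that term pushes the prediction toward 1. With just two training documents this is a toy illustration of the idea, not evidence of a reliable classifier.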
DOES IT REALLY WORK?
> data
[1] "Cats like to chase mice." "Dogs like to eat big bones."
> train
  big bone cat chase dog eat like mice y
1   0    0   1     1   0   0    1    1 0
2   1    1   0     0   1   1    1    0 1
> predict(fit, newdata = train)
[1] 0 1
> data2
[1] "Bats eat bugs."
> test
  big bone cat chase dog eat like mice
1   0    0   0     0   0   1    0    0
> predict(fit, newdata = test)
[1] 1
Document Term Matrix
100% Accuracy Training
Test Case
Success!
Source code:
https://goo.gl/UxjPBs
DEMO 1
NATURAL LANGUAGE PROCESSING
• Text analysis for:
  • Entity Extraction
  • Sentiment Analysis
  • Keywords and Concepts
  • Taxonomy
  • More
http://www.alchemyapi.com/products/demo/alchemylanguage
DEMO 2
CONCEPT INSIGHTS
• Discovering concept insights within AP content, which might not be found using traditional keyword search
http://concept.herokuapp.com


Editor's Notes

  1. Introduction. My name is Kory Becker. I'm a Software Architect at The Associated Press. As part of the project “Providing Story Context Through AP Archive”, I wanted to investigate using the IBM Watson API to discover new concepts and related stories in AP content. The idea is to combine these related stories into a timeline for the original story, giving users a historical perspective on related stories around a topic.
  2. What is Watson? So, first, what is Watson? “Watson” is an artificial intelligence technology built by IBM. As you might recall, Watson became famous in February 2011 for winning first place on the game show Jeopardy. Watson won a $1 million prize (the winnings were donated to charity). More importantly, Watson defeated the Jeopardy champion Ken Jennings, who held the record for the longest winning streak, at 74 games. IBM researchers’ first reaction to building a machine that could win Jeopardy? “They initially said no, it's a silly project to work on, it's too gimmicky, it's not a real computer science test, and we probably can't do it anyway”. But it worked. You can see, in the timeline above, how Watson progressed. Next to Watson, the closest technology most people are familiar with would be Apple’s Siri. However, an important distinction is that Siri is more of a search lookup interface: it uses voice recognition to issue queries against back-end providers. Watson, on the other hand, uses cognitive computing and forms relationships among data in order to answer a question. Siri may even partner with Watson in the future. Most recently, IBM has released Watson as a series of targeted services for developers to use in their own projects. This is what we’re going to take a look at today.
  3. What is Watson, Really? Ok, so the game-show-winning Watson is pretty fascinating, but what does it actually do? Watson, at its core, is a natural language processing tool. It uses a variety of machine learning techniques to analyze text, understand data, and generate insights from large amounts of unstructured data. The Watson of today is available as a series of web APIs. Each API offers a different machine learning service for processing unstructured data.
  4. Watson Services. Some examples of Watson services include Concept Expansion, Concept Insights (which we’ll take a look at in just a minute), Dialog (a very interesting take on Watson powering a chat bot), Personality Insights, and a lot more. You can see the full list of Watson services at the URL above, https://goo.gl/mNmiS3, and even try out their online demos.
  5. Natural Language Processing. While the internal mechanics of Watson are probably complicated and perhaps even proprietary, the underlying principles of natural language processing can be well understood. The most basic form of natural language processing is to convert text into a numerical representation, so that each document becomes a same-sized array of numbers. With this, you can apply machine learning algorithms, such as clustering and classification. This allows you to build unique insights into a set of documents, determining characteristics like category, popularity, sentiment, and relationships.
  6. Bag of Words Model. To get an idea of the basic principles that Watson might use when processing text, let’s take a look at a quick example. Here are two documents: “Cats like to chase mice.” and “Dogs like to eat big bones.” We’re going to try to categorize these documents as being about “eating”. To do this, we’ll build a bag-of-words model and then apply a classification algorithm. Now, the first thing to note is that the two documents are of different lengths. If you think about it, documents will almost always be of different lengths. This is fine, because after we digitize the corpus, you’ll see that the resulting data fits neatly within same-sized vectors.
  7. Create a Dictionary. So, the first step is to create a dictionary from our corpus. First, we remove stop words from the corpus, which drops the word “to”. (A stemming algorithm can also be applied at this point to reduce words such as “cats” to “cat”.) Next, we find each unique term and add it to our dictionary. You can see the resulting list on the right side of this slide. Our dictionary contains 8 terms.
  8. Digitize Text. With our dictionary created, we can now digitize the documents. Since our dictionary has 8 terms, each document will be encoded into a vector of length 8. This ensures that all documents end up having the same length, which makes them easier to process with machine learning algorithms. Let’s look at the first document. We’ll take the first term in the dictionary and see if it exists in the first document. The term is “cats”, which does indeed exist in the first document. Therefore, we’ll set a 1 as the first bit. The next term is “like”. Again, it exists in the first document, so we’ll set a 1 as the next bit. This repeats until we see the term “dogs”. This does not exist in the first document, so we set a 0. Finally, we run through all terms in the dictionary and end up with a vector of length 8 for the first document. We repeat the same steps for the second document, going through each term in the dictionary and checking if it exists in the document.
  9. Classify Documents (Eating). Once the data is digitized, we can classify the documents with regard to “eating”. Since the first document is about chasing mice, maybe playing, we’ll assign a 0. It doesn’t really have to do with eating. The second document is clearly about eating, so we’ll assign it a 1. At this point, we can train a model on the data using logistic regression, a neural network, a support vector machine, etc.
  10. Predict on New Data. Once our model has finished training, we can try predicting on new data to see if it’s classified correctly. Here you can see we have a new document, “Bats eat bugs.” This document has not yet been seen by our machine learning algorithm. We want to try to categorize it as being about “eating” or not. We’ll first digitize the document, just like we did with our training corpus. In this case, only 1 of its terms is found in the dictionary.
  11. Predict on New Data. The machine learning algorithm is probably going to find a relationship with this particular bit, highlighted in red above. This bit corresponds to the term “eat”, and is found in the training document that was classified as 1 for the category “eating”. Based on this similarity, our model is probably going to predict our new document as … ?
  12. Predict on New Data. So this is the general idea behind natural language processing. Now, we didn’t have to classify just on “eating”. We could have just as easily classified based upon sentiment. In fact, this is a common method for performing sentiment analysis with machine learning. (A non-machine-learning alternative for sentiment analysis is the AFINN word-list approach.) This was a very basic example of natural language processing. In a real-world case, you could have tens of thousands of documents, perhaps with multiple classifications. There are also various ways to encode the corpus, such as the count of each term within the sentence, tf*idf, and more. This gives us a general background on some of the methods that might be used behind Watson.
  13. Does it Really Work? Here is an actual example in R. The code takes the original sentences from this example and builds a document-term matrix. Notice how the 1’s and 0’s align with what we’ve seen in the previous slides. The terms are stemmed (for example, “cat” instead of “cats”) and appear in a different order, but otherwise the values are the same. The ‘y’ column is the classification (eating). We train on the data using a generalized linear model, with 100% accuracy. It’s only 2 training cases, so it’s not all that difficult to train. You can see the results of training when we call “predict”: it outputs the same ‘y’ values as the training data. We then run the model on our test sentence, which the AI has never seen before, and call “predict”. It outputs a 1, which is correct, since this sentence is indeed about “eating”. There is a link to the source code in this slide, for anyone who is curious and wants to try running it.
  14. Demo 1 – Natural Language Processing. Let’s take a look at one of the Watson service demos. We’ll start with the natural language processing service, from the AlchemyLanguage API. This service offers some pretty interesting features, especially for AP content. Demo: http://www.alchemyapi.com/products/demo/alchemylanguage combined with URLs from http://bigstory.ap.org
  15. Demo 2 – Concept Insights. Another Watson service is the Concept Insights API. This service allows us to discover key concepts from a body of text (perfect for news articles), which might not be found using traditional keyword search. Demo: http://concept.herokuapp.com
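Note 12 mentions that a corpus can also be encoded with term counts or tf*idf instead of 0/1 bits. The following Python sketch illustrates one common variant on the same two example sentences. It is an assumption-laden illustration, not something taken from the deck: tf is taken as the raw term count, idf as the natural log of (number of documents / number of documents containing the term), and `tokenize` and `tf_idf` are hypothetical helper names.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; the deck only shows "to" being dropped.
STOP_WORDS = {"to"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def tf_idf(corpus):
    """Return one {term: weight} mapping per document (raw count times log idf)."""
    tokenized = [tokenize(document) for document in corpus]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for terms in tokenized for term in set(terms))
    weights = []
    for terms in tokenized:
        counts = Counter(terms)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
weights = tf_idf(corpus)

for document, document_weights in zip(corpus, weights):
    print(document, document_weights)
```

Under this variant, "like" appears in both documents and so receives a weight of zero, while terms unique to one document keep a positive weight. That is the intuition behind tf*idf: terms shared by every document carry little information for telling documents apart.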