2. WHAT IS WATSON?
2008
Able to compete with Jeopardy contestants.
2010
Capable of defeating human Jeopardy contestants on a regular basis
2011
First-place Jeopardy winner, defeating champion Ken Jennings
Present
2nd-year medical student equivalency
Preparing to take the U.S. Medical Board Exam
Watson API available to developers
3. WHAT IS WATSON, REALLY?
Natural language processing
Machine learning
Used for analyzing large amounts of unstructured data
Accessible via a collection of web APIs
5. NATURAL LANGUAGE PROCESSING
Convert text into a numerical representation
Find commonalities within data
Clustering
Make predictions from data
Classification
Category, Popularity, Sentiment, Relationships
6. BAG OF WORDS MODEL
Cats like to chase mice.
Dogs like to eat big bones.
Corpus
7. CREATE A DICTIONARY
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
Dogs like to eat big bones.
Corpus
8. DIGITIZE TEXT
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
1 1 1 1 0 0 0 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1
Corpus
Vector Length = 8
9. CLASSIFY DOCUMENTS (EATING)
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
1 1 1 1 0 0 0 0 | eating = 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1 | eating = 1
Corpus
10. PREDICT ON NEW DATA
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
1 1 1 1 0 0 0 0 | eating = 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1 | eating = 1
Bats eat bugs.
0 0 0 0 0 1 0 0 | eating = ?
Corpus
11. PREDICT ON NEW DATA
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
1 1 1 1 0 0 0 0 | eating = 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1 | eating = 1
Bats eat bugs.
0 0 0 0 0 1 0 0 | eating = ?
Corpus
12. PREDICT ON NEW DATA
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Cats like to chase mice.
1 1 1 1 0 0 0 0 | eating = 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1 | eating = 1
Bats eat bugs.
0 0 0 0 0 1 0 0 | eating = 1
Corpus
13. DOES IT REALLY WORK?
> data
[1] "Cats like to chase mice." "Dogs like to eat big bones."
> train
  big bone cat chase dog eat like mice y
1   0    0   1     1   0   0    1    1 0
2   1    1   0     0   1   1    1    0 1
> predict(fit, newdata = train)
[1] 0 1
> data2
[1] "Bats eat bugs."
> test
  big bone cat chase dog eat like mice
1   0    0   0     0   0   1    0    0
> predict(fit, newdata = test)
[1] 1
Document Term Matrix
100% Accuracy Training
Test Case
Success!
Source code:
https://goo.gl/UxjPBs
14. DEMO 1
NATURAL LANGUAGE PROCESSING
Text analysis for:
Entity Extraction
Sentiment Analysis
Keywords and Concepts
Taxonomy
More
http://www.alchemyapi.com/products/demo/alchemylanguage
15. DEMO 2
CONCEPT INSIGHTS
Discovering concept insights within AP content, which might not be
found using traditional keyword search
http://concept.herokuapp.com
Editor's Notes
1. Introduction
My name is Kory Becker. I'm a Software Architect at The Associated Press.
As part of the project for “Providing Story Context Through AP Archive” I wanted to investigate using the IBM Watson API to discover new concepts and related stories in AP content.
The idea would be to combine these related stories into a timeline list for the original story.
This could offer users a historical perspective of related stories around a topic.
2. What is Watson?
So, first, what is Watson?
“Watson” is an artificial intelligence technology built by IBM.
As you might recall, Watson became famous in February 2011 for winning first place on the game show Jeopardy.
Watson won a $1 million prize for winning this game (the winnings were donated to charity).
More importantly, Watson defeated the Jeopardy champion Ken Jennings, who held the record for the longest winning streak, at 74 games!
IBM Researchers’ first take on building a machine that could win Jeopardy? “They initially said no, it's a silly project to work on, it's too gimmicky, it's not a real computer science test, and we probably can't do it anyway”.
But, it worked.
You can see, in the timeline above, how Watson progressed.
Next to Watson, the closest technology most people are familiar with would be Apple’s Siri. However, an important distinction is that Siri is more of a search lookup interface. It uses voice recognition to issue queries against back-end providers.
Watson, on the other hand, uses cognitive computing and forms relationships among data, in order to answer a question. Siri may even partner with Watson in the future.
Most recently, IBM has released Watson as a series of targeted services, for use by developers in their own projects.
This is what we’re going to take a look at today.
3. What is Watson, Really?
Ok, so the game-show winning Watson is pretty fascinating, but what does it actually do?
Watson, at its core, is a natural language processing tool. It uses a variety of machine learning techniques to analyze text, understand data, and generate insights from large amounts of unstructured data.
The Watson of today is available as a series of web APIs. Each API offers a different machine learning service for processing unstructured data.
4. Watson Services
Some examples of Watson services include Concept Expansion, Concept Insights (which we’ll take a look at in just a minute), Dialog (a very interesting take on Watson powering a chat bot), Personality Insights, and a lot more.
You can see the full list of Watson services at the URL above, https://goo.gl/mNmiS3, and even try out their online demos.
5. Natural Language Processing
While the internal mechanics of Watson are probably complicated and perhaps even proprietary, the underlying principles of natural language processing itself can be well understood.
The most basic form of natural language processing is to simply convert text into a numerical representation, so that each document becomes a same-sized array of numbers. With this, you can apply machine learning algorithms, such as clustering and classification.
This allows you to build unique insights into a set of documents, determining characteristics like category, popularity, sentiment, and relationships.
6. Bag of Words Model
To get an idea of the basic principles that Watson might use when processing text, let’s take a look at a quick example.
Here are two documents: “Cats like to chase mice.” and “Dogs like to eat big bones.”
We’re going to try to categorize each document as being about “eating” or not.
To do this, we’ll build a bag-of-words model and then apply a classification algorithm.
Now, the first thing to note is that the two documents are of different lengths. In practice, documents will almost always differ in length. This is fine, because after we digitize the corpus, you’ll see that the resulting data fits neatly within same-sized vectors.
7. Create a Dictionary
So, the first step is to create a dictionary from our corpus.
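First, we preprocess the corpus. Removing stop-words drops the word “to”. A stemming algorithm can also reduce each word to its root (for example, “cats” to “cat”), as in the R output for slide 13; the dictionary on this slide keeps the original word forms.
Next, we find each unique term and add it to our dictionary. You can see the resulting list on the right side of this slide. Our dictionary contains 8 terms.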
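The dictionary step can be sketched in a few lines of Python. This is only an illustration, not the code behind the slides (which use R). The one-word stop-word list and the `tokenize` and `build_dictionary` helpers are assumptions made for this example, and no stemming is applied, so the terms match the slide.

```python
import re

# Assumed stop-word list: only "to" is needed for this tiny corpus.
STOP_WORDS = {"to"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def build_dictionary(corpus):
    """Map each unique term to an index, in order of first appearance."""
    dictionary = {}
    for document in corpus:
        for term in tokenize(document):
            if term not in dictionary:
                dictionary[term] = len(dictionary)
    return dictionary

corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
dictionary = build_dictionary(corpus)
for term, index in dictionary.items():
    print(index, "-", term)
```

Terms are numbered in the order they are first seen, which reproduces the 0 to 7 numbering on the slide.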
8. Digitize Text
With our dictionary created, we can now digitize the documents.
Since our dictionary has 8 terms, each document will be encoded into a vector of length 8. Every document therefore ends up with the same length, which makes the data easier to process with machine learning algorithms.
Let’s look at the first document. We’ll take the first term in the dictionary and see if it exists in the first document. The term is “cats”, which does indeed exist in the first document. Therefore, we’ll set a 1 as the first bit.
The next term is “like”. Again, it exists in the first document, so we’ll set a 1 as the next bit. This repeats until we see the term “dogs”. This does not exist in the first document, so we set a “0”.
Finally, we run through all terms in the dictionary and end up with a vector of length 8 for the first document.
We repeat the same steps for the second document, going through each term in the dictionary and checking if it exists in the document.
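The walk through the dictionary described above could look like this in Python. It is a minimal sketch: the dictionary is copied from the slide, and the `encode` helper is a name invented for this example.

```python
import re

# Dictionary from the slide, in index order.
DICTIONARY = ["cats", "like", "chase", "mice", "dogs", "eat", "big", "bones"]

def encode(document, dictionary=DICTIONARY):
    """Return a 0/1 vector with one position per dictionary term."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return [1 if term in words else 0 for term in dictionary]

cats = encode("Cats like to chase mice.")
dogs = encode("Dogs like to eat big bones.")
print(cats)  # [1, 1, 1, 1, 0, 0, 0, 0]
print(dogs)  # [0, 1, 0, 0, 1, 1, 1, 1]
```

Words that are not in the dictionary, such as “to”, are simply ignored, so every document produces a vector of length 8.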
9. Classify Documents (Eating)
Once the data is digitized, we can label each document according to whether it is about “eating”. Since the first document is about chasing mice, maybe playing, we’ll assign a 0. It doesn’t really have to do with eating.
The second document is clearly about eating. So, we’ll assign it a 1.
At this point, we can train a classifier on the data, such as logistic regression, a neural network, or a support vector machine.
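To make the training step concrete, here is a rough Python sketch of logistic regression on the two labeled vectors. It is an assumption-laden illustration: it uses plain gradient descent, and the `train` and `predict_proba` helpers, learning rate, and epoch count are choices made for this example rather than anything taken from the slides.

```python
import math

# Digitized training documents and their "eating" labels, from the slide.
X = [[1, 1, 1, 1, 0, 0, 0, 0],
     [0, 1, 0, 0, 1, 1, 1, 1]]
y = [0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability that the document x belongs to class 1 (eating)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train(X, y, learning_rate=0.5, epochs=200):
    """Fit weights with per-example gradient descent on the logistic loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

weights, bias = train(X, y)
predictions = [round(predict_proba(weights, bias, x)) for x in X]
print(predictions)  # [0, 1]
```

After training, the weights for terms that appear only in the “eating” document (such as “eat”) are positive, and the weights for terms that appear only in the other document (such as “chase”) are negative.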
10. Predict on New Data
Once our model has finished training, we can try predicting on new data to see if it’s classified correctly.
Here you can see we have a new document, “Bats eat bugs.” Our machine learning algorithm has never seen this document. We want to try to categorize it as being about “eating” or not.
We’ll first digitize the document, just like we did with our training corpus. In this case, only one term from the document, “eat”, is found in the dictionary.
11. Predict on New Data
The machine learning algorithm is probably going to find a relationship with this particular bit, highlighted in red above. This bit corresponds to the term “eat”, and is found in the training document that was classified as 1 for the category “eating”.
Based on this similarity, our model is probably going to predict our new document as … ?
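One simple way to see this relationship is to list the dictionary terms that the new document shares with each training document. The sketch below is only an illustration of that overlap, not of how any particular classifier works internally; the `shared_terms` helper is a name made up for this example.

```python
DICTIONARY = ["cats", "like", "chase", "mice", "dogs", "eat", "big", "bones"]

# (text, digitized vector, "eating" label) for each training document.
TRAINING = [
    ("Cats like to chase mice.", [1, 1, 1, 1, 0, 0, 0, 0], 0),
    ("Dogs like to eat big bones.", [0, 1, 0, 0, 1, 1, 1, 1], 1),
]

NEW_VECTOR = [0, 0, 0, 0, 0, 1, 0, 0]  # "Bats eat bugs."

def shared_terms(a, b):
    """Dictionary terms whose bit is set in both vectors."""
    return [term for term, x, y in zip(DICTIONARY, a, b) if x == 1 and y == 1]

for text, vector, label in TRAINING:
    print(f"{text!r} (label {label}) shares: {shared_terms(NEW_VECTOR, vector)}")
```

The new document shares no terms with the first training document and exactly one term, “eat”, with the second.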
12. Predict on New Data
So this is the general idea behind natural language processing.
Now, we didn’t have to classify just on “eating”. We could have just as easily classified based upon sentiment. In fact, this is a common method for performing sentiment analysis with machine learning. (A non-machine-learning alternative for sentiment analysis is the AFINN word-list approach.)
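A word-list approach can be sketched as a lookup and a sum. AFINN assigns each listed word an integer score from -5 to +5; the tiny lexicon below is hypothetical, with scores invented for illustration rather than taken from the real AFINN list, and the second sentence is made up for this example.

```python
import re

# Hypothetical scores for illustration only; these are NOT the real AFINN values.
LEXICON = {"like": 2, "love": 3, "good": 3, "bad": -3, "hate": -3}

def sentiment_score(text):
    """Sum the lexicon scores of the words in the text (unknown words count 0)."""
    return sum(LEXICON.get(word, 0) for word in re.findall(r"[a-z]+", text.lower()))

print(sentiment_score("Cats like to chase mice."))  # 2 with this made-up lexicon
print(sentiment_score("Dogs hate bad weather."))    # -6 with this made-up lexicon
```

A positive total suggests positive sentiment and a negative total suggests negative sentiment; no training data is needed.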
This was a very basic example of natural language processing. In a real-world case, you could have tens of thousands of documents, perhaps with multiple classifications. There are also various ways to encode the corpus, such as the count of each term within the document, tf-idf, and more.
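As a sketch of those alternative encodings, the Python below computes a count vector and a tf-idf vector over the same dictionary. Several tf-idf variants exist; this one assumes raw counts for tf and log(N / df) for idf. The helper names and the sentence “Big dogs like big bones.” are made up for this example.

```python
import math
import re
from collections import Counter

DICTIONARY = ["cats", "like", "chase", "mice", "dogs", "eat", "big", "bones"]

def term_counts(document):
    """Count how often each word occurs in the document."""
    return Counter(re.findall(r"[a-z]+", document.lower()))

def count_vector(document):
    """Encode a document as per-term counts instead of 0/1 flags."""
    counts = term_counts(document)
    return [counts[term] for term in DICTIONARY]

def tfidf_vectors(corpus):
    """Encode each document as count * log(N / document frequency)."""
    counts = [term_counts(document) for document in corpus]
    n = len(corpus)
    vectors = []
    for doc_counts in counts:
        row = []
        for term in DICTIONARY:
            df = sum(1 for other in counts if other[term] > 0)
            idf = math.log(n / df) if df else 0.0
            row.append(doc_counts[term] * idf)
        vectors.append(row)
    return vectors

print(count_vector("Big dogs like big bones."))  # [0, 1, 0, 0, 1, 0, 2, 1]
tfidf = tfidf_vectors(["Cats like to chase mice.", "Dogs like to eat big bones."])
print(tfidf[0])
```

With this variant, “like” gets a weight of 0 in both documents because it appears in every document, while terms unique to one document get a weight of log(2).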
This gives us a general background into some of the methods that might be used behind Watson.
13. Does it Really Work?
Here is an actual example in R. The code takes the original sentences from this example and builds a document-term matrix. Notice how the 1’s and 0’s align with what we’ve seen in the previous slides. The terms are stemmed and listed alphabetically, so their order is a little different, but otherwise the values are the same.
The ‘y’ column is the classification (eating).
We train on the data using a generalized linear model, with 100% accuracy on the training set. With only 2 training cases, that is not difficult to achieve. You can see the results of training when we call “predict” on the training data: it outputs the same ‘y’ values.
We then digitize our test sentence, which the model has never seen before, and call “predict”. It outputs a 1, which is correct, since this sentence is indeed about “eating”.
There is a link to the source code in this slide, for anyone who is curious and wants to try running it.
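For readers without R, the same flow can be approximated in plain Python. This is a sketch under stated assumptions, not a port of the linked source: the `stem` function is a naive stand-in for a real stemmer, the stop-word list contains only “to”, and the model is logistic regression fitted by gradient descent rather than R’s generalized linear model routine.

```python
import math
import re

STOP_WORDS = {"to"}  # assumed stop-word list

def stem(word):
    """Naive stand-in for a real stemmer: strip one trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def terms(text):
    """Lowercase, tokenize, drop stop words, and stem."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def document_term_matrix(docs, vocabulary):
    """One 0/1 row per document, one column per vocabulary term."""
    rows = []
    for doc in docs:
        present = set(terms(doc))
        rows.append([1 if term in present else 0 for term in vocabulary])
    return rows

def fit(X, y, learning_rate=0.5, epochs=200):
    """Logistic regression fitted by per-example gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - target
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def predict(model, X):
    """Return class 1 when the linear score is positive, else class 0."""
    weights, bias = model
    return [1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0 for x in X]

data = ["Cats like to chase mice.", "Dogs like to eat big bones."]
y = [0, 1]
vocabulary = sorted({term for doc in data for term in terms(doc)})
train = document_term_matrix(data, vocabulary)
model = fit(train, y)

print(vocabulary)             # ['big', 'bone', 'cat', 'chase', 'dog', 'eat', 'like', 'mice']
print(train)                  # [[0, 0, 1, 1, 0, 0, 1, 1], [1, 1, 0, 0, 1, 1, 1, 0]]
print(predict(model, train))  # [0, 1]

test = document_term_matrix(["Bats eat bugs."], vocabulary)
print(test)                   # [[0, 0, 0, 0, 0, 1, 0, 0]]
print(predict(model, test))   # [1]
```

The rows of `train` and `test` match the document-term matrices in the R output, and the new sentence is classified as 1 because its only known term, “eat”, carries a positive weight.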
14. Demo 1 – Natural Language Processing
Let’s take a look at one of the Watson service demos. We’ll start with the natural language processing service, from the AlchemyLanguage API.
This service offers some pretty interesting features, especially for AP content.
Demo http://www.alchemyapi.com/products/demo/alchemylanguage combined with urls from http://bigstory.ap.org
15. Demo 2 – Concept Insights
Another Watson service is the Concept Insights API. This service allows us to discover key concepts from a body of text (perfect for news articles), which might not be found using traditional keyword search.
Demo http://concept.herokuapp.com