Fun with Text - Managing Text Analytics

A lecture on text analytics: 3 types of opportunities, 3 use cases, 3 dos and 3 don'ts.

Learn how to approach a text-related business problem using text analytics.


  1. Cohan Sujay Carlos, CEO, Aiaioo Labs
     Fun with Text: Managing Text Analytics
  2. What I am going to talk about: Text Analytics
     1. Examine 3 kinds of opportunities.
     2. Discuss 3 text analytics problems.
     3. Touch upon 3 things to watch out for and 3 things to embrace.
  5. What if we can master “text”? What do we get from it?
     There are opportunities in every vertical:
     1. Aerospace / Defense / Automotive: filing of various routine documents / technical specification standardization / competitive intelligence and customer feedback management
     2. Healthcare / Life sciences: reporting / storing relevant patents and publications / analysis of research and competitive intelligence
     3. Legal and Government: legal and administrative filings / case document and administrative record management / analysis of legal and administrative documents (land records, case files)
  6. What if we can master “text”? What do we get from it?
     Do you observe a pattern? In every vertical, the opportunities fall into the same three groups: Output Text / Store and Transform Text / Ingest and Analyze Text.
  7. How do we unlock the value in “text”?
     Output Text / Store and Transform Text / Ingest and Analyze Text
     Natural Language Generation / Natural Language Understanding / Natural Language Processing (aka Text Analytics)
  8. Use Case 1: Customer Service
     Let’s say you have some text …
       “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
     … and a database or spreadsheet with the columns Reporter | Location (of Reporter) | Product,
     … and you have to fill in the database fields from the information in the text.
  10. Use Case 1: Land Records
      Let’s say you have some text …
        “Property K45L234 (lot 23-24) in Wake County of 3000 sq ft was sold to James Fischer on 3-30-1997 …”
      … and a database or spreadsheet with the columns Title Number | Lot | County,
      … and you have to fill in the database fields from the information in the text.
  12. Use Case 1: M&A Transactions
      Let’s say you have some text …
        “Acme Financials, a subsidiary of Lehman Sisters, was acquired by John Doe Corp on 5/26/2001.”
      … and a database or spreadsheet with the columns Acquirer | Acquired | Date,
      … and you have to fill in the database fields from the information in the text.
  14. Use Case 1: Customer Service [ Information Extraction ]
      Identifying entities and the relations between them.
      Let’s say you have some text … and a database or spreadsheet with columns:
        “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
      Entities are pieces of text that could go into the fields in the database.
        Reporter      | Location        | Product
        John Chambers | Springfield, MA | Ford Ranger
  15. Use Case 1: Customer Service [ Information Extraction ]
      Identifying entities and the relations between them.
        “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
      Entities are pieces of text that could go into the fields in the database.
      Relations tell you about the connections between entities: they connect the entities that belong in a row.
      Here the relation Location of Reporter links “Springfield, MA” to “John Chambers”.
        Reporter      | Location        | Product
        John Chambers | Springfield, MA | Ford Ranger
  16. Use Case 1: Customer Service [ Information Extraction ]
      Identifying entities and the relations between them.
        “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
      Information extraction converts unstructured information into structured information:
        Reporter      | Location        | Product
        John Chambers | Springfield, MA | Ford Ranger
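A toy sketch of that conversion, assuming a single hand-written regular expression that only fits sentences shaped like the example. The pattern and the `extract_row` helper are hypothetical, not taken from any framework. The relation Location of Reporter is encoded in the pattern itself: the place attached to the reporter by “of” is taken as the reporter's location, so “Boston, MA” (the place of purchase) is left out.

```python
import re

# Hypothetical pattern for sentences shaped like:
# "<First Last> of <City>, <ST> reported a problem with ... his <Product> ..."
PATTERN = re.compile(
    r"(?P<reporter>[A-Z][a-z]+ [A-Z][a-z]+) of "                      # two capitalized words
    r"(?P<location>[A-Z][a-z]+, [A-Z]{2}) reported "                   # "City, ST"
    r"a problem with .*? his (?P<product>[A-Z]\w*(?: [A-Z]\w*)*)"     # capitalized product name
)

def extract_row(text):
    """Return one database row (Reporter, Location, Product), or None if the pattern fails."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return {"Reporter": m.group("reporter"),
            "Location": m.group("location"),
            "Product": m.group("product")}

sentence = ("John Chambers of Springfield, MA reported a problem with the clutch "
            "on his Ford Ranger purchased in Boston, MA in 2005.")
print(extract_row(sentence))
```

A pattern this narrow breaks as soon as the wording changes, which is exactly the maintenance cost that the rule-based vs. machine learning comparison later in the deck is about.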
  17. Use Case 1: Customer Service [ Information Extraction ]
      Identifying entities and the relations between them.
        “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
      Information extraction can improve efficiency in processes where humans read text and copy fields into databases.
        Reporter      | Location        | Product
        John Chambers | Springfield, MA | Ford Ranger
  18. Use Case 1: Customer Service [ Information Extraction ]
      How can text analytics methods be used to automate entity and relation extraction?
      Rule-based methods / Machine learning methods
      Aiaioo Labs aiaioo.com
  19. Use Case 1: Customer Service [ Information Extraction ]
      Rule-based frameworks for entity and relation extraction? http://services.gate.ac.uk/annie/
  20. Use Case 1: Customer Service [ Information Extraction ]
  21. Use Case 1: Customer Service [ Information Extraction ]
      How does GATE/ANNIE identify entities and the relations between them?
      It uses lists of first names and last names of persons, and names of places … and matches them in the text …
        “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
        First names: “Jack” “Jill” “John”
        Last names: “Chambers” “Miller” “Farnsworth”
        Places: “Springfield” “Boston” “Cambridge” / “MA” “CA” “MD”
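A minimal sketch of gazetteer lookup, using only the word lists on the slide. ANNIE itself combines gazetteer lookups with pattern rules written in GATE's JAPE language; this plain-Python version imitates only the lookup idea, and `find_entities` and its two-token matching rule are hypothetical.

```python
# Miniature gazetteers taken from the slide's word lists.
FIRST_NAMES = {"Jack", "Jill", "John"}
LAST_NAMES = {"Chambers", "Miller", "Farnsworth"}
CITIES = {"Springfield", "Boston", "Cambridge"}
STATES = {"MA", "CA", "MD"}

def find_entities(text):
    """Tag Person and Location entities by looking adjacent tokens up in the gazetteers."""
    # Split on whitespace and strip surrounding punctuation from each token.
    tokens = [t.strip(".,;:!?\"“”") for t in text.split()]
    entities = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok in FIRST_NAMES and nxt in LAST_NAMES:      # "John" + "Chambers"
            entities.append(("Person", f"{tok} {nxt}"))
            i += 2
        elif tok in CITIES and nxt in STATES:             # "Springfield" + "MA"
            entities.append(("Location", f"{tok}, {nxt}"))
            i += 2
        else:
            i += 1
    return entities

text = ("John Chambers of Springfield, MA reported a problem with the clutch "
        "on his Ford Ranger purchased in Boston, MA in 2005.")
print(find_entities(text))
```

Note that lookup alone finds both “Springfield, MA” and “Boston, MA”; deciding which one is the reporter's location still needs a relation rule (for example, the location that follows “of” right after a person).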
  22. Use Case 1: Customer Service [ Information Extraction ]
      Machine learning frameworks for entity and relation extraction? Apache OpenNLP: https://opennlp.apache.org/
  23. Use Case 1: Customer Service [ Information Extraction ]
      Machine learning frameworks need training data. https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html
  24. Use Case 1: Customer Service [ Information Extraction ]
      How does OpenNLP identify entities and the relations between them?
      From examples such as:
        “<START:reporter>John Archer<END> of <START:location>Maryland<END> reported a problem with his <START:product>Figo<END>.”
        “<START:reporter>Vince Chambers<END> of <START:location>Denver, CO<END> had trouble with his <START:product>Focus<END>.”
      it learns to recognize the entities in:
        “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
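To make the training-data format concrete, here is a small sketch that reads `<START:type> … <END>` annotations back into plain text plus labeled spans. In OpenNLP's real name finder training files each sentence sits on one line with whitespace-separated tokens, and the tags are surrounded by spaces (e.g. `<START:person> Pierre Vinken <END>`); the sketch accepts both that form and the compact form on the slide. It only parses annotations and does not train anything; training is done with OpenNLP's own tools (such as its `TokenNameFinderTrainer` command-line tool). `parse_annotations` is a hypothetical helper.

```python
import re

# Matches "<START:type> entity text <END>"; whitespace inside the tags is optional.
TAG = re.compile(r"<START:(?P<type>[^>]+)>\s*(?P<text>.*?)\s*<END>")

def parse_annotations(line):
    """Return (plain_text, [(entity_type, entity_text), ...]) for one annotated sentence."""
    entities = [(m.group("type"), m.group("text")) for m in TAG.finditer(line)]
    plain = TAG.sub(lambda m: m.group("text"), line)  # drop the tags, keep the words
    return plain, entities

example = ("<START:reporter>John Archer<END> of <START:location>Maryland<END> "
           "reported a problem with his <START:product>Figo<END>.")
plain, entities = parse_annotations(example)
print(plain)
print(entities)
```

A learner sees many (plain text, spans) pairs like this and generalizes from features such as capitalization and neighboring words, which is how it can tag “John Chambers” even though only “Vince Chambers” appears in the examples.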
  25. Use Case 1: Customer Service [ Information Extraction ]
      How to choose between text analytics methods for entity and relation extraction?
      Rule-based methods: 3 months to a reasonably performing model; typically higher precision; typically less flexibility; typically lower recall.
      Machine learning methods: 1+ years to a reasonably performing model; typically lower precision; typically more flexibility; typically higher recall and overall performance.
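Precision and recall can be made concrete with a few lines of code. The sketch below scores extracted entities against a hand-labeled gold set; the two system outputs are hypothetical, chosen to show a cautious (rule-like) extractor versus a looser one.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted entities measured against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: what fraction of the system's outputs are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: what fraction of the correct answers the system found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical gold annotations and system outputs for the Customer Service sentence.
gold = {("Reporter", "John Chambers"), ("Location", "Springfield, MA"), ("Product", "Ford Ranger")}
cautious = {("Reporter", "John Chambers"), ("Location", "Springfield, MA")}
loose = gold | {("Location", "Boston, MA")}

print(precision_recall(cautious, gold))  # precision 1.0, recall 2/3
print(precision_recall(loose, gold))     # precision 0.75, recall 1.0
```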
  26. Can you classify these door heights as Short / Tall?
      5’11”, 5’8”, 5’8”, 5’11”, 6’2”, 6’6”, 5’2”, 6’8”, 6’9”, 6’10”
  27. In analytics, an analyst comes up with a rule:
      If door_height < 6’ then Short else Tall
      Door heights: 5’11”, 5’8”, 5’8”, 5’11”, 6’2”, 6’6”, 5’2”, 6’8”, 6’9”, 6’10”
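The analyst's rule translates directly into code. This sketch assumes heights are converted to inches first; `parse_height` and `classify_door` are illustrative names, not from the slides.

```python
HEIGHTS = ["5’11”", "5’ 8”", "5’8”", "5’11”", "6’2”", "6’6”", "5’ 2”", "6’8”", "6’9”", "6’10”"]

def parse_height(s):
    """Convert a height written like 5’11” or 5' 8" into inches."""
    s = s.replace("’", "'").replace("”", '"').replace(" ", "")
    feet, inches = s.rstrip('"').split("'")
    return int(feet) * 12 + (int(inches) if inches else 0)

def classify_door(height_inches):
    """The analyst's hand-written rule: under 6 feet (72 inches) is Short, otherwise Tall."""
    return "Short" if height_inches < 72 else "Tall"

for h in HEIGHTS:
    print(h, classify_door(parse_height(h)))
```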
  28. In machine learning, the computer comes up with a rule from examples.
      Door heights: 5’11”, 5’8”, 5’8”, 5’11”, 6’2”, 6’6”, 5’2”, 6’8”, 6’9”, 6’10”
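As a sketch of what “the computer comes up with a rule” can mean, the code below learns the threshold of a one-feature decision stump from labeled examples. The slide does not give labels, so the examples here are hypothetical (heights in inches), and the decision stump is just one simple learner chosen for illustration.

```python
def learn_threshold(examples):
    """Learn t for the rule 'Short if height < t else Tall' from (height, label) pairs.

    A one-feature decision stump: try the midpoint between each pair of adjacent
    distinct heights and keep the threshold with the fewest training errors.
    """
    heights = sorted({h for h, _ in examples})
    if len(heights) < 2:
        raise ValueError("need at least two distinct heights")
    candidates = [(a + b) / 2 for a, b in zip(heights, heights[1:])]

    def errors(t):
        return sum(("Short" if h < t else "Tall") != label for h, label in examples)

    return min(candidates, key=errors)

# Hypothetical labeled examples (heights in inches); the slide does not provide labels.
EXAMPLES = [(62, "Short"), (68, "Short"), (71, "Short"), (74, "Tall"), (78, "Tall"), (80, "Tall")]
print(learn_threshold(EXAMPLES))  # 72.5, i.e. about 6' 0.5"
```

The learned threshold lands close to the analyst's 6’, but it came from the labels rather than from a person's judgment; with different examples the computer would learn a different rule.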
  29. How do we unlock the value in “text”? The first use case …
      Output Text / Store and Transform Text / Ingest and Analyze Text
      Information Extraction: identifying entities and the relations between them
  30. How do we unlock the value in “text”? The second use case …
      Output Text / Store and Transform Text / Ingest and Analyze Text
      Text Categorization: labeling text with one or more category labels
  31. Use Case 2: Organizing Text for Storage
      Let’s say you have some text …
        “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
      … and you want to mark it as one of: Report / Inquiry
  32. Use Case 2: Organizing Text [ Text Categorization ]
      Start by collecting some samples of documents of each of your categories:
        Report: “I have a problem”, “This complaint is about”
        Inquiry: “Where can I buy a”, “Do you sell furniture”
  33. Use Case 2: Organizing Text [ Text Categorization ]
      Train a classifier with them.
        Report: “I have a problem”, “This complaint is about”
        Inquiry: “Where can I buy a”, “Do you sell furniture”
  34. Use Case 2: Organizing Text [ Text Categorization ]
      Start by collecting some samples of documents of each of your categories:
        Politics / Sports: The United Nations The United States and Manchester United Manchester and Barca
  35. Use Case 2: Organizing Text [ Text Categorization ]
      Train a classifier with them.
        Politics / Sports: The United Nations The United States and Manchester United Manchester and Barca
  36. Use Case 2: Organizing Text [ Text Categorization ]
      Run the classifier on a new piece of text. The classifier will return a label:
        “Nations and States” → Politics
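The slides do not say which classifier to train, so the sketch below uses multinomial naive Bayes with add-one smoothing, a common baseline for text categorization. The training samples are hypothetical, modeled on the slide's Politics / Sports examples; `train` and `classify` are illustrative names.

```python
import math
from collections import Counter

def train(samples):
    """samples maps each label to a list of example texts; returns per-label statistics."""
    model = {}
    for label, texts in samples.items():
        word_counts = Counter(w for text in texts for w in text.lower().split())
        model[label] = (len(texts), word_counts)
    return model

def classify(model, text):
    """Multinomial naive Bayes with add-one (Laplace) smoothing; returns the best label."""
    vocab = {w for _, counts in model.values() for w in counts}
    total_docs = sum(n_docs for n_docs, _ in model.values())
    best_label, best_score = None, -math.inf
    for label, (n_docs, counts) in model.items():
        n_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for w in text.lower().split():
            score += math.log((counts[w] + 1) / (n_words + len(vocab)))  # log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training samples modeled on the slide's Politics / Sports examples.
SAMPLES = {
    "Politics": ["The United Nations", "The United States"],
    "Sports": ["Manchester United", "Manchester and Barca"],
}
model = train(SAMPLES)
print(classify(model, "Nations and States"))  # Politics
```

The same two functions work for the Report / Inquiry example: pass those sample sentences to `train` instead.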
  37. Use Case 2: Organizing Text [ Text Categorization ]
      How can text analytics methods be used to automate organization/categorization?
      Rule-based methods / Machine learning methods
  38. Use Case 2: Organizing Text [ Text Categorization ]
      But rule-based methods work for classification too. Rule-based text categorization is often used in social media sentiment classification.
  39. Use Case 2: Organizing Text [ Text Categorization ]
      How do we use rules to identify sentiment?
      We use lists of negative and positive words (usually adjectives), available in the AFINN word list, … and match them in the text …
        “I am sad that Steve Jobs died.”
        Negative: “sad” “bad” “evil” “distraught” “dead” “died”
        Positive: “thrilled” “excited” “amazed” “happy” “love” “joy”
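A minimal sketch of word-list sentiment scoring, using only the words on the slide. AFINN itself assigns each word an integer score from -5 to +5; here every listed word simply counts as +1 or -1, which is a simplification.

```python
import re

# Word lists from the slide (AFINN proper uses graded scores from -5 to +5).
NEGATIVE = {"sad", "bad", "evil", "distraught", "dead", "died"}
POSITIVE = {"thrilled", "excited", "amazed", "happy", "love", "joy"}

def sentiment(text):
    """Count positive minus negative word matches and map the total to a label."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I am sad that Steve Jobs died."))  # negative
```

Word counting labels the example sentence negative, which is the limitation the next slide addresses.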
  40. Use Case 2: Organizing Text [ Text Categorization ]
      Can we use entity and relation extraction to do better?
        “I am sad that [Steve Jobs died].”
      Analysis: the negative word ‘sad’ is related to the negative event ‘Steve Jobs died’, so this person holds a positive opinion of Steve Jobs.
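The reasoning on this slide can be sketched as multiplying polarities: a negative feeling about a negative event that happened to an entity implies a positive opinion of that entity. This is a hypothetical toy, not the author's system: a fixed pattern stands in for real relation extraction, and the small polarity tables are made up for illustration.

```python
import re

# Hypothetical polarity tables: +1 positive, -1 negative.
FEELING = {"sad": -1, "distraught": -1, "happy": +1, "thrilled": +1}
EVENT = {"died": -1, "won": +1, "recovered": +1}

# Stand-in for relation extraction: "I am <feeling> that <entity> <event>."
PATTERN = re.compile(r"I am (\w+) that (.+?) (\w+)\.?$")

def opinion_toward(text):
    """Return (entity, opinion) by combining feeling and event polarity, or None."""
    m = PATTERN.match(text)
    if m is None:
        return None
    feeling, entity, event = m.groups()
    # Two negatives (or two positives) give a positive opinion of the entity.
    polarity = FEELING.get(feeling.lower(), 0) * EVENT.get(event.lower(), 0)
    return entity, {1: "positive", -1: "negative", 0: "unknown"}[polarity]

print(opinion_toward("I am sad that Steve Jobs died."))  # ('Steve Jobs', 'positive')
```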
  41. Use Case 2: Organizing Text [ Text Categorization ]
      How to choose between text analytics methods for text categorization?
      Rule-based methods: typically higher precision; typically less flexibility; typically lower recall.
      Machine learning methods: typically lower precision; typically more flexibility; typically higher recall and overall performance.
  44. How do we unlock the value in “text”? The third use case …
      Output Text / Store and Transform Text / Ingest and Analyze Text
      Question Answering: generating a response to an inquiry
  45. Use Case 3: Answering Questions
      Let’s say you get a question …
        “Do you ship your cars to Boston, MA?”
      … and you want the answer to be one of: Yes / No
  46. Use Case 3: Answering Questions
      First you classify the question into one of 3 types:
        Yes/No questions: “Do you ship your cars to Boston, MA?”
        Factoid questions: “Who is the CEO of Apple?”
        Non-factoid questions: “Why is the sky blue?”
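A rule-based sketch of this first step, assuming the question type can be read off its opening word(s). The word lists are hypothetical and deliberately small; real questions (e.g. “What causes rain?”, which is non-factoid) need more than first-word rules, or a trained classifier.

```python
# Hypothetical opener lists for each question type.
YES_NO_STARTERS = {"do", "does", "did", "is", "are", "was", "were", "can", "could",
                   "will", "would", "should", "has", "have"}
FACTOID_STARTERS = {"who", "what", "when", "where", "which", "how many", "how much"}
NON_FACTOID_STARTERS = {"why", "how"}

def question_type(question):
    """Classify a question by its first word(s) using hand-written rules."""
    words = question.lower().rstrip("?").split()
    if not words:
        return "unknown"
    first, first_two = words[0], " ".join(words[:2])
    if first in YES_NO_STARTERS:
        return "yes/no"
    # Check two-word factoid openers such as "how many" before the bare "how".
    if first_two in FACTOID_STARTERS or first in FACTOID_STARTERS:
        return "factoid"
    if first in NON_FACTOID_STARTERS:
        return "non-factoid"
    return "unknown"

for q in ["Do you ship your cars to Boston, MA?", "Who is the CEO of Apple?", "Why is the sky blue?"]:
    print(q, "->", question_type(q))
```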
  47. Use Case 3: Answering Questions
      Look for answers in databases that you created using entity/relation extraction:
        “Do you ship your cars to Boston, MA?”
          Product | Ships To
          Cars    | USA
        “Who is the CEO of Apple?”
          CEO      | Firm
          Tim Cook | Apple
        “Why is the sky blue?”
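A sketch of answering from the extracted tables, assuming they are held as Python sets and dicts. The question patterns and the place-to-country table are hypothetical additions: the yes/no answer needs to know that Boston, MA is in the USA, which the slide's table does not record.

```python
import re

# Tables as if filled by entity/relation extraction (contents from the slide).
SHIPS_TO = {("Cars", "USA")}
CEO_OF = {"Apple": "Tim Cook"}
# Hypothetical gazetteer mapping places to countries.
COUNTRY_OF = {"Boston, MA": "USA", "Springfield, MA": "USA"}

def answer(question):
    """Answer a yes/no shipping question or a CEO factoid question by table lookup."""
    m = re.match(r"Do you ship your (\w+) to (.+?)\?$", question)
    if m:
        product, place = m.group(1).capitalize(), m.group(2)
        country = COUNTRY_OF.get(place)
        if country is None:
            return "I don't know"
        return "Yes" if (product, country) in SHIPS_TO else "No"
    m = re.match(r"Who is the CEO of (.+?)\?$", question)
    if m:
        return CEO_OF.get(m.group(1), "I don't know")
    # Non-factoid questions such as "Why is the sky blue?" cannot be answered by lookup.
    return "I don't know"

print(answer("Do you ship your cars to Boston, MA?"))  # Yes
print(answer("Who is the CEO of Apple?"))              # Tim Cook
```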
  48. To watch out for: Text Analytics Traps
      1. Testing on Training Data
      2. Using US Training Data for India
      3. Treating all Data Sources as One
  49. To embrace: Text Analytics Tricks
      1. UI Compensation for AI Inaccuracy
      2. Raising Precision at the Cost of Recall
      3. Domain-Specific Rules
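One common way to raise precision at the cost of recall is to keep only the outputs a system is most confident about. The sketch below assumes each output comes with a confidence score; the scores and correctness flags are hypothetical.

```python
def precision_recall_at(predictions, n_gold, threshold):
    """Keep only predictions with confidence >= threshold, then score them.

    predictions: list of (confidence, is_correct) pairs produced by a system.
    n_gold: number of correct items that exist in the data.
    """
    kept = [is_correct for confidence, is_correct in predictions if confidence >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / n_gold
    return precision, recall

# Hypothetical outputs: (confidence, whether the output was actually correct).
# n_gold=5 below means one correct item was never produced at all.
PREDICTIONS = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, False), (0.50, True)]

for threshold in (0.5, 0.85):
    p, r = precision_recall_at(PREDICTIONS, n_gold=5, threshold=threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

Raising the threshold from 0.5 to 0.85 lifts precision from 0.67 to 1.00 while recall drops from 0.80 to 0.40; whether that trade is worth it depends on how costly a wrong answer is compared with a missed one.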
  50. About Aiaioo Labs: AI Research Lab
      1. http://aiaioo.com
      2. http://aiaioo.com/publications
      3. http://aiaioo.wordpress.com
  51. THANK YOU
