Deep Distillation from Text
Naveen Ashish
University of Southern California & Cognie Inc.,
March 18th 2014
This is about …..
“DEEP TEXT DISTILLATION”
The hard nut of having computers “understand” natural
language (text) …
• Pushing the boundaries of what we can achieve …
“It's (the problem of computers understanding natural language) ambitious ... in
fact there's no more important project than understanding intelligence and
recreating it.” - Ray Kurzweil (2013)
Alan Turing based the Turing Test entirely on written language….To really master
natural language …that’s the key to the Turing Test–to a human requires the full
scope of human intelligence. …So the point is that natural language is a very
profound domain to do artificial intelligence in. - Ray Kurzweil (2013)
Why …
• the problem is far from solved … !
• unstructured data everywhere
95% !
search
text analytics
big data analytics
health informatics
social-media intelligence
Introduction
About myself
Associate Professor (Informatics), Keck School of Medicine,
University of Southern California
Cognie Inc.,
Work leverages
Information extraction work and systems developed at UC Irvine
• XAR, UCI-PEP
Advisory consulting engagements with several companies and
start-ups
Outline
Deep distillation: What is and why
State-of-the-art
Fundamentals
Approach
Details
Expressions, Entities, Sentiment
Case studies
Retail, Health, Risk assessment
Conclusions
What is “Deep” text distillation ?
Data
Abstract
This paper describes the results of a
study investigating ….
…..
We conclude that salt and diabetes are
largely unrelated.
Deep Distillation
The abstract, not explicitly mentioned !
What falls in this category
Expressions
Contextual sentiment
Aspect classification
I think you need better chefs → SUGGESTION
The mocha is too sweet → NEGATIVE
I used to take Lipitor for … → PERSONAL EXPERIENCE
The dim lights have a cozy effect … → AMBIENCE
A Common Intersection
Distill at sentence level
Aggregate to entire feedback, post, comment or
thread
Three primary elements
Expression/Intent
Entities/Aspects (and Classes)
Sentiment
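A minimal sketch of the idea above: distill each sentence into the three primary elements, then aggregate to the whole feedback item. The `Distilled` record, the `aggregate` helper and the hand-assigned labels are assumptions made for illustration, not part of the COGNIE platform.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Distilled:
    """Sentence-level result holding the three primary elements."""
    expression: str  # e.g. "COMPLAINT"
    aspect: str      # e.g. "Service"
    sentiment: str   # e.g. "NEGATIVE"


def aggregate(sentences):
    """Roll sentence-level results up to one feedback item by counting."""
    return {
        "expressions": Counter(s.expression for s in sentences),
        "aspects": Counter(s.aspect for s in sentences),
        "sentiment": Counter(s.sentiment for s in sentences),
    }


# Labels below are assigned by hand purely to exercise the aggregation step.
feedback = [
    Distilled("ADVOCACY", "Food", "POSITIVE"),
    Distilled("SUGGESTION", "Service", "NEGATIVE"),
]
summary = aggregate(feedback)
```

The same counting step could be applied to a post, a comment or a whole thread, since only the grouping of sentences changes.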
Why Deeper ?
• Goal: Get actionable insights from data !
• Hypothesis: Deeper extraction → Better insights !
The top items advised for skin rash are aloe vera,
vitamin E oil and oatmeal
Complaints comprise 36% of the overall feedback with top
issues being slow service, drinks and coffee
Context
COGNIE TM: A PLATFORM for text analytics
[Diagram labels: COGNIE TM; XAR; UCI-PEP; SHIP SURVEY ANALYTICS; RETAIL ANALYTICS; RISK ASSESSMENT]
Expressions
Beyond entities and sentiment: EXPRESSIONS
Introduced in [Ashish et al, 2011]
Expressions
You should try Vitamin E oil … → ADVICE
..I have had arthritis since 1991… → EXPERIENCE
HEALTH
..for me lipitor worked like a charm… → OUTCOME
Expressions
…showers had no hot water !… → COMPLAINT
..you should have more veggie options… → SUGGESTION
RETAIL/ENTERPRISE
..meats on special this weekend… → ANNOUNCEMENT
..this is the best store on the west side… → ADVOCACY
There is hardly any evidence to suggest a link between salt and diabetes → -
These results confirm that high intake of salt leads to an increase in BP → +
RISK ASSESSMENT
The Landscape
Text Analytics Spectrum
Wide offering of
• Text analytics engines
• Text analysis tools – many open-source
Largely still for “spotting things”
• entities, concepts, sentiment, topics, emotions …
Going deeper
• Luminoso
• Attensity (Intents)
Deep Learning for Sentiment
• Stanford
• Recursive Neural Networks
Approach
Approach
natural language processing
machine learning
semantics
Architecture: COGNIE TM Platform
Segmentation
POS Tagging
Entity extraction
Anaphora
Parsing
Gram analysis
Existing (DMOZ, SNOMED,UMLS)
Creation
Declarative
Naïve-Bayes
MaxEnt
TFIDF
CRF
RNN Deep Learning
ENSEMBLE
NLP
Machine Learning
Knowledge Engineering
The Indicators: “Give Aways”
A combination of multiple types of elements !
…showers had no hot water !… → COMPLAINT
(You) should have more veggie options… → SUGGESTION
..i have been on lipitor… → EXPERIENCE
..this is the best store on the west side… → ADVOCACY
Approach: Given Indicators
NLP
Identification of individual elements
• Unsupervised
Relationships between elements
Semantics
Identification of individual elements
• Knowledge driven
Machine Learning Classification
Combine elements → classify
Natural Language Processing
• UIMA and GATE
• Stanford NLP Tools
• POS tagging
• Parsing
• NE Recognizer
• Geo-tagger
• …
Natural Language Processing
• Text Segmentation
In many cases the “unit” of distillation is a sentence
• Segmentation
• UIMA (or GATE)
• Custom
• Complex sentence segmentation
• Breakup into individual clauses
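As a rough illustration of the custom segmentation option, the sketch below splits text into sentences and then breaks a complex sentence into clauses. The two regular expressions are simplifying assumptions for this example only; UIMA and GATE segmenters are far more robust.

```python
import re

# Assumption: a sentence ends with ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Assumption: clauses are separated by a comma, a semicolon, "but" or "and".
CLAUSE_SPLIT = re.compile(r"\s*(?:[,;]|\bbut\b|\band\b)\s*")


def split_sentences(text):
    """Split raw text into sentences, the usual unit of distillation."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def split_clauses(sentence):
    """Break a complex sentence up into individual clauses."""
    return [c.strip() for c in CLAUSE_SPLIT.split(sentence) if c.strip()]


text = "The food was awesome, service needs improvement. You need to be open longer!"
sentences = split_sentences(text)
clauses = split_clauses(sentences[0])
```

Clause-level splitting matters for feedback like the first sentence here, where one clause praises the food and the other criticizes the service.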
NLP
• Part-of-speech tags are key indicators
Expression distillation
• Entity extraction
Names, Locations, Organizations
• Parsing
If required
• Anaphora
NGram Analysis
Unigram and Bigram analysis
Obtain
Grams
Frequency
Entropy
Grams of tokens as well as POS Patterns
VB VBD
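The gram statistics listed above can be sketched as follows. The slide does not say which entropy is meant; this example assumes it is the entropy of a gram's distribution over class labels, so that a low value marks a gram as a strong indicator for one class. The function names and the tiny data set are illustrative.

```python
import math
from collections import Counter, defaultdict


def ngrams(items, n):
    """All contiguous n-grams of a token (or POS tag) sequence."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def gram_stats(labeled_sequences, n):
    """Return {gram: (frequency, entropy in bits over class labels)}."""
    by_gram = defaultdict(Counter)
    for items, label in labeled_sequences:
        for gram in ngrams(items, n):
            by_gram[gram][label] += 1
    stats = {}
    for gram, labels in by_gram.items():
        total = sum(labels.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in labels.values())
        stats[gram] = (total, entropy)
    return stats


data = [
    ("you should try vitamin e oil".split(), "ADVICE"),
    ("you should have more veggie options".split(), "SUGGESTION"),
    ("showers had no hot water".split(), "COMPLAINT"),
]
bigrams = gram_stats(data, 2)
```

Because `ngrams` only needs a sequence, the same function can be run over POS tag sequences to obtain POS pattern grams.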
Before Automated Classification: Manual
Patterns
SoL: Sequences of Labels
Labels
LEX-FOODADJ
• spicy
LEX-EXCESS
• too, very
ONT-FOOD
POS-NOUN
Sequences (Patterns)
ANY LEX-EXCESS LEX-FOODADJ ANY →
POS-VB POS-MD …
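One possible way to match a sequence-of-labels pattern against a sentence is sketched below. It assumes that ANY stands for zero or more arbitrary tokens, which the slide does not state, and it covers only the lexicon labels; POS and ontology labels would need a tagger and an ontology lookup. The word "sweet" is added to the LEX-FOODADJ lexicon for the sake of the example.

```python
LEXICONS = {
    "LEX-EXCESS": {"too", "very"},      # entries from the slide
    "LEX-FOODADJ": {"spicy", "sweet"},  # "spicy" from the slide, "sweet" assumed
}


def labels_for(token):
    """All lexicon labels that apply to a single token."""
    return {name for name, words in LEXICONS.items() if token.lower() in words}


def matches(pattern, tokens):
    """True if the token sequence matches the label pattern.

    ANY is treated as zero or more arbitrary tokens (an assumption).
    """
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if head == "ANY":
        return any(matches(rest, tokens[i:]) for i in range(len(tokens) + 1))
    return bool(tokens) and head in labels_for(tokens[0]) and matches(rest, tokens[1:])


PATTERN = ["ANY", "LEX-EXCESS", "LEX-FOODADJ", "ANY"]
hit = matches(PATTERN, "The mocha is too sweet".split())
miss = matches(PATTERN, "The mocha is sweet".split())
```

Hand-written patterns like this give a readable starting point before moving to automated classification.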
Classification: Machine Learning
• Classification tasks
Expression
(Contextual) Sentiment
Aspect category
Frameworks
Weka
Mallet
Baseline Classifiers
• Mallet and Weka
NaiveBayes
MaxEnt
CRF
• Gram-based
Uni, Bi and Trigram features
Baseline
~ 10% accuracy
Expression Classification: Features
• Features
Polar words
Punctuations
Ngrams
POS patterns
Length !
Beginning
Ontology
…
Classifiers
• Trees
Decision Tree (J48)
Functions
Logistic Regression
SVM
Sequence Tagging
CRF: Conditional Random Fields
Expression Classification: Results
Have achieved 75% precision and recall for all
expressions considered
Factors
Feature engineering
Classifier selection
Knowledge engineering
Contextual Sentiment
• (Just) polar words can be misleading !
Polar words may not be present at all !
Combination of elements
The mocha is too sweet
Wait time is over an hour
Aisles are too narrow
Service is slow
Semantics: Ontologies
• Health
• Drugs
• Conditions
• Procedures
• Symptoms
• …
Retail (Dining)
• Food/Entrees
• Service
• Ambience
• …
Leverage Existing Knowledge Sources
Health informatics
• UMLS
• NCI Thesaurus
• SNOMED
Retail
• DMOZ
Many other
• Freebase
• Wikipedia, DBPedia
OpenData
• data.gov
Knowledge Engineering Tools
“Mini” ontology creation
API access
Freebase
BioPortal
Wrappers
DMOZ, ….
Practical Requirements
Confidence Measures
Below threshold routed to manual transcription teams
Polarity
Snippets
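The confidence requirement can be illustrated with a small routing step: results at or above a threshold are accepted automatically, the rest go to a manual queue. The threshold value and the confidence scores below are invented for illustration.

```python
def route(results, threshold=0.7):
    """Split (text, label, confidence) results into automatic and manual queues.

    The default threshold is an assumed value, not one taken from the platform.
    """
    automatic, manual = [], []
    for item in results:
        (automatic if item[2] >= threshold else manual).append(item)
    return automatic, manual


# Confidence scores are made up to exercise both branches.
results = [
    ("showers had no hot water", "COMPLAINT", 0.93),
    ("you need to be open longer", "SUGGESTION", 0.55),
]
automatic, manual = route(results)
```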
Open-Source Leverage
COGNIE TM : Open Source Tools
Framework
UIMA
Classification
Weka
Mallet
NLP
Stanford tools
Indexing
Lucene
Databases
MySQL, MongoDB
Knowledge Engineering
Protégé
Select Case Studies
Case Study: Health Informatics
Distillation
Case Study: Retail & Survey Analytics
Feedback
• Direct, device collected
• Social-media
Typically short, few sentences
Strong requirement for aspect classification
• [Food,Service,Ambience,Pricing,Other]
Negative : “Immediate” vs “Long Term” classification
…food was awesome, service needs improvement ….
you need to be open longer !
Case Study: Risk Assessment
• Biomedical Literature Abstracts
Correlation direction (+ -)
Subject
Article type
Features
Clauses
Negation and Triggers
Semantic Heterogeneity
Performance
MapReduce
Throughput can be an issue
Complex language processing algorithms
Large ontologies in some cases
Hadoop MapReduce
[Kahn and Ashish, 2014]
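To show why the work parallelizes well, here is a plain-Python imitation of the map and reduce phases. It does not use Hadoop, and `classify` is a hypothetical stand-in for the expensive distillation step. Each document is mapped independently, so the map phase could be spread across many workers.

```python
from collections import defaultdict
from itertools import chain


def classify(sentence):
    """Hypothetical stand-in for the costly per-sentence distillation."""
    return "SUGGESTION" if "should" in sentence else "OTHER"


def map_phase(document):
    """Mapper: emit (expression label, 1) for every sentence of one document."""
    return [(classify(s), 1) for s in document.split(".") if s.strip()]


def reduce_phase(pairs):
    """Reducer: sum the counts emitted for each label."""
    totals = defaultdict(int)
    for label, count in pairs:
        totals[label] += count
    return dict(totals)


docs = [
    "You should try aloe vera. It is cheap.",
    "You should have more veggie options.",
]
counts = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
```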
Conclusions
Conclusions
Deeper distillation from text is important
Can be achieved by
Detecting and combining multiple elements in text
• Feature engineering
• Knowledge engineering
• Classifier selection
Does not have to be perfect
Every domain, dataset has its nuances
thank you !
naveen.ashish@cognie.com

Deep Distillation from Natural Language Text

  • 1. Deep Distillation from Text Naveen Ashish University of Southern California & Cognie Inc., March 18th 2014
  • 2. This is about ….. “DEEP TEXT DISTILLATION” The hard nut of having computers “understand” natural language (text) ….  Pushing the boundaries of what we can achieve …. "It's (the problem of computers understanding natural language) ambitious ...in fact there's no more important project than understanding intelligence and recreating it.“ - Ray Kurzweil (2013) Alan Turing based the Turing Test entirely on written language….To really master natural language …that’s the key to the Turing Test–to a human requires the full scope of human intelligence. …So the point is that natural language is a very profound domain to do artificial intelligence in. - Ray Kurzweil (2013)
  • 3. Why ….  the problem is far from solved ….. !!!!  unstructured data everywhere 95 % ! search text analytics big data analytics health informatics social-media intelligence
  • 4. Introduction About myself Associate Professor (Informatics), Keck School of Medicine, University of Southern California Cognie Inc., Work leverages Information extraction work and systems developed at UC Irvine  XAR, UCI-PEP Advisory consulting engagements with several companies and start-ups
  • 5. Outline Deep distillation: What is and why State-of-the-art Fundamentals Approach Details Expressions, Entities, Sentiment Case studies Retail, Health, Risk assessment Conclusions
  • 6. What is “Deep” text distillation ?
  • 7. Data Abstract This paper describes the results of a study investigating …. ….. We conclude that salt and diabetes are largely unrelated.
  • 8. Deep Distillation The abstract, not explicitly mentioned ! What falls in this category Expressions Contextual sentiment Aspect classification I think you need better chefs  SUGGESTION The mocha is too sweet  NEGATIVE I used to take Lipitor for … PERSONAL EXPERIENCE The dim lights have a cozy effect …. AMBIENCE
  • 9. A Common Intersection Distill at sentence level Aggregate to entire feedback, post, comment or thread Three primary elements Expression/Intent Entities/Aspects (and Classes) Sentiment
  • 10. Why Deeper ?  Goal: Get actionable insights from data !  Hypothesis: Deeper extraction  Better insights ! The top advice items advised for skin rash are aloe vera, vitamin E oil and oatmeal Complaints comprise 36% of the overall feedback with top issues being slow service, drinks and coffee
  • 11. Context COGNIETM: A PLATFORM for text analytics COGNIE TM XAR UCI-PEP SHIP SURVEY ANALYTICS RETAIL ANALYTICS RISK ASSESSMENT
  • 12. Expressions Beyond entities and sentiment : EXPRESSSIONS EXPRESSIONS Introduced in [Ashish et al, 2011]
  • 13. Expressions You should try Vitamin E oil …  ADVICE ..I have had arthritis since 1991…  EXPERIENCE HEALTH ..for me lipitor worked like a charm…  OUTCOME
  • 14. Expressions …showers had no hot water !…  COMPLAINT ..you should have more veggie options…  SUGGESTION RETAIL/ENTERPRISE ..meats on special this weekend…  ANNOUNCEMENT ..this is the best store on the west side…  ADVOCACY There is hardly any evidence to suggest a link between salt and diabetes  - This results confirm that high intake of salt leads to increase in BP + RISK ASSESSMENT
  • 16. Text Analytics Spectrum Wide offering of  Text analytics engines  Text analysis tools – many open-source Largely still for “spotting things”  entities, concepts, sentiment, topics, emotions …. Going deeper  Luminoso  Attensity (Intents) Deep Learning for Sentiment  Stanford  Recursive Neural Networks
  • 19. Architecture: COGNIE TM Platform Segmentation POS Tagging Entity extraction Anaphora Parsing Gram analysis Existing (DMOZ, SNOMED,UMLS) Creation Declarative Naïve-Bayes MaxEnt TFIDF CRF RNN Deep Learning ENSEMBLE NLP Machine Learning Knowledge Engineering
  • 20. The Indicators: “Give Aways” A combination of multiple types of elements ! …showers had no hot water !… COMPLAINT (You) should have more veggie options… SUGGESTION ..i have been on lipitor… EXPERIENCE ..this is the best store on the west side… ADVOCACY
  • 21. Approach: Given Indicators. NLP: Identification of individual elements → Unsupervised; Relationships between elements. Semantics: Identification of individual elements → Knowledge driven. Machine Learning: Classification; Combine elements → classify
  • 22. Natural Language Processing UIMA and GATE; Stanford NLP Tools; POS tagging; Parsing; NE Recognizer; Geo-tagger; ….
  • 23. Natural Language Processing: Text Segmentation. In many cases the “unit” of distillation is a sentence. Segmentation: UIMA (or GATE); Custom. Complex sentence segmentation: Break up into individual clauses
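A rough sketch of the "custom" segmentation route is shown below, assuming plain regular expressions rather than UIMA or GATE. The function names and the choice of clause separators (comma, semicolon, "but") are illustrative assumptions; production segmenters handle abbreviations, quotes and many more conjunctions.

```python
import re

# Split after sentence-final punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Assumed clause separators for this sketch: comma, semicolon, "but".
_CLAUSE_SPLIT = re.compile(r"\s*(?:,|;|\bbut\b)\s*", re.IGNORECASE)


def split_sentences(text):
    """Break a feedback text into sentences (naive punctuation rule)."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def split_clauses(sentence):
    """Break a complex sentence into individual clauses."""
    body = sentence.rstrip(".!? ")
    return [c.strip() for c in _CLAUSE_SPLIT.split(body) if c.strip()]


text = "The food was awesome, service needs improvement. You need to be open longer!"
sentences = split_sentences(text)
clauses = split_clauses(sentences[0])
```

Each clause can then be distilled on its own, which matters for feedback such as the first sentence above, where one clause is praise and the other is a complaint.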
  • 24. NLP Part-of-speech tags are key indicators (expression distillation); Entity extraction (Names, Locations, Organizations); Parsing (if required); Anaphora
  • 25. NGram Analysis Unigram and Bigram analysis Obtain Grams Frequency Entropy Grams of tokens as well as POS Patterns VB VBD
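The gram statistics can be sketched as below. The slide does not say which entropy is meant; this sketch assumes the entropy of the class-label distribution for each gram, so that a frequent gram with low entropy is a strong indicator of one expression type. The function names are hypothetical. Because the input is just a list of tokens, the same code applies to POS-tag sequences.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token (or POS tag) list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def gram_stats(labeled_sentences, n):
    """Map each n-gram to (frequency, label entropy in bits).

    labeled_sentences is an iterable of (tokens, label) pairs.
    """
    label_counts = defaultdict(Counter)
    for tokens, label in labeled_sentences:
        for gram in ngrams(tokens, n):
            label_counts[gram][label] += 1
    stats = {}
    for gram, counts in label_counts.items():
        total = sum(counts.values())
        entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
        stats[gram] = (total, entropy)
    return stats


data = [
    (["you", "should", "try", "oil"], "ADVICE"),
    (["you", "should", "add", "options"], "SUGGESTION"),
    (["should", "have", "more"], "SUGGESTION"),
]
unigram_stats = gram_stats(data, 1)
bigram_stats = gram_stats(data, 2)
```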
  • 26. Before Automated Classification: Manual Patterns SoL: Sequences of Labels. Labels: LEX-FOODADJ → spicy; LEX-EXCESS → too, very; ONT-FOOD; POS-NOUN. Sequences (Patterns): ANY LEX-EXCESS LEX-FOODADJ ANY; POS-VB POS-MD ….
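A small matcher for such label sequences might look like the sketch below. Several things here are assumptions made for illustration: `ANY` is taken to match zero or more tokens, the lexicon entries beyond "spicy", "too" and "very" are invented, and POS labels are left out because they would need a tagger.

```python
# Label lexicons. Only "spicy", "too" and "very" come from the slide;
# the remaining entries are hypothetical.
LEXICON = {
    "LEX-FOODADJ": {"spicy", "sweet", "salty"},
    "LEX-EXCESS": {"too", "very"},
    "ONT-FOOD": {"mocha", "curry"},
}


def labels_for(token):
    """Return every label whose lexicon contains the token."""
    tok = token.lower()
    return {label for label, words in LEXICON.items() if tok in words}


def matches(pattern, tokens):
    """True if the label sequence matches the whole token list.

    Assumption: ANY matches zero or more tokens; any other label
    matches exactly one token carrying that label.
    """
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if head == "ANY":
        return any(matches(rest, tokens[i:]) for i in range(len(tokens) + 1))
    return bool(tokens) and head in labels_for(tokens[0]) and matches(rest, tokens[1:])


PATTERN = ["ANY", "LEX-EXCESS", "LEX-FOODADJ", "ANY"]
hit = matches(PATTERN, "The mocha is too sweet".split())
miss = matches(PATTERN, "The mocha is sweet".split())
```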
  • 27. Classification: Machine Learning Classification tasks: Expression, (Contextual) Sentiment, Aspect category. Frameworks: Weka, Mallet
  • 28. Baseline Classifiers Mallet and Weka: NaiveBayes, MaxEnt, CRF. Gram-based: Uni, Bi and Trigram features. Baseline ~ 10% accuracy
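To make the gram-based baseline concrete, here is a tiny multinomial Naive Bayes over unigram and bigram features. It is a pure-Python stand-in written for illustration, not the Mallet or Weka classifiers named on the slide; the class name, the add-one smoothing and the four training sentences are all assumptions.

```python
import math
from collections import Counter, defaultdict


def features(text):
    """Unigram and bigram features from a whitespace tokenization."""
    toks = text.lower().split()
    return toks + [" ".join(pair) for pair in zip(toks, toks[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for f in features(text):
                self.feat_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.feat_counts[label].values())
            score = math.log(count / n_docs)
            for f in features(text):
                if f in self.vocab:  # features never seen in training are ignored
                    score += math.log((self.feat_counts[label][f] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = [
    "you should try vitamin e oil",
    "you should try aloe vera",
    "i have had arthritis since 1991",
    "i have been on lipitor",
]
train_labels = ["ADVICE", "ADVICE", "EXPERIENCE", "EXPERIENCE"]
model = NaiveBayes().fit(train_texts, train_labels)
```

With real data the same interface would be trained on many labeled sentences per expression class.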
  • 29. Expression Classification: Features Polar words, Punctuation, Ngrams, POS patterns, Length!, Beginning, Ontology, …
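A few of these features can be computed with very little code, as in the sketch below. The polar-word lexicon and the feature names are hypothetical, and the n-gram, POS-pattern and ontology features are omitted because they depend on other components.

```python
# Hypothetical polar-word lexicon for illustration only.
POLAR_WORDS = {"best", "awesome", "great", "slow", "bad"}


def expression_features(sentence):
    """Extract a handful of surface features from one sentence."""
    tokens = sentence.lower().replace("!", " ! ").replace("?", " ? ").split()
    words = [t for t in tokens if t not in {"!", "?"}]
    return {
        "n_polar": sum(w.strip(".,") in POLAR_WORDS for w in words),
        "has_exclamation": "!" in tokens,
        "has_question": "?" in tokens,
        "length": len(words),
        "first_word": words[0] if words else "",
        "first_bigram": " ".join(words[:2]),
    }


suggestion = expression_features("You should have more veggie options!")
advocacy = expression_features("This is the best store on the west side.")
```

The "beginning" features capture cues such as "you should", which often open a suggestion or a piece of advice.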
  • 30. Classifiers Trees: Decision Tree (J48). Functions: Logistic Regression, SVM. Sequence Tagging: CRF (Conditional Random Fields)
  • 31. Expression Classification: Results Have achieved 75% precision and recall for all expressions considered Factors Feature engineering Classifier selection Knowledge engineering
  • 32. Contextual Sentiment (Just) polar words can be misleading! Polar words may not be present at all! Combination of elements: The mocha is too sweet; Wait time is over an hour; Aisles are too narrow; Service is slow
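The gap between a polar-word lookup and a contextual rule can be shown with a toy comparison. Everything in this sketch is an assumption: the polarity lexicon, the single "excess word" rule, and the function names. It covers only the "too + word" cases; an example like "Wait time is over an hour" would need domain knowledge that is not modeled here.

```python
# Hypothetical polarity lexicon: +1 positive, -1 negative.
POLARITY = {"sweet": 1, "cozy": 1, "awesome": 1, "slow": -1}
# Words that signal excess when they precede another word (assumption).
EXCESS = {"too"}


def lexicon_sentiment(tokens):
    """Sentiment from polar words alone."""
    score = sum(POLARITY.get(t, 0) for t in tokens)
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"


def contextual_sentiment(tokens):
    """An excess word followed by any word makes the clause negative."""
    for prev, _ in zip(tokens, tokens[1:]):
        if prev in EXCESS:
            return "NEGATIVE"
    return lexicon_sentiment(tokens)


mocha = "the mocha is too sweet".split()
aisles = "aisles are too narrow".split()
```

Here the lexicon alone calls the mocha sentence positive and the aisles sentence neutral, while the contextual rule marks both as negative.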
  • 33. Semantics: Ontologies Health: Drugs, Conditions, Procedures, Symptoms, … Retail (Dining): Food/Entrees, Service, Ambience, ….
  • 34. Leverage Existing Knowledge Sources Health informatics: UMLS, NCI Thesaurus, SNOMED. Retail: DMOZ. Many others: Freebase; Wikipedia, DBPedia. OpenData: data.gov
  • 35. Knowledge Engineering Tools “Mini” ontology creation API access Freebase BioPortal Wrappers DMOZ, ….
  • 36. Practical Requirements Confidence Measures Below threshold routed to manual transcription teams Polarity Snippets
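The confidence-threshold requirement amounts to a simple routing rule, sketched below. The `route` function, the queue names and the 0.8 default threshold are illustrative assumptions; the slide does not state the actual threshold.

```python
def route(label, confidence, threshold=0.8):
    """Accept a prediction automatically or send it to manual review.

    The 0.8 default is an assumed value, not one taken from the talk.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        return ("auto", label)
    return ("manual", None)


accepted = route("COMPLAINT", 0.93)
deferred = route("COMPLAINT", 0.41)
```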
  • 38. COGNIE TM : Open Source Tools Framework UIMA Classification Weka Mallet NLP Stanford tools Indexing Lucene Databases MySQL, MongoDB Knowledge Engineering Protégé
  • 40. Case Study: Health Informatics
  • 42. Case Study: Retail & Survey Analytics Feedback: Direct, device collected; Social-media. Typically short, few sentences. Strong requirement for aspect classification: [Food, Service, Ambience, Pricing, Other]. Negative: “Immediate” vs “Long Term” classification. …food was awesome, service needs improvement …. you need to be open longer!
  • 43. Case Study: Risk Assessment Biomedical Literature Abstracts. Correlation direction (+ -), Subject, Article type. Features: Clauses, Negation and Triggers. Semantic Heterogeneity
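How negation and trigger cues might combine into a correlation direction is sketched below, using the two risk-assessment sentences from the earlier Expressions slide. The cue lists and the precedence rule (negation wins over a trigger) are assumptions made for this example, not the features actually used in the study.

```python
# Hypothetical cue lists for illustration.
NEGATIONS = {"no", "not", "hardly", "unrelated"}
TRIGGERS = ("leads to", "link between", "associated with")


def correlation_direction(clause):
    """Return "+", "-" or None for a clause from an abstract."""
    text = clause.lower()
    words = set(text.replace(",", " ").replace(".", " ").split())
    if words & NEGATIONS:
        return "-"  # a negation cue overrides any trigger
    if any(trigger in text for trigger in TRIGGERS):
        return "+"
    return None  # no statement about a correlation detected


negative = correlation_direction(
    "There is hardly any evidence to suggest a link between salt and diabetes"
)
positive = correlation_direction(
    "These results confirm that high intake of salt leads to increase in BP"
)
```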
  • 45. MapReduce Throughput can be an issue Complex language processing algorithms Large ontologies in some cases Hadoop MapReduce [Kahn and Ashish, 2014]
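The map and reduce roles in such a pipeline can be illustrated with a local, single-process sketch. This is not Hadoop code and does not use any Hadoop API; the `classify` function is a hypothetical stand-in for the expensive distillation step that would run inside each mapper.

```python
from collections import defaultdict


def classify(sentence):
    """Hypothetical stand-in for the costly per-sentence distillation."""
    return "SUGGESTION" if "should" in sentence.lower() else "OTHER"


def map_phase(document):
    """Emit (expression, 1) for every sentence in one document."""
    for sentence in document.split("."):
        if sentence.strip():
            yield classify(sentence), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one expression type."""
    return key, sum(values)


def run(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = run([
    "You should have more veggie options. Service is slow.",
    "You should open longer.",
])
```

Because each document is processed independently in the map step, the heavy language processing parallelizes across machines, and only small (key, count) pairs need to be moved between them.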
  • 47. Conclusions Deeper distillation from text is important. Can be achieved by detecting and combining multiple elements in text: Feature engineering, Knowledge engineering, Classifier selection. Does not have to be perfect. Every domain and dataset has its nuances