SlideShare a Scribd company logo
1 of 68
Download to read offline
Domain Cartridge: Unsupervised
Framework for Shallow Domain Ontology
Construction from Corpus
Subhabrata Mukherjee
Jitendra Ajmera, Sachindra Joshi
Max Planck Institute for Informatics
IBM India Research Lab
CIKM 2014
October 9, 2014
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Motivation: Domain Term Discovery
Usefulness for Parsing. Consider the examples:
“install sprint navigation”
“problem with touch screen of the mobile”
ESG Parser generates noisy or incomplete parse without the
smartphone domain knowledge that “sprint navigation, touch
screen” are multi-word concepts
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Motivation: Domain Term Discovery
Usefulness for Parsing. Consider the examples:
“install sprint navigation”
“problem with touch screen of the mobile”
ESG Parser generates noisy or incomplete parse without the
smartphone domain knowledge that “sprint navigation, touch
screen” are multi-word concepts
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Motivation: Domain Relation Discovery
Interactive dialogue systems
For user query “battery of my device depletes fast", the
knowledge ‘battery’ is a Feature-Of ‘device’ enables the system
to clarify about the type of device
Query expansion
E.g. Consider Synonyms along with the original query, ‘battery’
is a Feature-Of ‘phone’ as well as ‘tablet’ ‘device’
Query re-formulation
For user query “screen freezes E5150", the knowledge ‘E5150’
is a Type-Of ‘Error’ results in query re-formulation “screen
freezes error E5150"
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Motivation: Domain Relation Discovery
Interactive dialogue systems
For user query “battery of my device depletes fast", the
knowledge ‘battery’ is a Feature-Of ‘device’ enables the system
to clarify about the type of device
Query expansion
E.g. Consider Synonyms along with the original query, ‘battery’
is a Feature-Of ‘phone’ as well as ‘tablet’ ‘device’
Query re-formulation
For user query “screen freezes E5150", the knowledge ‘E5150’
is a Type-Of ‘Error’ results in query re-formulation “screen
freezes error E5150"
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Motivation: Domain Relation Discovery
Interactive dialogue systems
For user query “battery of my device depletes fast", the
knowledge ‘battery’ is a Feature-Of ‘device’ enables the system
to clarify about the type of device
Query expansion
E.g. Consider Synonyms along with the original query, ‘battery’
is a Feature-Of ‘phone’ as well as ‘tablet’ ‘device’
Query re-formulation
For user query “screen freezes E5150", the knowledge ‘E5150’
is a Type-Of ‘Error’ results in query re-formulation “screen
freezes error E5150"
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Unsupervised Framework
Typically for a domain, a lot of knowledge articles, manuals,
tutorials etc. are available in a variety of formats
Most of these documents have less hyperlink and table
(info-box as in Wikipedia) information
Challenge is to learn a shallow ontology from raw unannotated
plain text
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Unsupervised Framework
Typically for a domain, a lot of knowledge articles, manuals,
tutorials etc. are available in a variety of formats
Most of these documents have less hyperlink and table
(info-box as in Wikipedia) information
Challenge is to learn a shallow ontology from raw unannotated
plain text
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Unsupervised Framework
Typically for a domain, a lot of knowledge articles, manuals,
tutorials etc. are available in a variety of formats
Most of these documents have less hyperlink and table
(info-box as in Wikipedia) information
Challenge is to learn a shallow ontology from raw unannotated
plain text
Domain Cartridge as a Graph
install insert
device
handset
android
blackberry
Operating
system
samsung
sim
card
Samsung
Galaxy
victory
Samsung
array
Domain
term
Domain
process
Domain Cartridge as a Graph
install insert
device
handset
android
blackberry
Operating
system
samsung
sim
card
Samsung
Galaxy
victory
Samsung
array
Domain
term
Domain
process
Synonyms
Domain Cartridge as a Graph
install insert
device
handset
android
blackberry
Operating
system
samsung
sim
card
Samsung
Galaxy
victory
Samsung
array
Domain
term
Domain
process
Feature-Of
Domain Cartridge as a Graph
install insert
device
handset
android
blackberry
Operating
system
samsung
sim
card
Samsung
Galaxy
victory
Samsung
array
Domain
term
Domain
process
Type-Of
Domain Cartridge as a Graph
install insert
device
handset
android
blackberry
Operating
system
samsung
sim
card
Samsung
Galaxy
victory
Samsung
array
Domain
term
Domain
process
Action-On
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Roadmap
Unsupervised framework for shallow domain ontology
construction:
Domain Term Discovery (DTD)
Improvement of Parser performance by DTD
Domain Relation Discovery (DRD)
Use-Case: Improvement of an in-house Question-Answering
system
Experiments: Manual Evaluation, Comparison with BabelNet,
WordNet, Yago
Conclusions
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Roadmap
Unsupervised framework for shallow domain ontology
construction:
Domain Term Discovery (DTD)
Improvement of Parser performance by DTD
Domain Relation Discovery (DRD)
Use-Case: Improvement of an in-house Question-Answering
system
Experiments: Manual Evaluation, Comparison with BabelNet,
WordNet, Yago
Conclusions
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Roadmap
Unsupervised framework for shallow domain ontology
construction:
Domain Term Discovery (DTD)
Improvement of Parser performance by DTD
Domain Relation Discovery (DRD)
Use-Case: Improvement of an in-house Question-Answering
system
Experiments: Manual Evaluation, Comparison with BabelNet,
WordNet, Yago
Conclusions
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Roadmap
Unsupervised framework for shallow domain ontology
construction:
Domain Term Discovery (DTD)
Improvement of Parser performance by DTD
Domain Relation Discovery (DRD)
Use-Case: Improvement of an in-house Question-Answering
system
Experiments: Manual Evaluation, Comparison with BabelNet,
WordNet, Yago
Conclusions
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Corpus: Knowledge articles, manuals, tutorials etc.
Domain Cartridge: Framework
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Parsing
Domain Cartridge: Framework
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Parsing
English Slot Grammar (ESG) parser is used
50 - 100 times faster than the Charniak parser
Shallow semantic relationship (SSR) annotation done over ESG
parser output to generate normalized parser relations
E.g., “Samsung has a battery" and “Samsung’s battery died" both
generate the same relation ‘nnMod:samsung_battery’
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Cartridge: Framework
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Lucene Index
For efcient retrieval of relations, documents, positional (offset)
information, handling proximity based queries or answer scorers
E.g: (Doc. Freq. of) All relations containing “install” or
“rel:svo:quickbook_update_file”
TokenStream modied to produce contiguous representation of
all types of concepts (ngrams, relations, words etc.)
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Cartridge: Framework
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Term Discovery
ESG parser maintains a domain term lexicon of multi-word
concepts. E.g. “touch screen, sprint navigation”
Noun Phrase Chunking done on document titles to extract
frequently occuring concepts as domain words
Enrich lexicon and bootstrap parser
Parser generates rened output
High precision but low recall — as titles are precise, clean but
short
To extract more ne-grained domain terms HITS is used on
parser output
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Term Discovery
ESG parser maintains a domain term lexicon of multi-word
concepts. E.g. “touch screen, sprint navigation”
Noun Phrase Chunking done on document titles to extract
frequently occuring concepts as domain words
Enrich lexicon and bootstrap parser
Parser generates rened output
High precision but low recall — as titles are precise, clean but
short
To extract more ne-grained domain terms HITS is used on
parser output
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Term Discovery
ESG parser maintains a domain term lexicon of multi-word
concepts. E.g. “touch screen, sprint navigation”
Noun Phrase Chunking done on document titles to extract
frequently occuring concepts as domain words
Enrich lexicon and bootstrap parser
Parser generates rened output
High precision but low recall — as titles are precise, clean but
short
To extract more ne-grained domain terms HITS is used on
parser output
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
HITS
Any Shallow Semantic Relation (SSR) from ESG parser can be
visualized as a hub generating domain terms
Any domain term can be visualized as an authority influenced
by incoming features from hubs
Link strength between the hub and authority is taken as the
number of mutual participating SSR in the Lucene Index
Good authorities are incorporated in Parser Domain Term
Lexicon
Parser is re-run, rened relations generated and previous steps
iterated till convergence
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
HITS
Any Shallow Semantic Relation (SSR) from ESG parser can be
visualized as a hub generating domain terms
Any domain term can be visualized as an authority influenced
by incoming features from hubs
Link strength between the hub and authority is taken as the
number of mutual participating SSR in the Lucene Index
Good authorities are incorporated in Parser Domain Term
Lexicon
Parser is re-run, rened relations generated and previous steps
iterated till convergence
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
HITS
Any Shallow Semantic Relation (SSR) from ESG parser can be
visualized as a hub generating domain terms
Any domain term can be visualized as an authority influenced
by incoming features from hubs
Link strength between the hub and authority is taken as the
number of mutual participating SSR in the Lucene Index
Good authorities are incorporated in Parser Domain Term
Lexicon
Parser is re-run, rened relations generated and previous steps
iterated till convergence
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
HITS
Any Shallow Semantic Relation (SSR) from ESG parser can be
visualized as a hub generating domain terms
Any domain term can be visualized as an authority influenced
by incoming features from hubs
Link strength between the hub and authority is taken as the
number of mutual participating SSR in the Lucene Index
Good authorities are incorporated in Parser Domain Term
Lexicon
Parser is re-run, rened relations generated and previous steps
iterated till convergence
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
HITS
Any Shallow Semantic Relation (SSR) from ESG parser can be
visualized as a hub generating domain terms
Any domain term can be visualized as an authority influenced
by incoming features from hubs
Link strength between the hub and authority is taken as the
number of mutual participating SSR in the Lucene Index
Good authorities are incorporated in Parser Domain Term
Lexicon
Parser is re-run, rened relations generated and previous steps
iterated till convergence
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Feedback
Domain Cartridge: Framework
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Parser Performance Improvement
Number of incomplete parses went down by 73% after
incorporating domain terms in the parser lexicon
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Terms
samsung blackberry software-version application htc-evo wi-
 memory-card bluetooth motorola kyocera browser voicemail
microsoft-exchange lg-optimus samsung-m400 samsung-galaxy-
victory software-updates samsung-array text-messaging wallpa-
per synchronization iphone touch-screen blackberry-bold
Table: Snapshot of multi-word domain terms extracted by NP Chunking.
wi- optimus-g set-up novatel-wireless e-mail sierra-wireless
apple-id google-maps play-music mobile-network 10-digit
internet-explorer slacker-radio caller-id google-search address-
book my-computer software-update blackberry-id as-well-as
windows-update terms-of-service drop-down pro-700 add-on
scp-2700 mac-os device-manager voice-mail non-camera
Table: Snapshot of multi-word domain terms extracted by HITS (not
found by NP Chunking).
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Cartridge: Framework
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Random Index (RI)
Random Indexing is used for computing word-relatedness and
dimensionality reduction
RI considers “term X term” co-occurrence, as opposed to
“term X document” matrix — allowing for incremental learning of
context information, scaling up with the corpus size
We adopt the notion of Relational Distributional Similarity. Two
terms are considered similar if they appear in a similar context
participating in similar Shallow Semantic Relations
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Random Index (RI)
Random Indexing is used for computing word-relatedness and
dimensionality reduction
RI considers “term X term” co-occurrence, as opposed to
“term X document” matrix — allowing for incremental learning of
context information, scaling up with the corpus size
We adopt the notion of Relational Distributional Similarity. Two
terms are considered similar if they appear in a similar context
participating in similar Shallow Semantic Relations
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Random Index
Instead of taking the raw neighborhood of N terms, we use only
those neighboring terms that share a syntactic dependency
(given by ESG parser) with the target term
Lucene Index is queried for term and relation co-occurrence
statistics while constructing high dimensional random index
vector for each term
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Cartridge: Framework
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Synonym Discovery
Random Index is used to get top N similar terms for a given term
based on Relational Distributional Similarity
HITS is used to get the dominant domain terms and domain
(SSR) relations
Sim(wi, wj) =
p Ili =lj ,ki =kj
(fwki
,p, fwkj
,p )
p r Ili =lr ,ki =kr (fwki
,p, fwkr ,p )
Numerator counts the number of times any word appears in both
feature vectors with similar dominant SSR relations
Denominator counts the number of times it appears in the feature
vector of any other word with that SSR relation
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Synonym Discovery
Random Index is used to get top N similar terms for a given term
based on Relational Distributional Similarity
HITS is used to get the dominant domain terms and domain
(SSR) relations
Sim(wi, wj) =
p Ili =lj ,ki =kj
(fwki
,p, fwkj
,p )
p r Ili =lr ,ki =kr (fwki
,p, fwkr ,p )
Numerator counts the number of times any word appears in both
feature vectors with similar dominant SSR relations
Denominator counts the number of times it appears in the feature
vector of any other word with that SSR relation
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Synonym Discovery
High recall but low precision. E.g. ‘folder’ and ‘tab’ have a high
relational distributional similarity due to overlapping context
words like “open, close, minimize, shortcut, multiple"
ESG parser attributes like noun, animated, physical object,
device are used to increase precision
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Cartridge: Framework
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Action-On Discovery
Action-On represents any activity or ‘method’ on a domain term
E.g. “rel:svo:tap_add_account, rel:svo:phone_access_internet,
rel:svo:mobile_sync_phone, rel:svo:account_use_phone etc."
HITS index is traversed to extract all the dominant dm (action on
entities), and verb-object from svo (subject-verb-object) SSR
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Action-On Discovery
Action-On represents any activity or ‘method’ on a domain term
E.g. “rel:svo:tap_add_account, rel:svo:phone_access_internet,
rel:svo:mobile_sync_phone, rel:svo:account_use_phone etc."
HITS index is traversed to extract all the dominant dm (action on
entities), and verb-object from svo (subject-verb-object) SSR
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Type-Of Discovery
Type-Of relations depict Is-A hierarchy i.e. a parent-child relation
E.g. “rel:svo:devices_include_HTC, rel:npo:applications_
such-as_WhatsApp, rel:npo:features_like_call,
rel:npo:contact_such-as_address".
svo like “include” and npo “like, such-as, as” etc. are most
informative
Dominant domain terms from HITS, bearing Hearst patterns like
“or”, “especially”
E.g. service_or_process, prev_or_next,
messages_especially_sms_and_ mms
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Type-Of Discovery
Type-Of relations depict Is-A hierarchy i.e. a parent-child relation
E.g. “rel:svo:devices_include_HTC, rel:npo:applications_
such-as_WhatsApp, rel:npo:features_like_call,
rel:npo:contact_such-as_address".
svo like “include” and npo “like, such-as, as” etc. are most
informative
Dominant domain terms from HITS, bearing Hearst patterns like
“or”, “especially”
E.g. service_or_process, prev_or_next,
messages_especially_sms_and_ mms
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Feature-Of Discovery
Feature-Of relations depict components or functionalities of a
domain term
nnMod depicting noun-noun modications, and subject-object
(so) extracted from svo SSR used to discover Feature-Of from
HITS index. E.g.
“rel:nnMod:network_life, rel:nnMod:account_settings,
rel:nnMod:iPhone_battery etc."
“rel:svo:motorola-photon-4g_run_device-
software, rel:svo:router_decrease_signal-strength,
rel:svo:nd-a-store_open_locator etc."
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Feature-Of Discovery
Feature-Of relations depict components or functionalities of a
domain term
nnMod depicting noun-noun modications, and subject-object
(so) extracted from svo SSR used to discover Feature-Of from
HITS index. E.g.
“rel:nnMod:network_life, rel:nnMod:account_settings,
rel:nnMod:iPhone_battery etc."
“rel:svo:motorola-photon-4g_run_device-
software, rel:svo:router_decrease_signal-strength,
rel:svo:nd-a-store_open_locator etc."
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Smartphone Ontology Snapshot
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Term Evaluation
5000 articles, tutorials and manuals from the smartphone domain
We used the Back-of-the-Book Index (BOI) of manuals, to create
ground truth for domain term discovery
Baselines:
WordNet (G. A. Miller. Wordnet: A lexical database for english. COMMUNICATIONS OF THE ACM, 38,
1995.)
BabelNet (R. Navigli and S. P. Ponzetto. BabelNet: Building a very large multilingual semantic network. ACL
’10.)
Yago (F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. WWW ’07.)
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Term Evaluation
Method Recall
WordNet 22.62%
NP Chunking on Titles 32.45%
HITS 40.87%
Yago 43.77%
BabelNet 53.74%
Table: Domain term evaluation.
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Recall of a Question-Answering System
Recall@N With Domain
Term Lexicon
Without domain
term lexicon
recall@1 0.40 0.33
recall@2 0.49 0.45
Table: Performance of a QA system with and without domain term
lexicon.
Incorporation of domain terms in parser lexicon improves QA
system performance
1
D. Gondek et al. A framework for merging and ranking of answers in
DeepQA. IBM Journal of Research and Development, 56(3), 2012.
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Domain Relation Evaluation
2000 word pairs (500 for each of four categories) are manually
annotated by two annotators
High Kappa score of 0.92 for Synonyms and 0.70 for Type-Of due
to the large number of word pairs, 84% and 62%, the annotators
agree are not Synonyms and Type-Of respectively
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Comparison
Relations in BabelNet come either from (unlabeled) Wikipedia
hyperlinks or WordNet
Yago uses features like “wikipedia:redirect” links to detect
synonymous contexts
WordNet “Hyponyms, “Hypernyms” are similar to Type-Of, and
“Meronyms, Holonyms” are subset of Feature-Of
BabelNet, WordNet, Yago do not have any category like
Action-On
Yago has Type-Of categories derived from WordNet hyponymy
and hypernymy, and Wikipedia categories
System Type-Of Feature-Of Action-On
BabelNet, WordNet 19.27% - -
Yago 25.12% - -
Domain Cartridge 77% 85.7% 68%
Table: Recall comparison of systems for 3 relations.
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Comparison
Relations in BabelNet come either from (unlabeled) Wikipedia
hyperlinks or WordNet
Yago uses features like “wikipedia:redirect” links to detect
synonymous contexts
WordNet “Hyponyms, “Hypernyms” are similar to Type-Of, and
“Meronyms, Holonyms” are subset of Feature-Of
BabelNet, WordNet, Yago do not have any category like
Action-On
Yago has Type-Of categories derived from WordNet hyponymy
and hypernymy, and Wikipedia categories
System Type-Of Feature-Of Action-On
BabelNet, WordNet 19.27% - -
Yago 25.12% - -
Domain Cartridge 77% 85.7% 68%
Table: Recall comparison of systems for 3 relations.
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Comparison
Relations in BabelNet come either from (unlabeled) Wikipedia
hyperlinks or WordNet
Yago uses features like “wikipedia:redirect” links to detect
synonymous contexts
WordNet “Hyponyms, “Hypernyms” are similar to Type-Of, and
“Meronyms, Holonyms” are subset of Feature-Of
BabelNet, WordNet, Yago do not have any category like
Action-On
Yago has Type-Of categories derived from WordNet hyponymy
and hypernymy, and Wikipedia categories
System Type-Of Feature-Of Action-On
BabelNet, WordNet 19.27% - -
Yago 25.12% - -
Domain Cartridge 77% 85.7% 68%
Table: Recall comparison of systems for 3 relations.
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Comparison
Relations in BabelNet come either from (unlabeled) Wikipedia
hyperlinks or WordNet
Yago uses features like “wikipedia:redirect” links to detect
synonymous contexts
WordNet “Hyponyms, “Hypernyms” are similar to Type-Of, and
“Meronyms, Holonyms” are subset of Feature-Of
BabelNet, WordNet, Yago do not have any category like
Action-On
Yago has Type-Of categories derived from WordNet hyponymy
and hypernymy, and Wikipedia categories
System Type-Of Feature-Of Action-On
BabelNet, WordNet 19.27% - -
Yago 25.12% - -
Domain Cartridge 77% 85.7% 68%
Table: Recall comparison of systems for 3 relations.
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Comparison
Relations in BabelNet come either from (unlabeled) Wikipedia
hyperlinks or WordNet
Yago uses features like “wikipedia:redirect” links to detect
synonymous contexts
WordNet “Hyponyms, “Hypernyms” are similar to Type-Of, and
“Meronyms, Holonyms” are subset of Feature-Of
BabelNet, WordNet, Yago do not have any category like
Action-On
Yago has Type-Of categories derived from WordNet hyponymy
and hypernymy, and Wikipedia categories
System Type-Of Feature-Of Action-On
BabelNet, WordNet 19.27% - -
Yago 25.12% - -
Domain Cartridge 77% 85.7% 68%
Table: Recall comparison of systems for 3 relations.
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Distributional Similarity Comparison
System Precision Recall F-Score
Yago 38% 32% 34.37%
BabelNet, WordNet 83% 31% 45.14%
Domain Cartridge (DC) 58% 41% 47.60%
DC + WordNet 62% 40% 49.00%
DC + ESG Parser Features 65% 39% 49.14%
Table: Precision-Recall comparison of Domain Cartridge
(random-indexing, HITS and sim. eqn.) with other systems.
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Comparison with Distributional Similarity
Measures in WordNet
WordNet F-Score
LCH 0.22
RES 0.31
JCN 0.42
PATH 0.42
LIN 0.43
WUP 0.43
LESK 0.45
Domain Cartridge 0.49
Table: F-Score comparison of WordNet similarity measures with
Domain Cartridge.
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Conclusions
Unsupervised framework for shallow domain ontology
construction, without using manually annotated resources
Multi-words form an important component of Domain Term
Discovery
Incorporation of domain terms in parser lexicon results in 73%
reduction in incomplete parses, improving performance of an
in-house QA system by upto 7%
Synonym discovery approach, using Relational Distributional
Similarity, RI, HITS etc., performs better than other existing
approaches
Future Work: Measure customization time required for domain
adaptation of the QA system
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Conclusions
Unsupervised framework for shallow domain ontology
construction, without using manually annotated resources
Multi-words form an important component of Domain Term
Discovery
Incorporation of domain terms in parser lexicon results in 73%
reduction in incomplete parses, improving performance of an
in-house QA system by upto 7%
Synonym discovery approach, using Relational Distributional
Similarity, RI, HITS etc., performs better than other existing
approaches
Future Work: Measure customization time required for domain
adaptation of the QA system
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Conclusions
Unsupervised framework for shallow domain ontology
construction, without using manually annotated resources
Multi-words form an important component of Domain Term
Discovery
Incorporation of domain terms in parser lexicon results in 73%
reduction in incomplete parses, improving performance of an
in-house QA system by upto 7%
Synonym discovery approach, using Relational Distributional
Similarity, RI, HITS etc., performs better than other existing
approaches
Future Work: Measure customization time required for domain
adaptation of the QA system
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Conclusions
Unsupervised framework for shallow domain ontology
construction, without using manually annotated resources
Multi-words form an important component of Domain Term
Discovery
Incorporation of domain terms in parser lexicon results in 73%
reduction in incomplete parses, improving performance of an
in-house QA system by upto 7%
Synonym discovery approach, using Relational Distributional
Similarity, RI, HITS etc., performs better than other existing
approaches
Future Work: Measure customization time required for domain
adaptation of the QA system
Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
Conclusions
Unsupervised framework for shallow domain ontology
construction, without using manually annotated resources
Multi-words form an important component of Domain Term
Discovery
Incorporation of domain terms in parser lexicon results in 73%
reduction in incomplete parses, improving performance of an
in-house QA system by upto 7%
Synonym discovery approach, using Relational Distributional
Similarity, RI, HITS etc., performs better than other existing
approaches
Future Work: Measure customization time required for domain
adaptation of the QA system

More Related Content

Viewers also liked

1 Corinthians 5, Christian To Satan, OSAS, He Is Able, Passover, 3 Days And 3...
1 Corinthians 5, Christian To Satan, OSAS, He Is Able, Passover, 3 Days And 3...1 Corinthians 5, Christian To Satan, OSAS, He Is Able, Passover, 3 Days And 3...
1 Corinthians 5, Christian To Satan, OSAS, He Is Able, Passover, 3 Days And 3...Valley Bible Fellowship
 
Leveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesLeveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesSubhabrata Mukherjee
 
Bienestar Con Diabetes
Bienestar Con DiabetesBienestar Con Diabetes
Bienestar Con DiabetesAdrian Barrientos
 
電子商務:中國電子商務排名
電子商務:中國電子商務排名電子商務:中國電子商務排名
電子商務:中國電子商務排名志堅 汪
 
La ContaminaciĂłn de los OcĂŠanos
La ContaminaciĂłn de los OcĂŠanosLa ContaminaciĂłn de los OcĂŠanos
La ContaminaciĂłn de los OcĂŠanosiridian_chino1
 

Viewers also liked (6)

1 Corinthians 5, Christian To Satan, OSAS, He Is Able, Passover, 3 Days And 3...
1 Corinthians 5, Christian To Satan, OSAS, He Is Able, Passover, 3 Days And 3...1 Corinthians 5, Christian To Satan, OSAS, He Is Able, Passover, 3 Days And 3...
1 Corinthians 5, Christian To Satan, OSAS, He Is Able, Passover, 3 Days And 3...
 
Leveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesLeveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News Communities
 
Bienestar Con Diabetes
Bienestar Con DiabetesBienestar Con Diabetes
Bienestar Con Diabetes
 
電子商務:中國電子商務排名
電子商務:中國電子商務排名電子商務:中國電子商務排名
電子商務:中國電子商務排名
 
La ContaminaciĂłn de los OcĂŠanos
La ContaminaciĂłn de los OcĂŠanosLa ContaminaciĂłn de los OcĂŠanos
La ContaminaciĂłn de los OcĂŠanos
 
Yarumal
YarumalYarumal
Yarumal
 

Similar to Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construction from Corpus

Skeuomorphs, Databases, and Mobile Performance
Skeuomorphs, Databases, and Mobile PerformanceSkeuomorphs, Databases, and Mobile Performance
Skeuomorphs, Databases, and Mobile PerformanceSam Ramji
 
Skeuomorphs, Databases, and Mobile Performance
Skeuomorphs, Databases, and Mobile PerformanceSkeuomorphs, Databases, and Mobile Performance
Skeuomorphs, Databases, and Mobile PerformanceApigee | Google Cloud
 
LLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureAggregage
 
Introduction to Semantic Web for GIS Practitioners
Introduction to Semantic Web for GIS PractitionersIntroduction to Semantic Web for GIS Practitioners
Introduction to Semantic Web for GIS PractitionersEmanuele Della Valle
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Lucidworks
 
Deep_dive_on_Amazon_Neptune_DAT361.pdf
Deep_dive_on_Amazon_Neptune_DAT361.pdfDeep_dive_on_Amazon_Neptune_DAT361.pdf
Deep_dive_on_Amazon_Neptune_DAT361.pdfShaikAsif83
 
AWS_Meetup_BLR_July_22_Social.pdf
AWS_Meetup_BLR_July_22_Social.pdfAWS_Meetup_BLR_July_22_Social.pdf
AWS_Meetup_BLR_July_22_Social.pdfAyyanar Jeyakrishnan
 
Where Node.JS Meets iOS
Where Node.JS Meets iOSWhere Node.JS Meets iOS
Where Node.JS Meets iOSSam Rijs
 
Desktop, Embedded and Mobile Apps with PrismTech Vortex Cafe
Desktop, Embedded and Mobile Apps with PrismTech Vortex CafeDesktop, Embedded and Mobile Apps with PrismTech Vortex Cafe
Desktop, Embedded and Mobile Apps with PrismTech Vortex CafeADLINK Technology IoT
 
Desktop, Embedded and Mobile Apps with Vortex CafĂŠ
Desktop, Embedded and Mobile Apps with Vortex CafĂŠDesktop, Embedded and Mobile Apps with Vortex CafĂŠ
Desktop, Embedded and Mobile Apps with Vortex CafĂŠAngelo Corsaro
 
[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D...
[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D...[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D...
[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D...Andrew Liu
 
Samsung voice intelligence.v5.5
Samsung voice intelligence.v5.5Samsung voice intelligence.v5.5
Samsung voice intelligence.v5.5vinutharani1995
 
I/O & virtualization performance with a search engine based on an xml databa...
 I/O & virtualization performance with a search engine based on an xml databa... I/O & virtualization performance with a search engine based on an xml databa...
I/O & virtualization performance with a search engine based on an xml databa...lucenerevolution
 
Intro to mobile development with sencha touch
Intro to mobile development with sencha touchIntro to mobile development with sencha touch
Intro to mobile development with sencha touchjgarifuna
 

Similar to Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construction from Corpus (20)

Skeuomorphs, Databases, and Mobile Performance
Skeuomorphs, Databases, and Mobile PerformanceSkeuomorphs, Databases, and Mobile Performance
Skeuomorphs, Databases, and Mobile Performance
 
Skeuomorphs, Databases, and Mobile Performance
Skeuomorphs, Databases, and Mobile PerformanceSkeuomorphs, Databases, and Mobile Performance
Skeuomorphs, Databases, and Mobile Performance
 
Text mining and Visualizations
Text mining  and VisualizationsText mining  and Visualizations
Text mining and Visualizations
 
LLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team Structure
 
Introduction to Semantic Web for GIS Practitioners
Introduction to Semantic Web for GIS PractitionersIntroduction to Semantic Web for GIS Practitioners
Introduction to Semantic Web for GIS Practitioners
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
 
Deep_dive_on_Amazon_Neptune_DAT361.pdf
Deep_dive_on_Amazon_Neptune_DAT361.pdfDeep_dive_on_Amazon_Neptune_DAT361.pdf
Deep_dive_on_Amazon_Neptune_DAT361.pdf
 
NetBase API Presentation
NetBase API PresentationNetBase API Presentation
NetBase API Presentation
 
AWS_Meetup_BLR_July_22_Social.pdf
AWS_Meetup_BLR_July_22_Social.pdfAWS_Meetup_BLR_July_22_Social.pdf
AWS_Meetup_BLR_July_22_Social.pdf
 
Where Node.JS Meets iOS
Where Node.JS Meets iOSWhere Node.JS Meets iOS
Where Node.JS Meets iOS
 
Demo day
Demo dayDemo day
Demo day
 
Desktop, Embedded and Mobile Apps with PrismTech Vortex Cafe
Desktop, Embedded and Mobile Apps with PrismTech Vortex CafeDesktop, Embedded and Mobile Apps with PrismTech Vortex Cafe
Desktop, Embedded and Mobile Apps with PrismTech Vortex Cafe
 
Desktop, Embedded and Mobile Apps with Vortex CafĂŠ
Desktop, Embedded and Mobile Apps with Vortex CafĂŠDesktop, Embedded and Mobile Apps with Vortex CafĂŠ
Desktop, Embedded and Mobile Apps with Vortex CafĂŠ
 
I explore
I exploreI explore
I explore
 
[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D...
[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D...[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D...
[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D...
 
Samsung voice intelligence.v5.5
Samsung voice intelligence.v5.5Samsung voice intelligence.v5.5
Samsung voice intelligence.v5.5
 
I/O & virtualization performance with a search engine based on an xml databa...
 I/O & virtualization performance with a search engine based on an xml databa... I/O & virtualization performance with a search engine based on an xml databa...
I/O & virtualization performance with a search engine based on an xml databa...
 
ISAC-Projects
ISAC-ProjectsISAC-Projects
ISAC-Projects
 
iphone presentation
iphone presentationiphone presentation
iphone presentation
 
Intro to mobile development with sencha touch
Intro to mobile development with sencha touchIntro to mobile development with sencha touch
Intro to mobile development with sencha touch
 

More from Subhabrata Mukherjee

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsSubhabrata Mukherjee
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Subhabrata Mukherjee
 
OpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesOpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesSubhabrata Mukherjee
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Subhabrata Mukherjee
 
Continuous Experience-aware Language Model
Continuous Experience-aware Language ModelContinuous Experience-aware Language Model
Continuous Experience-aware Language ModelSubhabrata Mukherjee
 
Experience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesExperience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesSubhabrata Mukherjee
 
People on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsPeople on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsSubhabrata Mukherjee
 
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Subhabrata Mukherjee
 
Joint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelJoint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelSubhabrata Mukherjee
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterSubhabrata Mukherjee
 
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Subhabrata Mukherjee
 
Leveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilarityLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilaritySubhabrata Mukherjee
 
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...Subhabrata Mukherjee
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviewsSubhabrata Mukherjee
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...Subhabrata Mukherjee
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSubhabrata Mukherjee
 

More from Subhabrata Mukherjee (17)

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
 
Fact Checking from Text
Fact Checking from TextFact Checking from Text
Fact Checking from Text
 
OpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesOpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product Profiles
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
 
Continuous Experience-aware Language Model
Continuous Experience-aware Language ModelContinuous Experience-aware Language Model
Continuous Experience-aware Language Model
 
Experience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesExperience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review Communities
 
People on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsPeople on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health Forums
 
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
 
Joint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelJoint Author Sentiment Topic Model
Joint Author Sentiment Topic Model
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
 
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
 
Leveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilarityLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word Similarity
 
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviews
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
 

Recently uploaded

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 

Recently uploaded (20)

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construction from Corpus

  • 1. Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construction from Corpus Subhabrata Mukherjee Jitendra Ajmera, Sachindra Joshi Max Planck Institute for Informatics IBM India Research Lab CIKM 2014 October 9, 2014
  • 2. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Motivation: Domain Term Discovery Usefulness for Parsing. Consider the examples: “install sprint navigation” “problem with touch screen of the mobile” ESG Parser generates noisy or incomplete parse without the smartphone domain knowledge that “sprint navigation, touch screen” are multi-word concepts
  • 3. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Motivation: Domain Term Discovery Usefulness for Parsing. Consider the examples: “install sprint navigation” “problem with touch screen of the mobile” ESG Parser generates noisy or incomplete parse without the smartphone domain knowledge that “sprint navigation, touch screen” are multi-word concepts
  • 4. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Motivation: Domain Relation Discovery Interactive dialogue systems For user query “battery of my device depletes fast", the knowledge ‘battery’ is a Feature-Of ‘device’ enables the system to clarify about the type of device Query expansion E.g. Consider Synonyms along with the original query, ‘battery’ is a Feature-Of ‘phone’ as well as ‘tablet’ ‘device’ Query re-formulation For user query “screen freezes E5150", the knowledge ‘E5150’ is a Type-Of ‘Error’ results in query re-formulation “screen freezes error E5150"
  • 5. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Motivation: Domain Relation Discovery Interactive dialogue systems For user query “battery of my device depletes fast", the knowledge ‘battery’ is a Feature-Of ‘device’ enables the system to clarify about the type of device Query expansion E.g. Consider Synonyms along with the original query, ‘battery’ is a Feature-Of ‘phone’ as well as ‘tablet’ ‘device’ Query re-formulation For user query “screen freezes E5150", the knowledge ‘E5150’ is a Type-Of ‘Error’ results in query re-formulation “screen freezes error E5150"
  • 6. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Motivation: Domain Relation Discovery Interactive dialogue systems For user query “battery of my device depletes fast", the knowledge ‘battery’ is a Feature-Of ‘device’ enables the system to clarify about the type of device Query expansion E.g. Consider Synonyms along with the original query, ‘battery’ is a Feature-Of ‘phone’ as well as ‘tablet’ ‘device’ Query re-formulation For user query “screen freezes E5150", the knowledge ‘E5150’ is a Type-Of ‘Error’ results in query re-formulation “screen freezes error E5150"
  • 7. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Unsupervised Framework Typically for a domain, a lot of knowledge articles, manuals, tutorials etc. are available in a variety of formats Most of these documents have less hyperlink and table (info-box as in Wikipedia) information Challenge is to learn a shallow ontology from raw unannotated plain text
  • 8. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Unsupervised Framework Typically for a domain, a lot of knowledge articles, manuals, tutorials etc. are available in a variety of formats Most of these documents have less hyperlink and table (info-box as in Wikipedia) information Challenge is to learn a shallow ontology from raw unannotated plain text
  • 9. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Unsupervised Framework Typically for a domain, a lot of knowledge articles, manuals, tutorials etc. are available in a variety of formats Most of these documents have less hyperlink and table (info-box as in Wikipedia) information Challenge is to learn a shallow ontology from raw unannotated plain text
  • 10. Domain Cartridge as a Graph install insert device handset android blackberry Operating system samsung sim card Samsung Galaxy victory Samsung array Domain term Domain process
  • 11. Domain Cartridge as a Graph install insert device handset android blackberry Operating system samsung sim card Samsung Galaxy victory Samsung array Domain term Domain process Synonyms
  • 12. Domain Cartridge as a Graph install insert device handset android blackberry Operating system samsung sim card Samsung Galaxy victory Samsung array Domain term Domain process Feature-Of
  • 13. Domain Cartridge as a Graph install insert device handset android blackberry Operating system samsung sim card Samsung Galaxy victory Samsung array Domain term Domain process Type-Of
  • 14. Domain Cartridge as a Graph install insert device handset android blackberry Operating system samsung sim card Samsung Galaxy victory Samsung array Domain term Domain process Action-On
  • 15. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Roadmap Unsupervised framework for shallow domain ontology construction: Domain Term Discovery (DTD) Improvement of Parser performance by DTD Domain Relation Discovery (DRD) Use-Case: Improvement of an in-house Question-Answering system Experiments: Manual Evaluation, Comparison with BabelNet, WordNet, Yago Conclusions
  • 16. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Roadmap Unsupervised framework for shallow domain ontology construction: Domain Term Discovery (DTD) Improvement of Parser performance by DTD Domain Relation Discovery (DRD) Use-Case: Improvement of an in-house Question-Answering system Experiments: Manual Evaluation, Comparison with BabelNet, WordNet, Yago Conclusions
  • 17. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Roadmap Unsupervised framework for shallow domain ontology construction: Domain Term Discovery (DTD) Improvement of Parser performance by DTD Domain Relation Discovery (DRD) Use-Case: Improvement of an in-house Question-Answering system Experiments: Manual Evaluation, Comparison with BabelNet, WordNet, Yago Conclusions
  • 18. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Roadmap Unsupervised framework for shallow domain ontology construction: Domain Term Discovery (DTD) Improvement of Parser performance by DTD Domain Relation Discovery (DRD) Use-Case: Improvement of an in-house Question-Answering system Experiments: Manual Evaluation, Comparison with BabelNet, WordNet, Yago Conclusions
  • 19. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Corpus: Knowledge articles, manuals, tutorials etc. Domain Cartridge: Framework
  • 20. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Parsing Domain Cartridge: Framework
  • 21. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Parsing English Slot Grammar (ESG) parser is used 50 - 100 times faster than the Charniak parser Shallow semantic relationship (SSR) annotation done over ESG parser output to generate normalized parser relations E.g., “Samsung has a battery" and “Samsung’s battery died" both generate the same relation ‘nnMod:samsung_battery’
  • 22. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Cartridge: Framework
  • 23. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Lucene Index For efcient retrieval of relations, documents, positional (offset) information, handling proximity based queries or answer scorers E.g: (Doc. Freq. of) All relations containing “install” or “rel:svo:quickbook_update_le” TokenStream modied to produce contiguous representation of all types of concepts (ngrams, relations, words etc.)
  • 24. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Cartridge: Framework
  • 25. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Term Discovery ESG parser maintains a domain term lexicon of multi-word concepts. E.g. “touch screen, sprint navigation” Noun Phrase Chunking done on document titles to extract frequently occuring concepts as domain words Enrich lexicon and bootstrap parser Parser generates rened output High precision but low recall — as titles are precise, clean but short To extract more ne-grained domain terms HITS is used on parser output
  • 26. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Term Discovery ESG parser maintains a domain term lexicon of multi-word concepts. E.g. “touch screen, sprint navigation” Noun Phrase Chunking done on document titles to extract frequently occuring concepts as domain words Enrich lexicon and bootstrap parser Parser generates rened output High precision but low recall — as titles are precise, clean but short To extract more ne-grained domain terms HITS is used on parser output
  • 27. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Term Discovery ESG parser maintains a domain term lexicon of multi-word concepts. E.g. “touch screen, sprint navigation” Noun Phrase Chunking done on document titles to extract frequently occuring concepts as domain words Enrich lexicon and bootstrap parser Parser generates rened output High precision but low recall — as titles are precise, clean but short To extract more ne-grained domain terms HITS is used on parser output
  • 28. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments HITS Any Shallow Semantic Relation (SSR) from ESG parser can be visualized as a hub generating domain terms Any domain term can be visualized as an authority influenced by incoming features from hubs Link strength between the hub and authority is taken as the number of mutual participating SSR in the Lucene Index Good authorities are incorporated in Parser Domain Term Lexicon Parser is re-run, rened relations generated and previous steps iterated till convergence
  • 29. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments HITS Any Shallow Semantic Relation (SSR) from ESG parser can be visualized as a hub generating domain terms Any domain term can be visualized as an authority influenced by incoming features from hubs Link strength between the hub and authority is taken as the number of mutual participating SSR in the Lucene Index Good authorities are incorporated in Parser Domain Term Lexicon Parser is re-run, rened relations generated and previous steps iterated till convergence
  • 30. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments HITS Any Shallow Semantic Relation (SSR) from ESG parser can be visualized as a hub generating domain terms Any domain term can be visualized as an authority influenced by incoming features from hubs Link strength between the hub and authority is taken as the number of mutual participating SSR in the Lucene Index Good authorities are incorporated in Parser Domain Term Lexicon Parser is re-run, rened relations generated and previous steps iterated till convergence
  • 31. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments HITS Any Shallow Semantic Relation (SSR) from ESG parser can be visualized as a hub generating domain terms Any domain term can be visualized as an authority influenced by incoming features from hubs Link strength between the hub and authority is taken as the number of mutual participating SSR in the Lucene Index Good authorities are incorporated in Parser Domain Term Lexicon Parser is re-run, rened relations generated and previous steps iterated till convergence
  • 32. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments HITS Any Shallow Semantic Relation (SSR) from ESG parser can be visualized as a hub generating domain terms Any domain term can be visualized as an authority influenced by incoming features from hubs Link strength between the hub and authority is taken as the number of mutual participating SSR in the Lucene Index Good authorities are incorporated in Parser Domain Term Lexicon Parser is re-run, rened relations generated and previous steps iterated till convergence
  • 33. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments
  • 34. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Feedback Domain Cartridge: Framework
  • 35. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Parser Performance Improvement Number of incomplete parses went down by 73% after incorporating domain terms in the parser lexicon
  • 36. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Terms samsung blackberry software-version application htc-evo wi-  memory-card bluetooth motorola kyocera browser voicemail microsoft-exchange lg-optimus samsung-m400 samsung-galaxy- victory software-updates samsung-array text-messaging wallpa- per synchronization iphone touch-screen blackberry-bold Table: Snapshot of multi-word domain terms extracted by NP Chunking. wi- optimus-g set-up novatel-wireless e-mail sierra-wireless apple-id google-maps play-music mobile-network 10-digit internet-explorer slacker-radio caller-id google-search address- book my-computer software-update blackberry-id as-well-as windows-update terms-of-service drop-down pro-700 add-on scp-2700 mac-os device-manager voice-mail non-camera Table: Snapshot of multi-word domain terms extracted by HITS (not found by NP Chunking).
  • 37. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Cartridge: Framework
  • 38. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Random Index (RI) Random Indexing is used for computing word-relatedness and dimensionality reduction RI considers “term X term” co-occurrence, as opposed to “term X document” matrix — allowing for incremental learning of context information, scaling up with the corpus size We adopt the notion of Relational Distributional Similarity. Two terms are considered similar if they appear in a similar context participating in similar Shallow Semantic Relations
  • 39. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Random Index (RI) Random Indexing is used for computing word-relatedness and dimensionality reduction RI considers “term X term” co-occurrence, as opposed to “term X document” matrix — allowing for incremental learning of context information, scaling up with the corpus size We adopt the notion of Relational Distributional Similarity. Two terms are considered similar if they appear in a similar context participating in similar Shallow Semantic Relations
  • 40. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Random Index Instead of taking the raw neighborhood of N terms, we use only those neighboring terms that share a syntactic dependency (given by ESG parser) with the target term Lucene Index is queried for term and relation co-occurrence statistics while constructing high dimensional random index vector for each term
  • 41. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Cartridge: Framework
  • 42. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Synonym Discovery Random Index is used to get top N similar terms for a given term based on Relational Distributional Similarity HITS is used to get the dominant domain terms and domain (SSR) relations Sim(wi, wj) = p Ili =lj ,ki =kj (fwki ,p, fwkj ,p ) p r Ili =lr ,ki =kr (fwki ,p, fwkr ,p ) Numerator counts the number of times any word appears in both feature vectors with similar dominant SSR relations Denominator counts the number of times it appears in the feature vector of any other word with that SSR relation
  • 43. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Synonym Discovery Random Index is used to get top N similar terms for a given term based on Relational Distributional Similarity HITS is used to get the dominant domain terms and domain (SSR) relations Sim(wi, wj) = p Ili =lj ,ki =kj (fwki ,p, fwkj ,p ) p r Ili =lr ,ki =kr (fwki ,p, fwkr ,p ) Numerator counts the number of times any word appears in both feature vectors with similar dominant SSR relations Denominator counts the number of times it appears in the feature vector of any other word with that SSR relation
  • 44. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Synonym Discovery High recall but low precision. E.g. ‘folder’ and ‘tab’ have a high relational distributional similarity due to overlapping context words like “open, close, minimize, shortcut, multiple" ESG parser attributes like noun, animated, physical object, device are used to increase precision
  • 45. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Cartridge: Framework
  • 46. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Action-On Discovery Action-On represents any activity or ‘method’ on a domain term E.g. “rel:svo:tap_add_account, rel:svo:phone_access_internet, rel:svo:mobile_sync_phone, rel:svo:account_use_phone etc." HITS index is traversed to extract all the dominant dm (action on entities), and verb-object from svo (subject-verb-object) SSR
  • 47. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Action-On Discovery Action-On represents any activity or ‘method’ on a domain term E.g. “rel:svo:tap_add_account, rel:svo:phone_access_internet, rel:svo:mobile_sync_phone, rel:svo:account_use_phone etc." HITS index is traversed to extract all the dominant dm (action on entities), and verb-object from svo (subject-verb-object) SSR
  • 48. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Type-Of Discovery Type-Of relations depict Is-A hierarchy i.e. a parent-child relation E.g. “rel:svo:devices_include_HTC, rel:npo:applications_ such-as_WhatsApp, rel:npo:features_like_call, rel:npo:contact_such-as_address". svo like “include” and npo “like, such-as, as” etc. are most informative Dominant domain terms from HITS, bearing Hearst patterns like “or”, “especially” E.g. service_or_process, prev_or_next, messages_especially_sms_and_ mms
  • 49. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Type-Of Discovery Type-Of relations depict Is-A hierarchy i.e. a parent-child relation E.g. “rel:svo:devices_include_HTC, rel:npo:applications_ such-as_WhatsApp, rel:npo:features_like_call, rel:npo:contact_such-as_address". svo like “include” and npo “like, such-as, as” etc. are most informative Dominant domain terms from HITS, bearing Hearst patterns like “or”, “especially” E.g. service_or_process, prev_or_next, messages_especially_sms_and_ mms
  • 50. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Feature-Of Discovery Feature-Of relations depict components or functionalities of a domain term nnMod depicting noun-noun modications, and subject-object (so) extracted from svo SSR used to discover Feature-Of from HITS index. E.g. “rel:nnMod:network_life, rel:nnMod:account_settings, rel:nnMod:iPhone_battery etc." “rel:svo:motorola-photon-4g_run_device- software, rel:svo:router_decrease_signal-strength, rel:svo:nd-a-store_open_locator etc."
  • 51. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Feature-Of Discovery Feature-Of relations depict components or functionalities of a domain term nnMod depicting noun-noun modications, and subject-object (so) extracted from svo SSR used to discover Feature-Of from HITS index. E.g. “rel:nnMod:network_life, rel:nnMod:account_settings, rel:nnMod:iPhone_battery etc." “rel:svo:motorola-photon-4g_run_device- software, rel:svo:router_decrease_signal-strength, rel:svo:nd-a-store_open_locator etc."
  • 52. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Smartphone Ontology Snapshot
  • 53. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Term Evaluation 5000 articles, tutorials and manuals from the smartphone domain We used the Back-of-the-Book Index (BOI) of manuals, to create ground truth for domain term discovery Baselines: WordNet (G. A. Miller. Wordnet: A lexical database for english. COMMUNICATIONS OF THE ACM, 38, 1995.) BabelNet (R. Navigli and S. P. Ponzetto. BabelNet: Building a very large multilingual semantic network. ACL ’10.) Yago (F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. WWW ’07.)
  • 54. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Term Evaluation Method Recall WordNet 22.62% NP Chunking on Titles 32.45% HITS 40.87% Yago 43.77% BabelNet 53.74% Table: Domain term evaluation.
  • 55. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Recall of a Question-Answering System Recall@N With Domain Term Lexicon Without domain term lexicon recall@1 0.40 0.33 recall@2 0.49 0.45 Table: Performance of a QA system with and without domain term lexicon. Incorporation of domain terms in parser lexicon improves QA system performance 1 D. Gondek et al. A framework for merging and ranking of answers in DeepQA. IBM Journal of Research and Development, 56(3), 2012.
  • 56. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Domain Relation Evaluation 2000 word pairs (500 for each of four categories) are manually annotated by two annotators High Kappa score of 0.92 for Synonyms and 0.70 for Type-Of due to the large number of word pairs, 84% and 62%, the annotators agree are not Synonyms and Type-Of respectively
  • 57. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Comparison Relations in BabelNet come either from (unlabeled) Wikipedia hyperlinks or WordNet Yago uses features like “wikipedia:redirect” links to detect synonymous contexts WordNet “Hyponyms, “Hypernyms” are similar to Type-Of, and “Meronyms, Holonyms” are subset of Feature-Of BabelNet, WordNet, Yago do not have any category like Action-On Yago has Type-Of categories derived from WordNet hyponymy and hypernymy, and Wikipedia categories System Type-Of Feature-Of Action-On BabelNet, WordNet 19.27% - - Yago 25.12% - - Domain Cartridge 77% 85.7% 68% Table: Recall comparison of systems for 3 relations.
  • 58. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Comparison Relations in BabelNet come either from (unlabeled) Wikipedia hyperlinks or WordNet Yago uses features like “wikipedia:redirect” links to detect synonymous contexts WordNet “Hyponyms, “Hypernyms” are similar to Type-Of, and “Meronyms, Holonyms” are subset of Feature-Of BabelNet, WordNet, Yago do not have any category like Action-On Yago has Type-Of categories derived from WordNet hyponymy and hypernymy, and Wikipedia categories System Type-Of Feature-Of Action-On BabelNet, WordNet 19.27% - - Yago 25.12% - - Domain Cartridge 77% 85.7% 68% Table: Recall comparison of systems for 3 relations.
  • 59. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Comparison Relations in BabelNet come either from (unlabeled) Wikipedia hyperlinks or WordNet Yago uses features like “wikipedia:redirect” links to detect synonymous contexts WordNet “Hyponyms, “Hypernyms” are similar to Type-Of, and “Meronyms, Holonyms” are subset of Feature-Of BabelNet, WordNet, Yago do not have any category like Action-On Yago has Type-Of categories derived from WordNet hyponymy and hypernymy, and Wikipedia categories System Type-Of Feature-Of Action-On BabelNet, WordNet 19.27% - - Yago 25.12% - - Domain Cartridge 77% 85.7% 68% Table: Recall comparison of systems for 3 relations.
  • 60. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Comparison Relations in BabelNet come either from (unlabeled) Wikipedia hyperlinks or WordNet Yago uses features like “wikipedia:redirect” links to detect synonymous contexts WordNet “Hyponyms, “Hypernyms” are similar to Type-Of, and “Meronyms, Holonyms” are subset of Feature-Of BabelNet, WordNet, Yago do not have any category like Action-On Yago has Type-Of categories derived from WordNet hyponymy and hypernymy, and Wikipedia categories System Type-Of Feature-Of Action-On BabelNet, WordNet 19.27% - - Yago 25.12% - - Domain Cartridge 77% 85.7% 68% Table: Recall comparison of systems for 3 relations.
  • 61. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Comparison Relations in BabelNet come either from (unlabeled) Wikipedia hyperlinks or WordNet Yago uses features like “wikipedia:redirect” links to detect synonymous contexts WordNet “Hyponyms, “Hypernyms” are similar to Type-Of, and “Meronyms, Holonyms” are subset of Feature-Of BabelNet, WordNet, Yago do not have any category like Action-On Yago has Type-Of categories derived from WordNet hyponymy and hypernymy, and Wikipedia categories System Type-Of Feature-Of Action-On BabelNet, WordNet 19.27% - - Yago 25.12% - - Domain Cartridge 77% 85.7% 68% Table: Recall comparison of systems for 3 relations.
  • 62. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Distributional Similarity Comparison System Precision Recall F-Score Yago 38% 32% 34.37% BabelNet, WordNet 83% 31% 45.14% Domain Cartridge (DC) 58% 41% 47.60% DC + WordNet 62% 40% 49.00% DC + ESG Parser Features 65% 39% 49.14% Table: Precision-Recall comparison of Domain Cartridge (random-indexing, HITS and sim. eqn.) with other systems.
  • 63. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Comparison with Distributional Similarity Measures in WordNet WordNet F-Score LCH 0.22 RES 0.31 JCN 0.42 PATH 0.42 LIN 0.43 WUP 0.43 LESK 0.45 Domain Cartridge 0.49 Table: F-Score comparison of WordNet similarity measures with Domain Cartridge.
  • 64. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Conclusions Unsupervised framework for shallow domain ontology construction, without using manually annotated resources Multi-words form an important component of Domain Term Discovery Incorporation of domain terms in parser lexicon results in 73% reduction in incomplete parses, improving performance of an in-house QA system by upto 7% Synonym discovery approach, using Relational Distributional Similarity, RI, HITS etc., performs better than other existing approaches Future Work: Measure customization time required for domain adaptation of the QA system
  • 65. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Conclusions Unsupervised framework for shallow domain ontology construction, without using manually annotated resources Multi-words form an important component of Domain Term Discovery Incorporation of domain terms in parser lexicon results in 73% reduction in incomplete parses, improving performance of an in-house QA system by upto 7% Synonym discovery approach, using Relational Distributional Similarity, RI, HITS etc., performs better than other existing approaches Future Work: Measure customization time required for domain adaptation of the QA system
  • 66. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Conclusions Unsupervised framework for shallow domain ontology construction, without using manually annotated resources Multi-words form an important component of Domain Term Discovery Incorporation of domain terms in parser lexicon results in 73% reduction in incomplete parses, improving performance of an in-house QA system by upto 7% Synonym discovery approach, using Relational Distributional Similarity, RI, HITS etc., performs better than other existing approaches Future Work: Measure customization time required for domain adaptation of the QA system
  • 67. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Conclusions Unsupervised framework for shallow domain ontology construction, without using manually annotated resources Multi-words form an important component of Domain Term Discovery Incorporation of domain terms in parser lexicon results in 73% reduction in incomplete parses, improving performance of an in-house QA system by upto 7% Synonym discovery approach, using Relational Distributional Similarity, RI, HITS etc., performs better than other existing approaches Future Work: Measure customization time required for domain adaptation of the QA system
  • 68. Domain Adaptation for IE and IR Domain Term Discovery Domain Relation Discovery Experiments Conclusions Unsupervised framework for shallow domain ontology construction, without using manually annotated resources Multi-words form an important component of Domain Term Discovery Incorporation of domain terms in parser lexicon results in 73% reduction in incomplete parses, improving performance of an in-house QA system by upto 7% Synonym discovery approach, using Relational Distributional Similarity, RI, HITS etc., performs better than other existing approaches Future Work: Measure customization time required for domain adaptation of the QA system