Compact Hierarchical Explicit
Semantic Representation
(CHESA)
Sonya Liberman and Shaul Markovitch
In Proceedings of the IJCAI 2009 Workshop on
User-Contributed Knowledge and Artificial
Intelligence: An Evolving Synergy (WikiAI09),
Pasadena, CA, 2009
2
Motivation
Many tasks are related to text manipulation and processing:
• Clustering
• Categorization
• Filtering
• Search
3
Motivation
Humans perform these tasks much better than
machines
Headache
4
Motivation
Headache
Humans easily recognize these semantic relations
5
Can We Endow Computers with Such Capabilities?
• Generate a semantic representation
• Decide whether the two representations are related
[Figure: the words Headache, Neurology, and Table, each mapped to its semantic representation; pairs of representations are labeled Related or Unrelated]
Semantic Representation
• Language is the main communication medium between people
• People use common world knowledge to communicate
• Humans often organize knowledge within hierarchical ontologies
Semantic Representation
Representation of semantics should be based on
• Natural human-defined concepts (world knowledge)
• Inner organization of these concepts as perceived by humans
[Figure: song titles "Across the Universe", "Yesterday", "Hey Jude", "Hit Me Baby One More Time", and "Crazy", shown under two nodes that are both labeled "Songs"]
8
Our Approach
Semantics is represented as a compact hierarchical
structure of pre-defined natural concepts
Headache
Semantic Representation - Requirements
• Automatic construction
• Maximal lingual coverage
• Comprehensibility
• Representation granularity
  ◦ Representation at varying abstraction levels
  ◦ Flexibility in representation size
• Distance metric
Compact Hierarchical Explicit
Semantic Representation
11
Hierarchical Representation of Semantics
• Assume a predefined global hierarchical ontology
• Assume each node is associated with textual content
12
Hierarchical Representation of Semantics
Word semantics is represented as a weighted sub-hierarchy within the ontology
word
13
Wikipedia as a Hierarchical Ontology of Natural Concepts
• Almost 3 million English articles
• A hierarchical inner structure of categories
14
Wikipedia as a Hierarchical Ontology of
Natural Concepts
[Figure: three layers labeled Wikipedia Categories, Wikipedia Articles, and Article Content]
Category content is the content of its sub-tree
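The rule "category content is the content of its sub-tree" can be sketched as a recursive aggregation. This is only an illustration: the three dictionaries (`subcategories`, `articles`, `article_words`) and the toy data are hypothetical, and the sketch assumes the category structure is a tree without cycles.

```python
from collections import Counter

def category_content(category, subcategories, articles, article_words):
    """Word counts of a category = counts of all articles in its sub-tree.

    Assumes a cycle-free tree of categories (an assumption of this sketch).
    """
    total = Counter()
    # Articles attached directly to this category.
    for article in articles.get(category, ()):
        total.update(article_words[article])
    # Content inherited from every sub-category, recursively.
    for sub in subcategories.get(category, ()):
        total.update(category_content(sub, subcategories, articles, article_words))
    return total

# Toy, made-up data for illustration only.
subcategories = {"Science": ["Biology"]}
articles = {"Science": ["Scientific method"], "Biology": ["Gene"]}
article_words = {
    "Scientific method": Counter({"hypothesis": 2, "gene": 1}),
    "Gene": Counter({"gene": 3}),
}
science = category_content("Science", subcategories, articles, article_words)
```

In this toy data the parent category "Science" accumulates the word counts of its own article and of the article under "Biology".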
15
Automatic Construction of Semantic
Representation
Which concepts should be included?
16
The Conditional Overrepresentation
Criterion - Intuition
A child concept is added if the word is significantly more associated with it than with the parent concept
Gene
17
The Conditional Overrepresentation
Criterion φw
Performing a hypergeometric test:
X ~ HG(N, M, n)
φw = 1 - Pr(X ≥ k)
[Figure: a parent concept and a child concept, annotated with the quantities N, M, n, and k]
High φw: low probability that k occurrences were observed by chance
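A minimal sketch of the criterion φw = 1 - Pr(X ≥ k) with X ~ HG(N, M, n), using only the standard library. The reading of the parameters is an assumption of this sketch, not something the slide states: N is taken as the population size (e.g. the size of the parent content), M as the number of occurrences of the word in the parent, n as the size of the child content, and k as the number of occurrences of the word in the child.

```python
from math import comb

def hypergeom_tail(N, M, n, k):
    """Pr(X >= k) for X ~ HG(N, M, n): N items, M marked, n drawn."""
    total = comb(N, n)
    upper = min(M, n)  # X cannot exceed the marked items or the draws
    favorable = sum(
        comb(M, i) * comb(N - M, n - i)  # comb returns 0 for impossible splits
        for i in range(max(k, 0), upper + 1)
    )
    return favorable / total

def phi(N, M, n, k):
    """Conditional overrepresentation score as written on the slide."""
    return 1.0 - hypergeom_tail(N, M, n, k)

# Hypothetical counts: the word is expected about 5 times in the child
# (100 * 50 / 1000); observing 15 is far less likely by chance than 5.
score_high = phi(1000, 50, 100, 15)
score_low = phi(1000, 50, 100, 5)
```

With these made-up counts, the child where the word occurs 15 times receives a higher φw than the one where it occurs 5 times, which matches the "high φw means low probability of chance" reading.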
18
Compact Hierarchical Explicit Semantic
Representation (CHESA)
Benchmark
Greedy Top-Down CHESA Algorithm
1. Represent semantics with the root concept only
2. Traverse the conceptual hierarchy top-down
3. In each iteration, add the concept with maximal φw
19
Compact Hierarchical Explicit Semantic
Representation (CHESA)
Greedy Top-Down CHESA Algorithm
4. Terminate when reaching size k or a threshold for φw
[Figure: hierarchy labeled "Benchmark", for k = 15]
The greedy bottom-up algorithm (Bottom-Up CHESA) prunes concepts according to φw
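The four steps of the greedy top-down procedure can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it assumes the hierarchy is a tree, that a φw value has already been computed for each concept relative to its parent (the hypothetical `phi` dictionary), and that candidates are the children of concepts already in the representation. All names and numbers are made up.

```python
def top_down_chesa(children, phi, root, k, threshold=0.0):
    """Grow a sub-hierarchy from the root, adding the best candidate each step."""
    selected = [root]                        # step 1: root concept only
    frontier = set(children.get(root, ()))   # step 2: candidates below the root
    while len(selected) < k and frontier:    # step 4: stop at size k
        # Step 3: candidate with maximal phi (name breaks ties deterministically).
        best = max(frontier, key=lambda c: (phi[c], c))
        if phi[best] < threshold:            # step 4: stop at the phi threshold
            break
        selected.append(best)
        frontier.remove(best)
        frontier.update(children.get(best, ()))  # expose the new concept's children
    return selected

# Toy hierarchy and made-up phi values for one word.
children = {
    "Root": ["Science", "Arts"],
    "Science": ["Biology", "Physics"],
    "Biology": ["Genetics"],
}
phi = {"Science": 0.9, "Arts": 0.2, "Biology": 0.95, "Physics": 0.4, "Genetics": 0.99}
representation = top_down_chesa(children, phi, "Root", k=4, threshold=0.5)
```

Because a concept only becomes a candidate after its parent is selected, the result is always a connected sub-hierarchy containing the root.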
20
Assigning Association Scores to Concepts
Benchmark
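The association score for the word w and a concept c is [formula not preserved in the extracted text]
[Figure: concept hierarchy annotated with association scores: 0.56, 2.49, 9.27, 10.03, 8.34, 1.56, 0.77, 1.42, 2.71, 2.35, 4.35, 3.95, 0.10, 1.01]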
Evaluating CHESA
22
Using CHESA for Semantic Relatedness
Words are related when
◦ Their representations intersect
◦ Intersecting concepts have high association scores
[Figure: representations of Neurology (k = 30) and Headache (k = 30); the labels Biology and Neurological disorders each appear twice, alongside Cognition and Medical treatment]
23
Empirical Evaluation
• Testing on WordSimilarity353
  ◦ 353 word pairs judged by humans for semantic relatedness
• Measuring correlation with human judgments
  ◦ With varying values of representation size k
  ◦ With an unlimited representation size
• Comparing results to Explicit Semantic Analysis (ESA)
  ◦ E. Gabrilovich and S. Markovitch 2005, 2006, 2007
24
Explicit Semantic Analysis (ESA)
Gabrilovich and Markovitch (2005, 2006, 2007)
The semantics of a word is a vector of its associations with Wikipedia articles
Semantic relatedness is measured by the cosine similarity between the two vectors
Benchmark
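Cosine similarity between two sparse concept vectors can be sketched as below. The vectors are represented as hypothetical dictionaries from concept name to weight, and the weights are made up for illustration; this is not the ESA implementation itself.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    # Only concepts present in both vectors contribute to the dot product.
    dot = sum(weight * v[concept] for concept, weight in u.items() if concept in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention of this sketch for empty vectors
    return dot / (norm_u * norm_v)

# Made-up weights: the two vectors share no concept.
vec_a = {"Felidae": 0.8, "Big cat": 0.5}
vec_b = {"Wood mouse": 0.9, "Harvest Mouse": 0.4}
disjoint = cosine_similarity(vec_a, vec_b)

# Made-up weights: one shared concept gives a positive similarity.
vec_c = {"Felidae": 0.3, "Wood mouse": 0.9}
overlapping = cosine_similarity(vec_a, vec_c)
```

When two truncated vectors have no concept in common, the dot product is empty and the similarity is zero regardless of how related the words are.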
25
ESA Based Semantic Relatedness
ESA
Top 20 Concepts
Cat (Unix)
Cheshire Cat
Cool Cat
Plasan Sand Cat
Claude Cat
Big cat
Stray Cats
Felidae
Cat's Eye (film)
Cat scratch fever
Saber-toothed cat
New Britain Rock Cats
Cats (musical)
Cats & Dogs
Clan Nova Cat
Cat on a Hot Tin Roof
Sacramento River Cats
Wildcat
Jungle Cat
Leopard Cat
No intersecting concepts: cosine similarity is zero
[Callouts: Cheshire Cat, Stray Cats, Sacramento River Cats]
ESA
Top 20 Concepts
Mouse
Modest Mouse
Stanley Mouse
Mickey's Magical Christmas
Danger Mouse
Disney's House of Mouse
Apple Mighty Mouse
Natal Multimammate Mouse
Harvest Mouse
Wood mouse
Chevrotain
Mouse (computing)
Wild Mouse roller coaster
Josephine the Singer, or the Mouse Folk
Mighty Mouse
Mouse on Mars
The Mickey Mouse Club
Mickey Mouse
Minnie Mouse
Mickey Mouse Clubhouse
26
CHESA Based Semantic Relatedness
[Figure: two Top-Down CHESA representations, each with k = 20; the labels Zoology, Entertainment, and Natural sciences appear in both, with Cell Biology and Domestication also shown]
Results
28
Empirical Evaluation - Results
Evaluation with varying values of representation size
29
Empirical Evaluation - Results
Evaluation when resources are unlimited
Algorithm      Correlation
WordNet        0.35
LSA            0.56
WikiRelate!    0.50
MarkovLink     0.55
ESA            0.74
CHESA          0.72
◦ Using ESA full interpretation vectors
◦ Using CHESA full hierarchical representation
30
Conclusions
• CHESA: a novel methodology for compact hierarchical representation of semantics
• A flexible algorithm that constructs semantic representations at any given size
• Significantly improves semantic relatedness results when resources are limited
  ◦ Captures semantics when representation size is limited by performing generalizations
  ◦ Uses a conditional overrepresentation criterion to create a compact and comprehensible representation
31
Thank You
