DATA/MSML 641
scienceacademy.umd.edu
DATA 641
Word meaning
Lecture by Philip Resnik
03/22/2021
Before we begin
• Assignment 1 was due a couple of minutes ago!
• Last lecture, we talked about what constitutes a word, MWEs/collocations, and statistical hypothesis testing
• We’ll take the first 20 minutes of this lecture to go over Assignment 0
Word Meaning
• Word - See prior lecture for the hidden complexity!
• Meaning - What do we mean by meaning?
Lexicography
Lexicographic Traditions
• Looking up definitions for a word in a dictionary is the lexicographic tradition
• Dictionaries enumerate word meanings (senses)
• What are the pitfalls of this approach?
  • Definitions are not “defining” in the mathematical sense
  • but also not in a precise legal sense
    • “Assault” - an intentional act that puts another person in reasonable apprehension of imminent harmful or offensive contact; physical injury NOT required
  • “Definitions” are text you read that evoke concepts in your head
Types of Word Senses
• Homonymy
  • “Pen”
    • “Place to keep animals”
    • “Instrument to write with”
  • Coincidental or historical?
• Polysemy
  • “Bass”
    • “Low register”
    • “Person who sings low”
  • Related meanings
• Systematic / Structured polysemy
  • “Chicken”
    • Type of animal
    • Food
Lexicography - making dictionaries
• Labor-intensive tradition
  • Ex. Oxford English Dictionary (O.E.D.), started 1857
  • Crowdsourcing by volunteers to find quotations illustrating each word took more than 50 years to complete!
  • cf. Simon Winchester, The Professor and the Madman
• Corpus analysis approach
  • Ex. Collins COBUILD dictionary, published 1987
  • KWIC: Keyword in context
    … sing a bass solo after he …
    … lovely bass voice
    reach the bass notes …
    played bass guitar and …
    with a bass and a lead guitarist …
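A KWIC view is easy to produce from a tokenized corpus. The sketch below is a minimal illustration, not the COBUILD tooling: the function name `kwic`, the `width` parameter, and the toy sentence (stitched together from fragments like the ones on the slide) are all assumptions made for the example.

```python
def kwic(tokens, keyword, width=3):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>25} [{tok}] {right}")
    return lines


# Toy corpus (an assumption for illustration, not real corpus data).
text = ("he played bass guitar and she has a lovely bass voice "
        "but he could not reach the bass notes")
concordance = kwic(text.split(), "bass")
for line in concordance:
    print(line)
```

Reading down the aligned keyword column is what lets a lexicographer spot recurring patterns (bass guitar, bass voice, bass notes) and group them into senses.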
Decompositional representations
• A word is defined by necessary and sufficient conditions: something falls under the word iff it meets all of them
  • Ex. One is a “bachelor” iff one is {human, male, adult, never married} (can be thought of as features for a model)
• Lexical-conceptual structures (LCS)
  • Framework for semantic analysis, developed by Ray Jackendoff in the 70s
  • Built from primitive semantic elements such as CAUSE, GO, TO, AT, with arguments X, Y, Z
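The “necessary and sufficient conditions” view can be read as a set-membership test over features. This is only a sketch under that reading: the feature names, the `satisfies` helper, and the two example entities are hypothetical and chosen for illustration.

```python
# Defining features for "bachelor", written as identifiers (naming is an assumption).
BACHELOR = {"human", "male", "adult", "never_married"}


def satisfies(entity_features, definition):
    """True iff the entity has every defining feature (extra features are allowed)."""
    return definition <= entity_features


# Hypothetical entities for illustration.
alex = {"human", "male", "adult", "never_married", "tall"}
sam = {"human", "male", "adult"}

print(satisfies(alex, BACHELOR))  # all four conditions are met
print(satisfies(sam, BACHELOR))   # "never_married" is missing
```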
Ontological Approaches
Ontologies
• Set of concepts and relations between them
• Ex. (each level IS-A member of the level above it)
  (Kingdom) Animalia
  (Phylum) Chordata - dogs, lions, lizards, fish, . . .
  (Class) Mammalia - dogs, lions, elephants
  (Order) Carnivora - dogs, lions, . . .
  (Family) Canidae - coyotes, dogs, jackals, wolves
  (Genus) Canis - dogs, wolves, jackals
  (Species) Canis lupus - dogs, wolves
  (Subspecies) Canis lupus familiaris - domestic dogs
  Beagle
  Pocket beagle
  Remy (INSTANCE-OF pocket beagle)
• There are also gene, business (e.g. NAICS), astronomy ontologies, and more!
WordNet
• Conceived by George Miller as an ontology for English
• Has evolved into a broader collection of multilingual WordNets
• Distinct ontologies for N, V, Adj, Adv
• Core concept: synset (synonym set) ~ “concept”
  • <board, plank, beam>
  • <board, committee, panel, council>
• The noun taxonomy is most widely used
  • Hyper/hyponym (IS-A)
  • Instance (INSTANCE-OF)
  • Meronym (PART-OF)
  • Antonym (OPPOSITE)
WordNet
• IS-A is related to the core AI idea of “inheritance”, which also played a fundamental role in early semantic networks
• If C2 IS-A C1, then for all f: f is a property of C1 => f is also a property of C2
• Related to subclassing in object-oriented programming
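The inheritance rule can be sketched with two lookup tables, one for IS-A links and one for INSTANCE-OF links. Everything here is a toy assumption: the concept names form a shortened taxonomy, and the property assignments are invented for illustration rather than taken from WordNet.

```python
# Toy IS-A links (child -> parent); a shortened, illustrative taxonomy.
IS_A = {
    "pocket beagle": "beagle",
    "beagle": "dog",
    "dog": "canine",
    "canine": "mammal",
}
# INSTANCE-OF links an individual to its concept.
INSTANCE_OF = {"Remy": "pocket beagle"}
# Properties attached directly to a concept (made up for the example).
PROPERTIES = {
    "mammal": {"has fur"},
    "canine": {"carnivore"},
    "dog": {"domesticated"},
}


def ancestors(concept):
    """Follow IS-A links upward and return the chain of more general concepts."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def inherited_properties(name):
    """Collect properties of a concept (or individual) and of everything it IS-A."""
    concept = INSTANCE_OF.get(name, name)
    props = set()
    for c in [concept] + ancestors(concept):
        props |= PROPERTIES.get(c, set())
    return props


print(ancestors("pocket beagle"))
print(inherited_properties("Remy"))
```

The same lookup could be written with Python classes and subclassing, which is the object-oriented analogy the slide points to.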
Word Sense Disambiguation (WSD) Task
WSD
• A classic problem in NLP
• Bar-Hillel (1960) argued that fully-automatic high-quality machine translation (FAHQMT) was infeasible because it would require too much world knowledge
  • “The box is in the pen” requires commonsense reasoning about relative sizes to disambiguate and translate “pen” correctly
• Long era of work on small sets of individual words (e.g. line, bank)
• Resnik and Yarowsky (1997) created SENSEVAL (later SemEval)
  • Community-wide shared task for WSD
WSD as supervised classification
• Given:
  • enumerated senses {s1, s2, …, sn} for a word w
  • context for w (…. w ….)
• Select:
  • “correct” si for w
• Strong baseline: Just the most frequent sense!
• Good pre-deep-learning baseline: Naive Bayes
  • Choose $\arg\max_y \Pr(y \mid x) = \arg\max_y \Pr(x \mid y)\,\Pr(y)$, where y is a sense and x is the context
WSD as supervised classification
• Good pre-deep-learning baseline: Naive Bayes
  • Choose $\hat{y} = \arg\max_y \Pr(y) \prod_j \Pr(x_j \mid y)$
    • $\Pr(y)$: prior (often uniform)
    • $\prod_j \Pr(x_j \mid y)$: multiply independent probabilities
  • Naively assume all features $x_j$ are independent given the sense
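Both baselines fit in a few lines. The sketch below is a hedged illustration: the training examples and sense labels are invented, the features are simply bag-of-words context tokens, and add-one smoothing is an assumption added so that unseen words do not zero out the product.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (context words, sense label).
train = [
    (["play", "guitar", "band"], "bass_music"),
    (["sing", "low", "voice"], "bass_music"),
    (["play", "solo", "band"], "bass_music"),
    (["catch", "fish", "lake"], "bass_fish"),
]

sense_counts = Counter(sense for _, sense in train)
word_counts = defaultdict(Counter)
for words, sense in train:
    word_counts[sense].update(words)
vocab = {w for words, _ in train for w in words}


def most_frequent_sense():
    """Baseline: ignore the context and return the most common sense."""
    return sense_counts.most_common(1)[0][0]


def naive_bayes(context):
    """argmax over senses of log prior + sum of log feature likelihoods."""
    best, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / len(train))  # log prior
        total = sum(word_counts[sense].values())
        for w in context:
            # Add-one smoothing (an assumption, not from the slides).
            score += math.log((word_counts[sense][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = sense, score
    return best


print(most_frequent_sense())
print(naive_bayes(["fish", "lake"]))
print(naive_bayes(["play", "band"]))
```

Summing logs instead of multiplying probabilities is the usual way to avoid numerical underflow when there are many features.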
WSD as supervised classification
• Other supervised approaches:
  • SVM with engineered features (See J&M 18.5.1)
  • Extract a feature vector from the words and POS tags in a window around the target word Wi:
    Tag       DT  JJ        NN      CC    NN    NN      VB
    Word      An  electric  guitar  and   bass  player  stand  . . .
    Position                Wi-2    Wi-1  Wi    Wi+1    Wi+2
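A sketch of this kind of collocational feature extraction follows. The function name, the padding token, and the exact feature layout (word then tag for each window position) are assumptions for illustration; the tags for “player” and “stand” are assumed to be NN and VB, following standard Penn Treebank tagging.

```python
def collocational_features(tagged, i, window=2):
    """Words and POS tags at positions i-window .. i+window, skipping the target itself."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # Pad when the window runs past either end of the sentence.
        word, pos = tagged[j] if 0 <= j < len(tagged) else ("<pad>", "<pad>")
        feats.extend([word, pos])
    return feats


tagged = [("An", "DT"), ("electric", "JJ"), ("guitar", "NN"), ("and", "CC"),
          ("bass", "NN"), ("player", "NN"), ("stand", "VB")]
features = collocational_features(tagged, 4)  # index 4 is the target "bass"
print(features)
```

In practice each categorical feature would then be one-hot encoded before being passed to an SVM or similar classifier.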
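Supervised state-of-the-art (as of J&M 18.4.2)
• Given:
  • … sawed a board in half and …
  • … nailed together two pine boards …
• Create contextual embedding vectors for each sense, and use a 1-NN (nearest-neighbor) classifier!
  • Sense vectors V_BOARD1, V_BOARD2, …, V_BOARDN: a non-engineered contextual representation for each sense, built from the contextual embeddings c1, c2, …, cn of its labeled instances
  • For a new instance “…. board ….”, choose the sense BOARDi whose vector V_BOARDi minimizes cosine distance to the instance’s contextual embedding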
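The nearest-neighbor step can be sketched as follows. This is a toy illustration under stated assumptions: the 3-dimensional “contextual embeddings” are made-up numbers (a real system would take them from a pretrained encoder), the sense labels are hypothetical, and averaging the instance embeddings is one common way to build a sense vector rather than something the slide specifies.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean(vectors):
    """Component-wise average of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Made-up embeddings of sense-labeled training instances.
labeled = {
    "BOARD1_plank": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "BOARD2_committee": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
# One vector per sense: the average of its instances' embeddings (an assumption).
sense_vectors = {sense: mean(vecs) for sense, vecs in labeled.items()}


def nearest_sense(embedding):
    """1-NN: highest cosine similarity is the same as lowest cosine distance."""
    return max(sense_vectors, key=lambda s: cosine(embedding, sense_vectors[s]))


print(nearest_sense([0.7, 0.3, 0.0]))
print(nearest_sense([0.1, 0.7, 0.4]))
```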
“I don’t believe in Word Senses”
Kilgarriff (2003)
Abandoning enumerated senses
• Kilgarriff advocated for task-dependent clustering of corpus instances
• Schütze: Early distributional representations can form clusters
• Current trend is to use distributional representations as meaning
  • “You shall know a word by the company it keeps” (Firth, 1957)
• However, consider: systematic regularities (e.g. is lamb a food or an animal?), sparse data (esp. for specialized domains), explainability (supporting understanding and trusting inferences)
Why doesn’t traditional WSD help a lot of standard applications?
• Low performance - hard problem…
• Skew of senses
  • Most-frequent sense is largely represented by the word itself (Zipf’s law)
  • “Pen” usually means writing implement, for example
• One sense per discourse
• Implicit disambiguation (bank -> ambiguous, bank … atm … hours -> unambiguous)
Distributional Approaches to Meaning
Distributional approaches to meaning
• Symbolic representations are basically 1-hot encodings
• To compare two words, multiply a 1xn matrix by an nx1 matrix to get a 1x1 result (a dot product)
• Or geometrically, measure cosine similarity
Vector Similarity
• Often measured as $\mathrm{sim}(x, y) = \cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$
• Problem: sim(x, y) = 0 for x ≠ y if 1-hot! This has trouble generalizing across related words, e.g. {puppy, dog, pup}
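The 1-hot problem is easy to see numerically. In this minimal sketch the four-word vocabulary is an assumption chosen for illustration; the point is that any two distinct words come out with similarity 0, whether or not they are related.

```python
import math

# Tiny illustrative vocabulary (an assumption for the example).
vocab = ["puppy", "dog", "pup", "apple"]


def one_hot(word):
    """A 1 in the word's own position and 0 everywhere else."""
    return [1.0 if w == word else 0.0 for w in vocab]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


sim_same = cosine(one_hot("dog"), one_hot("dog"))
sim_related = cosine(one_hot("dog"), one_hot("puppy"))
sim_unrelated = cosine(one_hot("dog"), one_hot("apple"))
print(sim_same, sim_related, sim_unrelated)
```

“dog” is exactly as far from “puppy” as it is from “apple”, which is why a denser, weighted representation is needed.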
Vector Similarity
• Problem: sim(x, y) = 0 for x ≠ y if 1-hot! This has trouble generalizing across related words, e.g. {puppy, dog, pup}
• Solution: Use weights
  • $v_{w_i, d_j} = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_i$
  • $\mathrm{tf}_{i,j}$: frequency of $w_i$ in $d_j$
  • $\mathrm{idf}_i$: inversely proportional to the number of $d_j$'s containing $w_i$
    • Usually: $\mathrm{idf}_i = \log \frac{N}{\text{\# docs containing } w_i}$
    • Ex. If “the” is in almost all of the docs, its idf is really low
    • If $w_i \in$ all docs, idf = 0!
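A small worked example of tf-idf weighting, assuming raw counts for tf and log(N / document frequency) for idf. The three toy documents are invented for illustration.

```python
import math
from collections import Counter

# Toy document collection (an assumption for the example).
docs = [
    "the dog chased the cat".split(),
    "the cat sat on the mat".split(),
    "the stock market fell".split(),
]
N = len(docs)
# Document frequency: in how many documents does each word appear?
df = Counter(w for doc in docs for w in set(doc))


def idf(word):
    return math.log(N / df[word])


def tfidf(word, doc):
    """Raw term frequency in the document times inverse document frequency."""
    return doc.count(word) * idf(word)


print(idf("the"), idf("cat"), idf("dog"))
print(tfidf("the", docs[0]), tfidf("dog", docs[0]))
```

“the” occurs twice in the first document, yet its weight is 0 because it appears in every document; the rarer “dog” ends up with the larger weight.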
Vector Similarity
• Using contexts to define word vectors
  • count(wi, wj) = # times wi, wj co-occur in a window
  • The counts form a term-term matrix with rows and columns w1 … wV
• Same issue with frequent words; pointwise mutual information (PMI) is one way to reweight the counts
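The sketch below builds windowed co-occurrence counts and reweights them with PMI, using the standard definition PMI(w, c) = log2 of P(w, c) / (P(w) P(c)). The formula is not spelled out on the slide, and the function names and three toy sentences are assumptions for illustration.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count ordered (word, context) pairs within +/- window positions."""
    counts = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    counts[(w, sent[j])] += 1
    return counts


def pmi(counts, w, c):
    """log2 of P(w, c) / (P(w) * P(c)), estimated from the pair counts."""
    total = sum(counts.values())
    joint = counts[(w, c)]
    if joint == 0:
        return float("-inf")
    w_total = sum(n for (a, _), n in counts.items() if a == w)
    c_total = sum(n for (_, b), n in counts.items() if b == c)
    return math.log2(joint * total / (w_total * c_total))


# Toy corpus (an assumption for the example).
sentences = ["the dog barked".split(), "the cat meowed".split(), "the dog ran".split()]
counts = cooccurrence_counts(sentences)
print(counts[("the", "dog")], counts[("dog", "barked")])
print(pmi(counts, "the", "dog"), pmi(counts, "dog", "barked"))
```

Although (the, dog) has the higher raw count, (dog, barked) gets the higher PMI, because “the” co-occurs with everything.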
Dimensionality reduction
• Each word vector has a very high dimension! This can lead to reduced ability to generalize
• What are some dimension reduction techniques we can leverage?
  • Latent Semantic Indexing (LSI): use Singular Value Decomposition (SVD) to reduce words/documents
  • $M = U \Sigma V^{\top}$, where $M$ is the original $m \times n$ matrix (e.g. docs × words), $U$ is $m \times m$, $\Sigma$ is $m \times n$ and diagonal, and $V^{\top}$ is $n \times n$
  • Truncated SVD to get a lower-dimensional representation
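A minimal NumPy sketch of truncated SVD follows. The matrix values are made up, the choice of rows as words and columns as documents is an assumption, and `full_matrices=False` requests the thin SVD rather than the full m x m and n x n factors written on the slide.

```python
import numpy as np

# Toy count matrix (rows: words, columns: documents); values are made up.
M = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
])

# Thin SVD: M = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2                                 # number of dimensions to keep
word_vectors = U[:, :k] * s[:k]       # k-dimensional representation of each row
M_k = word_vectors @ Vt[:k, :]        # rank-k approximation of M

print(np.round(s, 3))
print(word_vectors.shape, np.linalg.matrix_rank(M_k))
```

Keeping only the top k singular values gives the best rank-k approximation of M in the least-squares sense, which is why the discarded dimensions can be treated as noise.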
word2vec
• CBOW - simple! Predict target word from context
• Skip-gram - a bit more involved, predict context words from target (reverse CBOW)
[Diagram: network nodes for the context words Wi-2, Wi-1, Wi+1, Wi+2]
word2vec: skip-grams
● Goal: predict context words for a given target word
● Optimize feature vectors:
  ○ Intuition: we want the score of context words with the target to be high, and the score of random words with the target to be low
● Training examples for a target word Wi (each word has a feature vector):
  ○ Positive (+): (Wi, Wi-2), (Wi, Wi-1), (Wi, Wi+1), (Wi, Wi+2), where Wi-2 = c1, Wi-1 = c2, Wi+1 = c3, Wi+2 = c4
  ○ Negative (-): (Wi, random word), using random samples to get negative examples
  ○ The L context words are treated as independent
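The training setup can be sketched in plain Python. This is a simplified illustration, not the reference word2vec implementation: negatives are drawn uniformly from the vocabulary (word2vec uses a smoothed unigram distribution), a sampled negative may occasionally coincide with a true context word, and all function names, the learning rate, and the vector size are assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def training_pairs(tokens, i, vocab, rng, window=2, negatives_per_positive=1):
    """(target, context, 1) for real context words; (target, random word, 0) for negatives."""
    target = tokens[i]
    pairs = []
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j == i:
            continue
        pairs.append((target, tokens[j], 1))
        for _ in range(negatives_per_positive):
            pairs.append((target, rng.choice(vocab), 0))  # uniform sampling: a simplification
    return pairs


def sgd_step(target_vecs, context_vecs, target, context, label, lr=0.1):
    """One logistic-loss gradient step on the pair's dot-product score."""
    t, c = target_vecs[target], context_vecs[context]
    error = sigmoid(dot(t, c)) - label
    for d in range(len(t)):
        t_d = t[d]
        t[d] -= lr * error * c[d]
        c[d] -= lr * error * t_d


rng = random.Random(0)
tokens = "an electric guitar and bass player stand".split()
vocab = sorted(set(tokens))
dim = 5
target_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
context_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}

pairs = training_pairs(tokens, tokens.index("bass"), vocab, rng)
before = sigmoid(dot(target_vecs["bass"], context_vecs["guitar"]))
for _ in range(200):
    sgd_step(target_vecs, context_vecs, "bass", "guitar", 1)
after = sigmoid(dot(target_vecs["bass"], context_vecs["guitar"]))
print(len(pairs), before < after)
```

Repeated positive updates push the target and context vectors toward each other, while updates with label 0 push a target away from its sampled negatives.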
Effect of word2vec
● Words that appear in similar contexts get closer-together feature vectors
● Smaller contexts
  ○ More syntactic since near-words are more likely to be syntactically related
● Captures structure in semantic spaces (classic “man is to king as woman is to queen” example)
● Can extend to document-level work (e.g. doc2vec!)
[Figure: man, woman, king, queen plotted as points in the vector space]
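The analogy can be reproduced with vector arithmetic: king - man + woman should land near queen. The 2-dimensional vectors below are hand-made for illustration (one axis loosely “royalty”, the other loosely “gender”) and are not real word2vec output; the `analogy` helper is likewise an assumption.

```python
import math

# Hand-made toy vectors: dimension 0 ~ "royalty", dimension 1 ~ "gender".
vectors = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.1],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c):
    """Return the word d that best completes a : b :: c : d, using b - a + c."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words, as is conventional for this test.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


print(analogy("man", "king", "woman"))
```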

DATA641 Lecture 3 - Word meaning.pptx
