SlideShare a Scribd company logo
1 of 46
Download to read offline
Drifting Distributions?

Possibilities and Risk of Using Distributional Semantics for studying
Concept Drift
Drift-a-LOD, Amsterdam 11 September 2017

Antske Fokkens
Drifting Distributions
• What is concept drift?

• What is distributional semantics?

• How can distributional semantics be used?

• How can it not be used?
Concept Drift
• Aspects of a concept (Wang et al. 2011):

• the intension
• the extension
• the label
Intension
• Frege's sense: the sense provides a function that takes
you to the extension of the concept and a perspective on
the denoted concept
Extension
• The set of things that are defined by the intension in the
world (what is being denoted)
Labels
• The words that are used to refer to something
Meaning
Evening StarMorning Star
Concept Drift
• Concept drift for scholars (following Fokkens et al. 2016):

• typically, changes in intension (perspective), where the
core meaning stays the same (Kuukanen, 2008)

• changes in extension can be relevant for specific
concepts (e.g. EU) or in extreme cases
Distributional Semantics
The meaning of a word is determined by its usage
(Wittgenstein)
=> words used in a similar context will have
similar meaning (Harris 1954; Firth 1957)
Distributional Semantics
• A bottle of tesguïno is on the table.
• Everybody likes tesguïno.
• Tesguïno makes you drunk.
• We make tesguïno out of corn.



(Jurafsky and Martin, 2015)
Vector Semantics
• Distributional semantics approaches generally
represent words as vectors
Co-occurrence vectors
Jurafsky and Matrin (2015)
Jurafsky and Matrin (2015)
Jurafsky and Matrin (2015)
Beyond counting co-
occurrence
• Relatedness vs similarity:
• text/paragraph or sentence as context =>
words related to the same topic
• small window of close terms as context => less
general relatedness, more similarity
• Relevance:
• frequent co-occurrence with the or wicket
Jurafsky and Matrin (2015)
Jurafsky and Matrin (2015)
Distributional Semantics for
concept drift
• Distributional semantics can provide insight into
the relation between label and intension
• Used for detecting change in meaning (sense
shift)
• Can this also be used for detecting concept
drift?
2 open questions
• How to go beyond sense shift?
• What is the reliability of the method?
Concept systems
• Betti and Van den Berg (2014): concepts should
not be examined in isolation
• Geeraerts (p.c.): change in the concept itself is
best examined by investigating related concepts
Reliability
• How reliable or indicative are measures of
change?
• How reliable are distributional models?
Word Embeddings
• Used for detecting diachronic change. Common
approach: changes in n-nearest neighbors



word2vec
• Hellrich & Hahn (2016): n-nearest neighbors
change when running word2vec on the same
corpus
Count vs Predict
• Baroni et al (2014): Predictive models work
better than count models
• Levy et al (2015): if you use same
hyperparameters: count is better for similarity,
predict for analogy
How does this hold up to the (in)stability of
word2vec?
Models tested
• Optimal settings from Levy et al. for models:
• PPMI
• SVD
• word2vec:
• 3 random initiations
• svd initiation
• look at examples in different order (beginning to end &
end to beginning)
Experimental Setup
• Wikipedia dump Jan 2017:
• 1.8 Billion words, randomized
• Subset (by taking head and tail) of

0.12M, 5M, 15M, 50M, 100M, 200M, 300M, 400M,
500M, 750M words
• Standard evaluation sets:
• Similarity/relatedness pairs
• Analogy evaluation
Similarity evaluation
Use case:
• Can distributional semantics provide insight into
the ways in which racism changed in the 20th
century? (Sommerauer 2017)
Theory
• Sociology, social psychology and anthropology:



Classical open racism had declined towards the
end of the 20th century and replaced by a more
subtle form of discrimination



a cultural, ethnical explanation rather than
biological
Method
• Concepts to study:
• Core concepts: race, ethnicity, culture

(racial, ethnic, cultural)
• Subconcepts: language, nationality, religion

(linguistic, national, religious)
• Instances: blacks, whites, foreigners,
immigrants, Jews, Arabs, Turks
Related concepts
• How do these relate to:

• difference, conflict, superiority
• history/historic, genetics/genetic
• relation, relationship, marriage
• value, belief, attitude
Corpora
Corpus Composition Corpus size
COHA genre-balanced
22.5 M - 27.9 M words

Average: 24.5 M words
Google N-grams ALL
Google books of all
genres, not evenly
balanced
11.6 B - 82.5 B words

Average: 29 B words
Google N-gram Fiction Google books fiction
925 M - 11.3 B words 

Average: 3.0 B words
Nearest Neighbor
Comparison
• How do the nearest neighbors of a concept
change from one decade to another?
• How does the overlap in nearest neighbors
change between two related concepts?
• within a decade?
• between decade 1 and decade 2
Changes in relations
• How closely are various concepts related?
• How does this change over time?
Reliability check
• Compare different measurements
• Compare effect on target words to effects on
control words
Changes in frequency
• Sommerauer (2017), p. 66
Changing relations
N-gram results
• Sommerauer (2017), p. 71
Conclusions
• Relations between concepts partially changed as
expected
• Nearest neighbor show clear confirmation of racial
shifting to a concept mainly associated with
discrimination
• ethnic and religious move from only `racial’ to
`racial&cultural’
• political moved from `racial’
Instances & control word
• Sommerauer (2017), p. 83
Relational variations based
on alternative models
Nearest Neighbors
depending on various models
Conclusions (revised)
• Control words show that changes in relations between
core concepts did not yield reliable insights in this study
• Fluctuation in results depending on the model that was
used shows that care must be taken when interpreting
relational results
• One insights seem to hold across variation:
• racism moved from something seen as similar to
`ethnic’ and `cultural’ to something mainly associated
with discrimination
Using distributional
semantics for concept drift?
• Possible, but:
• translating the concept under investigation to measurable
tokens is non-trivial
• careful for artifacts of (small) data
• use a solid methodological setup:
• control terms
• alternative measures for testing hypotheses
• alternative models for representing data
References
• Baroni, Marco, Georgina Fina and German Kruzewski (2014) Don’t count, predict! A systematic comparison of context-
counting vs. concept-predicting semantics vectors. In: Proceedings of ACL, Baltimore, USA.
• Betti, Arianna and Hein van den Berg (2014) Modelling the history of ideas. In: British Journal for the History of Philosophy,
22(4), P. 812-835.
• Firth, John R. (1957) A synopsis of linguistic theory 1930-1955. Studies in linguistic analysis.
• Fokkens, Antske, Serge ter Braake, Isa Maks and Davide Colin (2016). On the Semantics of Concept Drift: Towards Formal
Definitions of Semantic Change. In: Proceedings of Drift-a-LOD. Bologna, Italy.
• Harris, Zellig S (1954). Distributional Structure. Word 10(2-3). p 146-162.
• Hellrich, Johannes and Udo Hahn (2016) Bad Company-Neighborhoods in Neural Embedding Spaces Considered Harmful.
In: Proceedings of COLING, Osaka, Japan.
• Jurafsky, Daniel and James Martin (2015) Speech and language processing. Preprint of third edition. https://
web.stanford.edu/~jurafsky/slp3/ed3book.pdf
• Kuukkanen, Jouni (2008) Making sense of conceptual change. In: History and theory 47.3 p. 351-372.
• Levy, Omer, Yoav Goldberg and Ido Dagan (2015). Improving distributional similarity with lessons learned from word
embeddings. Transactions of the Association for Computational Linguistics, 3, p.211-225.
• Sommerauer, Pia (2017) From old to new racism? Investigating known dangers in distributional semantic approaches to
conceptual change. MA thesis. Vrije Universiteit, Amsterdam
• Wang, Shenghui, Stefan Schlobach and Michel Klein (2011): Concept drift and how to identify it. Web Semantics: Science
Services and Agents on the World Wide Web 9.3 (2011) p 247-265.

More Related Content

Similar to Drifting distributions? Possibilities and risks of using distributional semantics for studying concept drift

601-CriticalEssay-2-Portfolio Edition
601-CriticalEssay-2-Portfolio Edition601-CriticalEssay-2-Portfolio Edition
601-CriticalEssay-2-Portfolio EditionJordan Chapman
 
lecture 4 Quantitative research design.ppt
lecture 4 Quantitative research design.pptlecture 4 Quantitative research design.ppt
lecture 4 Quantitative research design.pptAbdallahAlasal1
 
Evaluation Methods for Social XR Experiences
Evaluation Methods for Social XR ExperiencesEvaluation Methods for Social XR Experiences
Evaluation Methods for Social XR ExperiencesMark Billinghurst
 
Applying A Validity Argument To The Viva
Applying A Validity Argument To The VivaApplying A Validity Argument To The Viva
Applying A Validity Argument To The VivaAudrey Britton
 
Naomi Baes's PhD Confirmation Presentation: A Multidimensional Framework for ...
Naomi Baes's PhD Confirmation Presentation: A Multidimensional Framework for ...Naomi Baes's PhD Confirmation Presentation: A Multidimensional Framework for ...
Naomi Baes's PhD Confirmation Presentation: A Multidimensional Framework for ...Naomi Baes
 
quantitative research designs - PR2 SHS1
quantitative research designs - PR2 SHS1quantitative research designs - PR2 SHS1
quantitative research designs - PR2 SHS1JosuaGarcia5
 
Enriching Linked Open Data with distributional semantics to study concept drift
Enriching Linked Open Data with distributional semantics to study concept driftEnriching Linked Open Data with distributional semantics to study concept drift
Enriching Linked Open Data with distributional semantics to study concept driftLaura Hollink
 
Outline & Research Design RoadmapThis exercise will help you bui.docx
Outline & Research Design RoadmapThis exercise will help you bui.docxOutline & Research Design RoadmapThis exercise will help you bui.docx
Outline & Research Design RoadmapThis exercise will help you bui.docxalfred4lewis58146
 
Kmo2015 national culture_km20150826
Kmo2015 national culture_km20150826Kmo2015 national culture_km20150826
Kmo2015 national culture_km20150826Jan Pawlowski
 
S5. qualitative #2 2019
S5. qualitative #2 2019S5. qualitative #2 2019
S5. qualitative #2 2019collierdr709
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Beyond the Factor: Talking about Research Impact
Beyond the Factor: Talking about Research ImpactBeyond the Factor: Talking about Research Impact
Beyond the Factor: Talking about Research ImpactClaire Stewart
 
doore dissertation grad expo 42716 white finalb
doore dissertation grad expo 42716 white finalbdoore dissertation grad expo 42716 white finalb
doore dissertation grad expo 42716 white finalbStacy Doore
 
Large-scale norming and statistical analysis of 870 American English idioms.pdf
Large-scale norming and statistical analysis of 870 American English idioms.pdfLarge-scale norming and statistical analysis of 870 American English idioms.pdf
Large-scale norming and statistical analysis of 870 American English idioms.pdfFaishaMaeTangog
 
5. Do you understand how to fine tune your methodological choices?
5. Do you understand how to fine tune your methodological choices?5. Do you understand how to fine tune your methodological choices?
5. Do you understand how to fine tune your methodological choices?DoctoralNet Limited
 
Discourse vs.Text.ppt
Discourse vs.Text.pptDiscourse vs.Text.ppt
Discourse vs.Text.pptudianaUQ
 
Examining "Borrowed Theory" in Original vs. New Disciplines via Text Mining
Examining "Borrowed Theory" in Original vs. New Disciplines via Text MiningExamining "Borrowed Theory" in Original vs. New Disciplines via Text Mining
Examining "Borrowed Theory" in Original vs. New Disciplines via Text MiningStephen Downing
 

Similar to Drifting distributions? Possibilities and risks of using distributional semantics for studying concept drift (20)

601-CriticalEssay-2-Portfolio Edition
601-CriticalEssay-2-Portfolio Edition601-CriticalEssay-2-Portfolio Edition
601-CriticalEssay-2-Portfolio Edition
 
lecture 4 Quantitative research design.ppt
lecture 4 Quantitative research design.pptlecture 4 Quantitative research design.ppt
lecture 4 Quantitative research design.ppt
 
Evaluation Methods for Social XR Experiences
Evaluation Methods for Social XR ExperiencesEvaluation Methods for Social XR Experiences
Evaluation Methods for Social XR Experiences
 
Applying A Validity Argument To The Viva
Applying A Validity Argument To The VivaApplying A Validity Argument To The Viva
Applying A Validity Argument To The Viva
 
Naomi Baes's PhD Confirmation Presentation: A Multidimensional Framework for ...
Naomi Baes's PhD Confirmation Presentation: A Multidimensional Framework for ...Naomi Baes's PhD Confirmation Presentation: A Multidimensional Framework for ...
Naomi Baes's PhD Confirmation Presentation: A Multidimensional Framework for ...
 
quantitative research designs - PR2 SHS1
quantitative research designs - PR2 SHS1quantitative research designs - PR2 SHS1
quantitative research designs - PR2 SHS1
 
Enriching Linked Open Data with distributional semantics to study concept drift
Enriching Linked Open Data with distributional semantics to study concept driftEnriching Linked Open Data with distributional semantics to study concept drift
Enriching Linked Open Data with distributional semantics to study concept drift
 
Outline & Research Design RoadmapThis exercise will help you bui.docx
Outline & Research Design RoadmapThis exercise will help you bui.docxOutline & Research Design RoadmapThis exercise will help you bui.docx
Outline & Research Design RoadmapThis exercise will help you bui.docx
 
2104 Talk @SSU
2104 Talk @SSU2104 Talk @SSU
2104 Talk @SSU
 
A tool for discourse visualization and analysis
A tool for discourse visualization and analysisA tool for discourse visualization and analysis
A tool for discourse visualization and analysis
 
Kmo2015 national culture_km20150826
Kmo2015 national culture_km20150826Kmo2015 national culture_km20150826
Kmo2015 national culture_km20150826
 
S5. qualitative #2 2019
S5. qualitative #2 2019S5. qualitative #2 2019
S5. qualitative #2 2019
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Beyond the Factor: Talking about Research Impact
Beyond the Factor: Talking about Research ImpactBeyond the Factor: Talking about Research Impact
Beyond the Factor: Talking about Research Impact
 
doore dissertation grad expo 42716 white finalb
doore dissertation grad expo 42716 white finalbdoore dissertation grad expo 42716 white finalb
doore dissertation grad expo 42716 white finalb
 
Large-scale norming and statistical analysis of 870 American English idioms.pdf
Large-scale norming and statistical analysis of 870 American English idioms.pdfLarge-scale norming and statistical analysis of 870 American English idioms.pdf
Large-scale norming and statistical analysis of 870 American English idioms.pdf
 
CMSS FIVE
CMSS FIVECMSS FIVE
CMSS FIVE
 
5. Do you understand how to fine tune your methodological choices?
5. Do you understand how to fine tune your methodological choices?5. Do you understand how to fine tune your methodological choices?
5. Do you understand how to fine tune your methodological choices?
 
Discourse vs.Text.ppt
Discourse vs.Text.pptDiscourse vs.Text.ppt
Discourse vs.Text.ppt
 
Examining "Borrowed Theory" in Original vs. New Disciplines via Text Mining
Examining "Borrowed Theory" in Original vs. New Disciplines via Text MiningExamining "Borrowed Theory" in Original vs. New Disciplines via Text Mining
Examining "Borrowed Theory" in Original vs. New Disciplines via Text Mining
 

Recently uploaded

Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 

Drifting distributions? Possibilities and risks of using distributional semantics for studying concept drift

  • 1. Drifting Distributions?
 Possibilities and Risk of Using Distributional Semantics for studying Concept Drift Drift-a-LOD, Amsterdam 11 September 2017
 Antske Fokkens
  • 2. Drifting Distributions • What is concept drift? • What is distributional semantics? • How can distributional semantics be used? • How can it not be used?
  • 3. Concept Drift • Aspects of a concept (Wang et al. 2011): • the intension • the extension • the label
  • 4. Intension • Frege's sense: the sense provides a function that takes you to the extension of the concept and a perspective on the denoted concept
  • 5. Extension • The set of things that are defined by the intension in the world (what is being denoted)
  • 6. Labels • The words that are used to refer to something
  • 8. Concept Drift • Concept drift for scholars (following Fokkens et al. 2016): • typically, changes in intension (perspective), where the core meaning stays the same (Kuukanen, 2008) • changes in extension can be relevant for specific concepts (e.g. EU) or in extreme cases
  • 9. Distributional Semantics The meaning of a word is determined by its usage (Wittgenstein) => words used in a similar context will have similar meaning (Harris 1954; Firth 1957)
  • 10. Distributional Semantics • A bottle of tesguïno is on the table. • Everybody likes tesguïno. • Tesguïno makes you drunk. • We make tesguïno out of corn.
 
 (Jurafsky and Martin, 2015)
  • 11. Vector Semantics • Distributional semantics approaches generally represent words as vectors
  • 15. Beyond counting co- occurrence • Relatedness vs similarity: • text/paragraph or sentence as context => words related to the same topic • small window of close terms as context => less general relatedness, more similarity • Relevance: • frequent co-occurrence with the or wicket
  • 18. Distributional Semantics for concept drift • Distributional semantics can provide insight into the relation between label and intension • Used for detecting change in meaning (sense shift) • Can this also be used for detecting concept drift?
  • 19. 2 open questions • How to go beyond sense shift? • What is the reliability of the method?
  • 20. Concept systems • Betti and Van den Berg (2014): concepts should not be examined in isolation • Geeraerts (p.c.): change in the concept itself is best examined by investigating related concepts
  • 21. Reliability • How reliable or indicative are measures of change? • How reliable are distributional models?
  • 22. Word Embeddings • Used for detecting diachronic change. Common approach: changes in n-nearest neighbors
 

  • 23. word2vec • Hellrich & Hahn (2016): n-nearest neighbors change when running word2vec on the same corpus
  • 24. Count vs Predict • Baroni et al (2014): Predictive models work better than count models • Levy et al (2015): if you use same hyperparameters: count is better for similarity, predict for analogy How does this hold up to the (in)stability of word2vec?
  • 25. Models tested • Optimal settings from Levy et al. for models: • PPMI • SVD • word2vec: • 3 random initiations • svd initiation • look at examples in different order (beginning to end & end to beginning)
  • 26. Experimental Setup • Wikipedia dump Jan 2017: • 1.8 Billion words, randomized • Subset (by taking head and tail) of
 0.12M, 5M, 15M, 50M, 100M, 200M, 300M, 400M, 500M, 750M words • Standard evaluation sets: • Similarity/relatedness pairs • Analogy evaluation
  • 28.
  • 29. Use case: • Can distributional semantics provide insight into the ways in which racism changed in the 20th century? (Sommerauer 2017)
  • 30. Theory • Sociology, social psychology and anthropology:
 
 Classical open racism had declined towards the end of the 20th century and replaced by a more subtle form of discrimination
 
 a cultural, ethnical explanation rather than biological
  • 31. Method • Concepts to study: • Core concepts: race, ethnicity, culture
 (racial, ethnic, cultural) • Subconcepts: language, nationality, religion
 (linguistic, national, religious) • Instances: blacks, whites, foreigners, immigrants, Jews, Arabs, Turks
  • 32. Related concepts • How do these relate to:
 • difference, conflict, superiority • history/historic, genetics/genetic • relation, relationship, marriage • value, belief, attitude
  • 33. Corpora Corpus Composition Corpus size COHA genre-balanced 22.5 M - 27.9 M words
 Average: 24.5 M words Google N-grams ALL Google books of all genres, not evenly balanced 11.6 B - 82.5 B words Average: 29 B words Google N-gram Fiction Google books fiction 925 M - 11.3 B words Average: 3.0 B words
  • 34. Nearest Neighbor Comparison • How do the nearest neighbors of a concept change from one decade to another? • How does the overlap in nearest neighbors change between two related concepts? • within a decade? • between decade 1 and decade 2
  • 35. Changes in relations • How closely are various concepts related? • How does this change over time?
  • 36. Reliability check • Compare different measurements • Compare effect on target words to effects on control words
  • 37. Changes in frequency • Sommerauer (2017), p. 66
  • 40. Conclusions • Relations between concepts partially changed as expected • Nearest neighbor show clear confirmation of racial shifting to a concept mainly associated with discrimination • ethnic and religious move from only `racial’ to `racial&cultural’ • political moved from `racial’
  • 41. Instances & control word • Sommerauer (2017), p. 83
  • 42. Relational variations based on alternative models
  • 44. Conclusions (revised) • Control words show that changes in relations between core concepts did not yield reliable insights in this study • Fluctuation in results depending on the model that was used shows that care must be taken when interpreting relational results • One insights seem to hold across variation: • racism moved from something seen as similar to `ethnic’ and `cultural’ to something mainly associated with discrimination
  • 45. Using distributional semantics for concept drift? • Possible, but: • translating the concept under investigation to measurable tokens is non-trivial • careful for artifacts of (small) data • use a solid methodological setup: • control terms • alternative measures for testing hypotheses • alternative models for representing data
  • 46. References • Baroni, Marco, Georgina Fina and German Kruzewski (2014) Don’t count, predict! A systematic comparison of context- counting vs. concept-predicting semantics vectors. In: Proceedings of ACL, Baltimore, USA. • Betti, Arianna and Hein van den Berg (2014) Modelling the history of ideas. In: British Journal for the History of Philosophy, 22(4), P. 812-835. • Firth, John R. (1957) A synopsis of linguistic theory 1930-1955. Studies in linguistic analysis. • Fokkens, Antske, Serge ter Braake, Isa Maks and Davide Colin (2016). On the Semantics of Concept Drift: Towards Formal Definitions of Semantic Change. In: Proceedings of Drift-a-LOD. Bologna, Italy. • Harris, Zellig S (1954). Distributional Structure. Word 10(2-3). p 146-162. • Hellrich, Johannes and Udo Hahn (2016) Bad Company-Neighborhoods in Neural Embedding Spaces Considered Harmful. In: Proceedings of COLING, Osaka, Japan. • Jurafsky, Daniel and James Martin (2015) Speech and language processing. Preprint of third edition. https:// web.stanford.edu/~jurafsky/slp3/ed3book.pdf • Kuukkanen, Jouni (2008) Making sense of conceptual change. In: History and theory 47.3 p. 351-372. • Levy, Omer, Yoav Goldberg and Ido Dagan (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3, p.211-225. • Sommerauer, Pia (2017) From old to new racism? Investigating known dangers in distributional semantic approaches to conceptual change. MA thesis. Vrije Universiteit, Amsterdam • Wang, Shenghui, Stefan Schlobach and Michel Klein (2011): Concept drift and how to identify it. Web Semantics: Science Services and Agents on the World Wide Web 9.3 (2011) p 247-265.