Towards Modelling Language Innovation Acceptance in
Online Social Networks
24th November 2015
Daniel Kershaw – d.kershaw1@lancaster.ac.uk
Daniel Kershaw
Computer Science BSc – Lancaster University – 2009
Digital Innovation MRes – Highwire – Lancaster University – 2010
PhD Candidate – 2010 – Now
Supervisors
Dr. Matthew Rowe – School of Computing and Communication (SCC)
Dr. Patrick Stacey – Management Science (LUMS)
Research Area:
Social Computing
Big Data / Big Data Systems
Who Am I
“Language, never forget, is more like fashion than
science, and matters of usage, spelling, and
pronunciation tend to wander around like
hemlines”
- Bill Bryson, The Mother Tongue: English and How
It Got That Way
Language is in constant change
Online communication adds extra pressure through the merging of time and
space
– Awesomesauce
– Bants
– beer o’clock
– brain fart
– Brexit
– bruh
Language is Constantly Changing
State of the Art - Detection Of Innovation
Three studies
1. Looking for phrases such as “this means”/“is defined as”
2. Using known heuristics of blends to detect origins
3. Detecting changes in semantic orientation of words
Cook, P., & Stevenson, S. (2007)
Cook, P., & Stevenson, S. (2010)
State of the Art - Diffusion
Identify words that exist in a small time frame
Model diffusion using Monte Carlo simulations
Showed existence of wave and gravity diffusion
models.
Does not detect local community innovations
Eisenstein, J., O'Connor, B., Smith, N. A., & Xing, E. P. (2012, October)
State of the Art - Change in meaning
Showed a change in meaning of
words
Performed on Google N-grams
dataset
Removed the concept of community
Kulkarni, V., Al-Rfou, R., Perozzi, B., & Skiena, S.
State of the Art - User Language Change
Change in language as a tool to predict
users leaving a social network
Initially language conforms to the group
Before they leave, their language drifts away
from the language of the group
Danescu-Niculescu-Mizil C, West R, Jurafsky D, Leskovec J, Potts C
1. Word innovation acceptance models through computation means
2. Identification of local and global acceptance
3. Multiple network analysis
4. Large Corpus Analysis
Contribution
Metcalf’s FUDGE
Frequency of the word
Unobtrusiveness of the word
Diversity of users and situations
Generation of other forms and
meanings
Endurance of the concept
Grounded Models
Barnhart’s VFRGT
(V) Number of forms
(F) Frequency of word
(R) Number of sources
(G) Number of genres
(T) Time Span of Word
Linguists and lexicographers aim to understand language
Developed heuristics to aid the decision to include words in dictionaries
The Data
                   Twitter        Reddit
Users              3,108,844      ≈25,000,000
Posts              73,528,954     ≈500,000,000
Communities        3,046          121,373
Words (n > 200)    373,217        2,712,629
Time Periods       283 days       880 days
Data gathered from the June 2015 Reddit Data Release https://goo.gl/j116ML
Data Groupings
Group by Time
Day of year
Geo Location for Twitter
UK -> North West -> LA -> LA1
Subreddit for Reddit
Reddit -> meta interest group ->
subreddit
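A minimal sketch of how the groupings above could be turned into lookup keys, assuming each post carries a date and a community path from the most general to the most specific level. The function name `group_keys` and the tuple layout are illustrative, not taken from the slides. Note that a plain day-of-year key would merge the same calendar day from different years, so a multi-year corpus would also need the year in the key.

```python
from datetime import date

def group_keys(day, community_path):
    """Return one (day_of_year, community_prefix) key per hierarchy level.

    community_path is ordered from general to specific, e.g.
    ["UK", "North West", "LA", "LA1"] for Twitter or
    ["Reddit", "meta interest group", "subreddit"] for Reddit.
    """
    day_of_year = day.timetuple().tm_yday
    return [
        (day_of_year, tuple(community_path[:depth]))
        for depth in range(1, len(community_path) + 1)
    ]

keys = group_keys(date(2015, 1, 2), ["UK", "North West", "LA", "LA1"])
```

Counting a post once under every key it produces lets the same measure be read at each level of the hierarchy.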
Variation in Frequency
Assess change in raw frequency and user frequency over time
Diversity in Form
Assess users' adoption of varying forms, e.g. addition of -ing
Diversity in Meaning
Over time, can we see a convergence in the meaning of the word?
Measures
• BNC (British National Corpus) – Gold
standard of English
• Filter out:
– Hash tags
– URLs
– Punctuation
– Emoticons / Emoji
• Light Normalization:
– soooooooo -> soo
What is an Innovation
                           Twitter    Reddit
# innovations (N > 200)    62,141     373,217
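The filtering and light normalization steps above can be sketched roughly as follows. This is an illustration under assumptions, not the pipeline used for the talk: the regular expressions, the function names, and the choice to drop every character outside a-z and the apostrophe (which removes punctuation, digits and emoji in one pass) are all hypothetical. The reference vocabulary stands in for the BNC word list.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
NON_WORD_RE = re.compile(r"[^a-z\s']")   # punctuation, digits, emoji, ...
REPEAT_RE = re.compile(r"(.)\1{2,}")     # three or more of the same character

def tokenize(post):
    """Lower-case, filter, and lightly normalize one post into tokens."""
    text = post.lower()
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1\1", text)  # soooooooo -> soo
    return text.split()

def find_innovations(posts, reference_vocab, min_count=200):
    """Words used more than min_count times that the reference corpus lacks."""
    counts = Counter(tok for post in posts for tok in tokenize(post))
    return {w: c for w, c in counts.items()
            if c > min_count and w not in reference_vocab}

tokens = tokenize("Soooooooo good!!! #blessed http://t.co/x 😂")
found = find_innovations(["on fleek", "so fleek", "so good"],
                         reference_vocab={"on", "so", "good"}, min_count=1)
```

The `min_count` default mirrors the N > 200 cut-off in the table; the toy call lowers it so the example has something to return.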
• Normalized word count
– Per time period
– Per community
• Normalized user word count
– Per time period
– Per community
Variation in Frequency
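A rough sketch of the two counts above, assuming the word count is normalized by the total number of tokens in a (time period, community) group and the user count by the number of distinct users active in that group. The slides do not state the denominators, so both are assumptions, as are the record layout and the function name.

```python
from collections import defaultdict

def normalized_frequencies(records, word):
    """Map (period, community) -> (word frequency, user frequency).

    records is an iterable of (period, community, user, tokens) tuples.
    """
    word_count = defaultdict(int)
    token_count = defaultdict(int)
    word_users = defaultdict(set)
    all_users = defaultdict(set)
    for period, community, user, tokens in records:
        key = (period, community)
        token_count[key] += len(tokens)
        all_users[key].add(user)
        hits = tokens.count(word)
        if hits:
            word_count[key] += hits
            word_users[key].add(user)
    return {
        key: (
            word_count[key] / token_count[key] if token_count[key] else 0.0,
            len(word_users[key]) / len(all_users[key]),
        )
        for key in all_users
    }

records = [
    (1, "LA1", "u1", ["on", "fleek"]),
    (1, "LA1", "u2", ["so", "good"]),
]
freqs = normalized_frequencies(records, "fleek")
```

Keeping the two values side by side makes it possible to tell a word used heavily by a few users from one used lightly by many.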
Assess the prefix and suffix addition of an
innovation
List of prefixes and suffixes from the OED
– apple -> apples
– hero -> antihero
Diversity in Form
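The affix check above could look something like this. The short `PREFIXES` and `SUFFIXES` tuples are placeholders for the OED lists mentioned on the slide, and `derived_forms` is a hypothetical name; plain string concatenation is also a simplification, since real morphology changes spelling (for example hero -> heroes).

```python
PREFIXES = ("anti", "un", "re")   # placeholder for the OED prefix list
SUFFIXES = ("s", "ing", "ed")     # placeholder for the OED suffix list

def derived_forms(word, vocabulary):
    """Return vocabulary items that are the word plus one known affix."""
    candidates = {p + word for p in PREFIXES} | {word + s for s in SUFFIXES}
    return sorted(candidates & set(vocabulary))

vocab = {"apple", "apples", "hero", "antihero"}
apple_forms = derived_forms("apple", vocab)
hero_forms = derived_forms("hero", vocab)
```

The number of distinct forms returned per time period gives one possible reading of diversity in form for an innovation.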
Diversity in Meaning
Looking for innovations that have not
been seen before
No solidified meaning within existing
systems e.g. WordNet
Learns the embedding of words within a
corpus using word2vec
Developed at Google, published in 2013
Uses documents to train neural net
Diversity in Meaning - word2vec
Use the data groupings: time and location
Train a w2v model on each split of the data, e.g.
London, week 1
Query each model for each innovation
(top 100 synonyms), e.g. fleek
Compute similarity between each region in
a time period for an innovation e.g. week 1
fleek
Diversity in Meaning
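The comparison step above can be illustrated without training any models. The sketch assumes the top neighbours of an innovation have already been extracted per region (for instance with gensim's `most_similar(word, topn=100)` on each regional model, which is an assumption about tooling, not something the slides state). The slides do not name the similarity measure, so Jaccard overlap is used here purely as a stand-in, and the neighbour words are made up.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two neighbour lists, from 0 (disjoint) to 1 (equal)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def regional_agreement(neighbours_by_region):
    """Mean pairwise Jaccard overlap across regions for one time period."""
    pairs = list(combinations(sorted(neighbours_by_region), 2))
    if not pairs:
        return 0.0
    total = sum(
        jaccard(neighbours_by_region[r1], neighbours_by_region[r2])
        for r1, r2 in pairs
    )
    return total / len(pairs)

# Hypothetical neighbours of "fleek" in week 1 for two regions.
week1 = {
    "London": ["point", "perfect", "eyebrows"],
    "Lancaster": ["point", "perfect", "nails"],
}
agreement = regional_agreement(week1)
```

Under this reading, a rising agreement score across successive time periods would suggest that the regions are converging on one meaning.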
Looking for statistically significant growth or
decay of an innovation
Presume language change happens in a
monotonic fashion
Compute Spearman's rank correlation for each time series
X value is days since start of data
Y value is normalized frequency of word
Value range -1 to 1
Sampling the Data
Sampling the Data
Classify a change as statistically significant when it falls above
or below the 95% confidence interval.
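A minimal sketch of the trend test, with x as days since the start of the data and y as the normalized frequency. The rank correlation is computed directly (average ranks for ties, then Pearson on the ranks). How the 95% interval was derived is not stated in the slides; the sketch assumes the large-sample normal approximation, in which rho * sqrt(n - 1) is compared against 1.96. Function names and labels are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation, in the range -1 to 1."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

def classify_trend(days, freqs, z_crit=1.96):
    """Label a series using a normal approximation at the 95% level."""
    z = spearman_rho(days, freqs) * math.sqrt(len(days) - 1)
    if z > z_crit:
        return "growth"
    if z < -z_crit:
        return "decay"
    return "no significant change"

days = list(range(30))
rising = [d * 0.01 for d in days]
```

Because the test works on ranks, it only assumes that change is monotonic, not that it is linear, which matches the presumption stated on the slide.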
The Tools
Some Processing Later
Variation in Frequency
Variation in Frequency – Samples
Twitter Reddit
Variation in Frequency - Reddit
Variation in Frequency – User vs. Frequency
Twitter Reddit
Variation in Frequency – User vs. Frequency
Variation in Frequency - Community
Variation in Frequency - Community
Diversity in Form
Variation in Form
Diversity in Form - Reddit
Diversity in Form - Reddit
• Mistaken Clustering:
– jkt – (Just Keep Thinking)
– Lijkt – (Dutch word for ‘You appear’)
Issues
Diversity in Meaning - Twitter
Diversity in Meaning - Twitter
Diversity in Meaning – Twitter (Ebola)
• Susceptible to excessive usage of a word
• Solution could be:
– Smoothing of data
– Sampling to give equal representation of word
Diversity in Meaning
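The two candidate fixes above could be prototyped along these lines. Both functions are illustrative assumptions: a trailing moving average stands in for "smoothing", and capping the number of posts per word and group stands in for "sampling to give equal representation". Window size, cap and seed are arbitrary.

```python
import random

def moving_average(series, window=7):
    """Trailing moving average; early points use the values available."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def cap_sample(posts, cap, seed=0):
    """Keep at most cap posts so a burst of usage cannot dominate a group."""
    posts = list(posts)
    if len(posts) <= cap:
        return posts
    return random.Random(seed).sample(posts, cap)

smoothed = moving_average([1, 2, 3, 4, 5], window=3)
sampled = cap_sample([f"post {i}" for i in range(100)], cap=10)
```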
Word of the Year
Collins
binge-watch, verb
clean eating, noun
contactless, adjective
Corbynomics, noun
dadbod, noun
ghosting, noun
manspreading, noun
shaming, noun
swipe, verb
Transgender, adjective
Oxford
😂
Ad blocker, noun
Brexit, noun
Dark Web, noun
On fleek, adjective phrase
Lumbersexual, noun
Refugee, noun
Sharing economy, noun
They (singular), pronoun
Word of the Year
Word of the Year
Word of the Year
Word of the Year
You can say a lot with Emojis
Word of the Year - Emoji
• Is language dependent on community structure
– Modeling Social Reinforcement and Homophily
– Again modeling on multiple levels and across different networks
• Is diffusion affected by the form of network, e.g. geographical Twitter or
cross-posting on Reddit
• Who are the most influential in language innovation and adoption
– Fitting of general threshold models to predict when people adopt a term
– How to perform this at scale
Where next
Thank You
Any Questions
Eisenstein, J., O'Connor, B., Smith, N. A., & Xing, E. P. (2012, October). Mapping
the geographical diffusion of new words. arXiv.org.
Eisenstein, J., O'Connor, B., Smith, N. A., & Xing, E. P. (2014). Diffusion of Lexical
Change in Social Media. PLoS ONE, 9(11), e113114.
http://doi.org/10.1371/journal.pone.0113114
Goldberg, Y., & Levy, O. (2014, February 15). word2vec Explained: deriving
Mikolov et al.'s negative-sampling word-embedding method.
Metcalf, A. A. (2004). Predicting New Words. Houghton Mifflin Harcourt.
Barnhart, D. K. (2007). A Calculus for New Words, 28(1), 132–138.
http://doi.org/10.1353/dic.2007.0009
References
Kulkarni, V., Al-Rfou, R., Perozzi, B., & Skiena, S. (2014, November 12).
Statistically Significant Detection of Linguistic Change. arXiv.org. PeerJ Inc.
http://doi.org/10.7717/peerj.68/table-1
Danescu-Niculescu-Mizil, C., West, R., Jurafsky, D., Leskovec, J., & Potts, C.
(2013). No country for old members: user lifecycle and linguistic change in
online communities (pp. 307–318). International World Wide Web Conferences
Steering Committee.
Cook, P., Han, B., & Baldwin, T. (n.d.). Statistical Methods for Identifying Local
Dialectal Terms from GPS-Tagged Documents.
Cook, P., & Stevenson, S. (2007). Automagically inferring the source words of
lexical blends. Presented at the Proceedings of the Tenth Conference of the ….
References

Towards Modelling Language Innovation Acceptance in Online Social Networks

  • 1. Towards Modelling Language Innovation Acceptance in Online Social Networks 24th November 2015 Daniel Kershaw – d.kershaw1@lancaster.ac.uk
  • 2. Daniel Kershaw Computer Science BSc – Lancaster University – 2009 Digital Innovation MRes – Highwire – Lancaster University – 2010 PhD Candidate – 2010 – Now Supervisors Dr. Matthew Rowe – School of Computing and Communication (SCC) Dr. Patrick Stacey – Management Science (LUMS) Research Area: Social Computing Big Data / Big Data Systems Who Am I
  • 3. “Language, never forget, is more like fashion than science, and matters of usage, spelling, and pronunciation tend to wander around like hemlines” - Bill Bryson, The Mother Tongue: English and How It Got That Way
  • 4. Language is in constant change Online communication adds extra pressure though the merging of time and space – Awesomesauce – Bants – beer o’clock – brain fart – Brexit – bruh Language is Contently Changing
  • 5. State of the Art - Detection Of Innovation Three studies 1. Looking for term “this mean”/“is defined as” 2. Using know heuristics of blends to detect origins 3. Detecting changes in semantic orientation or words Cook, P., & Stevenson, S. (2007) Cook, P., & Stevenson, S. (2010)
  • 6. State of the Art - Diffusion Identify words that exist in small time frame Model diffusion using monri-carlo simulations Showed existence of wave and gravity diffusion models. Does not detect local community innovations Eisenstein, J., O'Connor, B., Smith, N. A., & Xing, E. P. (2012, October)
  • 7. State of the Art - Change in meaning Showed a change in meaning of words Performed on Google N-grams dataset Removed the concept of community Kulkarni, V., Al-Rfou, R., Perozzi, B., & Skiena, S.
  • 8. State of the Art - User Language Change Change in language as a tool to predict users leaving social network Initially language conforms to the group Before they level the language bears away from the language of the group Danescu-Niculescu-Mizil C, West R, Jurafsky D, Leskovec J, Potts C
  • 9. 1. Word innovation acceptance models through computation means 2. Identification of local and global acceptance 3. Multiple network analysis 4. Large Corpus Analysis Contribution
  • 10. Metcalf’s Fudge Frequency of the word Unobtrusiveness of the rod Diversity of users and situations Generation of other forms and meanings Endurance of the concept Grounded Models Barnhart’s Vfrgt (V) Number of forms (F) Frequency of word (R) Number of sources (G) Number of genera (T) Time Span of Word Linguists and lexicographers aim to understand language Developed heuristics to aid the decision to include words in dictionaries
  • 11. The Data Twitter Reddit Users 3,108,844 ≈25,000,000 Posts 73,528,954 ≈500,000,000 Communities 3046 121,373 Words (n > 200) 373,217 2,712,629 Time Periods 283 days 880 days Data gathered from the June 2015 Reddit Data Release https://goo.gl/j116ML
  • 12. Data Groupings Group by Time Day of year Geo Location for Twitter UK -> North West -> LA -> LA1 Subreddit for Reddit Reddit -> meta interest group -> subreddit
  • 13. Variation in Frequency Assess changed in raw frequency and user frequency over time Diversity in Form Assess users adoption of varying forms e.g. additions of ing Diversity in Meaning Over time can we see a convergence in meaning of the word Measures
  • 14. • BNC (British National Corpus) – Gold standard of English • Filter out: – Hash tags – URLs – Punctuation – Emoticons / Emoji • Light Normalization: – soooooooo -> soo What is an Innovation Twitter Reddit # innovation (N > 200) 62,141 373,217
  • 15. • Normalized word count – Per time period – Per community • Normalized user word count – Per time period – Per community Variation in Frequency
  • 16. Assess the prefix and suffix addition of an innovation List of prefix and suffixes from the OED – apple -> apples – hero -> antihero Diversity in Form
  • 17. Diversity in Meaning Looking for innovations that have not been seen before No solidified meaning within existing systems e.g. WordNet
  • 18. Looking for innovations that have not been seen before No solidified meaning within existing systems e.g. WordNet Learns the embedding of words within a corpus using word2vec Developed by Google in 2009 Uses documents to train neural net Diversity in Meaning - word2vec
  • 19. User the data grouping; time and location Train w2v model each split of data e.g. London, week1 Query model with each innovation against model (top 100 synonyms) e.g. fleek Compute similarity between each region in a time period for an innovation e.g. week 1 fleek Diversity in Meaning
  • 20. Sampling the Data. Looking for statistically significant growth or decay of an innovation. Presume language change happens in a monotonic fashion. Fit Spearman's rank correlation to each time series: X value is days since start of data; Y value is normalized frequency of the word. Value range: -1 to 1.
  • 21. Sampling the Data. Class a change as statistically significant if it falls above or below the 95% confidence interval.
  • 25. Variation in Frequency – Samples Twitter Reddit
  • 27. Variation in Frequency – User vs. Frequency Twitter Reddit
  • 28. Variation in Frequency – User vs. Frequency
  • 29. Variation in Frequency - Community
  • 30. Variation in Frequency - Community
  • 33. Diversity in Form - Reddit
  • 34. Diversity in Form - Reddit
  • 35. • Mistaken Clustering: – jkt – (Just Keep Thinking) – Lijkt – (Dutch word for ‘You appear’) Issues
  • 36. Diversity in Meaning - Twitter
  • 37. Diversity in Meaning - Twitter
  • 38. Diversity in Meaning – Twitter (Ebola)
  • 39. • Susceptible to excessive usage of a word • Solution could be: – Smoothing of data – Sampling to give equal representation of word Diversity in Meaning
  • 40.
  • 41. Word of the Year. Collins: binge-watch, verb; clean eating, noun; contactless, adjective; Corbynomics, noun; dadbod, noun; ghosting, noun; manspreading, noun; shaming, noun; swipe, verb; transgender, adjective. Oxford: 😂; ad blocker, noun; Brexit, noun; Dark Web, noun; on fleek, adjective phrase; lumbersexual, noun; refugee, noun; sharing economy, noun; they (singular), pronoun.
  • 42. Word of the Year
  • 43. Word of the Year
  • 44. Word of the Year
  • 45. Word of the Year
  • 46.
  • 47. You can say a lot with Emojis
  • 48. Word of the Year - Emoji
  • 49. Where next • Is language dependent on community structure? – Modeling social reinforcement and homophily – Again modeling on multiple levels and across different networks • Is diffusion affected by the form of the network, e.g. geographical Twitter or cross-posting on Reddit? • Who are the most influential in language innovation and adoption? – Fitting of general threshold models to predict when people adopt a term – How to perform this at scale
  • 51. References
        Eisenstein, J., O'Connor, B., Smith, N. A., & Xing, E. P. (2012, October). Mapping the geographical diffusion of new words. arXiv.org.
        Eisenstein, J., O'Connor, B., Smith, N. A., & Xing, E. P. (2014). Diffusion of Lexical Change in Social Media. PLoS ONE, 9(11), e113114. http://doi.org/10.1371/journal.pone.0113114
        Goldberg, Y., & Levy, O. (2014, February 15). word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method.
        Metcalf, A. A. (2004). Predicting New Words. Houghton Mifflin Harcourt.
        Barnhart, D. K. (2007). A Calculus for New Words, 28(1), 132–138. http://doi.org/10.1353/dic.2007.0009
  • 52. References
        Kulkarni, V., Al-Rfou, R., Perozzi, B., & Skiena, S. (2014, November 12). Statistically Significant Detection of Linguistic Change. arXiv.org. PeerJ Inc. http://doi.org/10.7717/peerj.68/table-1
        Danescu-Niculescu-Mizil, C., West, R., Jurafsky, D., Leskovec, J., & Potts, C. (2013). No country for old members: user lifecycle and linguistic change in online communities (pp. 307–318). International World Wide Web Conferences Steering Committee.
        Cook, P., Han, B., & Baldwin, T. (n.d.). Statistical Methods for Identifying Local Dialectal Terms from GPS-Tagged Documents.
        Cook, P., & Stevenson, S. (2007). Automagically inferring the source words of lexical blends. Presented at the Proceedings of the Tenth Conference of the ….

Editor's Notes

  2. Blends - We use linguistic and cognitive aspects of this process to motivate a computational treatment of neologisms formed by blending. Variation of mutual information measures: variation was assessed by computing a variant of PMI in correlation with search engine query results. The issue is that the search engine is a black box.
  3. Mention POS tagging, change point detection, Applied to different networks, distribution of the usage of the word
  4. Buzzfeed Gamergate Sjw Tumblr Bruh Buzfeed Multireddit Remindme Sjw