PhD Colloquium Spatial Analysis

Presentation given as part of a PhD Colloquium on Spatial Analysis, delivered on Wed 11th January 2013.


  1. Data Mining to Understand International Dimensions to Online Identity - a classification of 2+ billion names and their linkage to virtual identities and social network traffic.
     • Alistair Leak
     • UCL SECReT
     • a.leak.11@ucl.ac.uk
  2. Who am I?
     Education:
       Kingston University (BSc) - GIS
       UCL (M.Res) - Advanced Spatial Analysis and Visualisation
       UCL 3+1 - PhD Security and Crime Science
     Supervisors:
       1st Supervisor: Professor Paul Longley
       2nd Supervisor: Dr James Cheshire
  3. Definitions:
     • Netnography – “A qualitative, interpretive research methodology that uses internet-optimized ethnographic research techniques to study the social context in online communities” (Kozinets, 2009)
     • Cybergeodemographics – “The analysis of people by where they live and by whom they interact with, in real and virtual space” (Longley, 2012)
  4. Uncertainty of Identity: Work Package 4: Cybergeodemographics
     • Use of primary and secondary data to relate virtual Internet traffic to the probable physical locations from which it emanated; and the development of typologies of social networks that are robust, generalized and related to physical locations.
     [Diagram labels: Secondary Data Collection Tools Data (WP1) Cybergeodemographics (WP4) Text Analytics (WP2)]
  5. Working Title:
     • “Data Mining to Understand International Dimensions to Online Identity - a classification of 2+ billion names and their linkage to virtual identities and social network traffic”
     Objectives:
     • Develop spatial context of name network classification
     • Develop typologies of social networks
     • Measure how representative social media is of the underlying population.
  6. Work Plan
     • M.Res (Present – 2013)
       – Foundation work
         • Assess representative capability of tweet data
       – Skills Development
         • Spatio-Temporal Data Mining
         • Database Management
     • Ph.D (2013 – 2016)
       – Objectives
         • Develop spatial component of names networks
         • Develop typologies of social networks
         • Develop a measure of uncertainty
       – Completion in August 2016
  7. Data Sources: *Sina Weibo
  8. Case Study: Tweets in London
     • 1.4 Million Tweets over 3 months, Sep - Dec 2012
  9. What’s in a Tweet?
     First Name, Surname, Unique ID, # Themes, Location, Popularity, Interactions, Time/Date
     Possibilities:
     • Political Affiliation
     • Gender
     • Age
     • Location
  10. Data Classification
     • Gender
       – Database of 62,000 names + genders
       – Determined by forename
     • Demographic
       – OAC – Output Area Classification
     • ONOMAP
       – Ethnicity, Religion, Geographical Origin.
       – Determined by forename-surname combination
  11. Data Classification
  12. Tweets by ONOMAP Religion
  13. Tweets by ONOMAP Religion
  14. Tweets by ONOMAP Group
  15. Challenges of Study
     • Signal from Noise
       – Tweets are not all sent from individuals' homes
         • Day and night demographics
       – Not all location tweets are from real people
     • Data Quality/Sample Size
       – Twitter users are self-selecting
         • Only a small proportion have enabled location services
         • Dataset currently has 92,000 unique users
  16. Target Areas of Study
     • Spatio-temporal differentiation of tweets
       – Night
       – Day
       – Travel
     • Expansion of the Methodology for World Names
       – Initially into Europe.
     • Application of new name datasets.
  17. References:
     • Dale, M. R. T., and M-J. Fortin. "From graphs to spatial graphs." Annual Review of Ecology, Evolution, and Systematics 41.1 (2010): 21.
     • Fischer, E. (July 2011). World Map of Flickr and Twitter Locations. In See Something or Say Something. Available at http://www.flickr.com/photos/walkingsf/5912169471/in/set-72157627140310742
     • http://urbantick.blogspot.co.uk/2010/12/ncl-social-networks.html
     • Kozinets, Robert V. Netnography: Doing ethnographic research online. Sage Publications Limited, 2009.
     • R Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
     • Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010, October). Classifying latent user attributes in twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents (pp. 37-44). ACM.
  18. Thank-you
     X Factor Graph
     Produced with R and Gephi
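
The forename-based gender lookup (slide 10) and the day/night split of tweets (slides 15 and 16) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the study: the tiny `FORENAME_GENDER` table stands in for the 62,000-name database, the tweet record layout (`name`, `created_at`) is hypothetical, and the 07:00 to 19:00 "day" window is an arbitrary choice.

```python
from datetime import datetime

# Hypothetical stand-in for the 62,000-name forename/gender database.
FORENAME_GENDER = {"alistair": "M", "paul": "M", "james": "M", "sarah": "F"}


def gender_from_name(full_name):
    """Return 'M', 'F', or None, using only the first token of the name."""
    tokens = full_name.strip().split()
    if not tokens:
        return None
    return FORENAME_GENDER.get(tokens[0].lower())


def time_band(timestamp, day_start=7, day_end=19):
    """Label a timestamp 'day' or 'night' by hour (window is an assumption)."""
    return "day" if day_start <= timestamp.hour < day_end else "night"


def classify(tweet):
    """Attach a gender guess and a day/night band to one tweet record."""
    return {
        "gender": gender_from_name(tweet["name"]),
        "band": time_band(tweet["created_at"]),
    }


# Hypothetical tweet record; the field names are illustrative only.
example = {"name": "Sarah Smith", "created_at": datetime(2012, 10, 5, 22, 30)}
print(classify(example))
```

Names missing from the lookup return `None` rather than a guess, which keeps unclassified users visible when assessing how representative the sample is.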
