PhD Colloquium Spatial Analysis

Presentation given as part of a PhD Colloquium on Spatial Analysis, delivered on Wed 11th January 2013.

PhD Colloquium Spatial Analysis: Presentation Transcript

  • Data Mining to Understand International Dimensions to Online Identity - a classification of 2+ billion names and their linkage to virtual identities and social network traffic.
    • Alistair Leak
    • UCL SECReT
  • Who am I?
    Education:
    • Kingston University (BSc) - GIS
    • UCL (M.Res) - Advanced Spatial Analysis and Visualisation
    • UCL 3+1 - PhD Security and Crime Science
    Supervisors:
    • 1st Supervisor: Professor Paul Longley
    • 2nd Supervisor: Dr James Cheshire
  • Definitions:
    • Netnography – “A qualitative, interpretive research methodology that uses internet-optimized ethnographic research techniques to study the social context in online communities” (Kozinets, 2009)
    • Cybergeodemographics – “The analysis of people by where they live and by whom they interact with, in real and virtual space” (Longley, 2012)
  • Uncertainty of Identity: Work Package 4: Cybergeodemographics
    • Use of primary and secondary data to relate virtual Internet traffic to the probable physical locations from which it emanated; and the development of typologies of social networks that are robust, generalized and related to physical locations.
    [Diagram labels: Secondary Data Collection Tools Data (WP1); Cybergeodemographics (WP4); Text Analytics (WP2)]
  • Working Title:
    • “Data Mining to Understand International Dimensions to Online Identity - a classification of 2+ billion names and their linkage to virtual identities and social network traffic”
    Objectives:
    • Develop spatial context of name network classification
    • Develop typologies of social networks
    • Measure how representative social media is of the underlying population.
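One way the third objective could be approached is to compare each group's share of social media users with its share of the underlying population. The sketch below is illustrative only: the function name `representation_ratio`, the group labels, and all counts are assumptions made for the example and do not come from the study.

```python
def representation_ratio(sample_counts, population_counts):
    """Return (share in sample) / (share in population) for each group.

    A value above 1 suggests the group is over-represented in the sample,
    a value below 1 suggests it is under-represented.
    """
    sample_total = sum(sample_counts.values())
    population_total = sum(population_counts.values())
    if sample_total == 0 or population_total == 0:
        raise ValueError("both inputs need at least one observation")

    ratios = {}
    for group, population_count in population_counts.items():
        if population_count == 0:
            continue  # ratio is undefined for groups absent from the population
        population_share = population_count / population_total
        sample_share = sample_counts.get(group, 0) / sample_total
        ratios[group] = sample_share / population_share
    return ratios


# Made-up counts for illustration; not figures from the presentation.
tweet_users = {"group_a": 60, "group_b": 40}
census = {"group_a": 500, "group_b": 500}
ratios = representation_ratio(tweet_users, census)
```

With these made-up counts, `group_a` holds 60% of the sample but 50% of the population, so its ratio is above 1, while `group_b` falls below 1.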
  • Work Plan
    • M.Res (Present – 2013)
      – Foundation work
        • Assess representative capability of tweet data
      – Skills Development
        • Spatio-Temporal Data Mining
        • Database Management
    • Ph.D (2013 – 2016)
      – Objectives
        • Develop spatial component of names networks
        • Develop typologies of social networks
        • Develop a measure of uncertainty
      – Completion in August 2016
  • Data Sources:
    *Sina Weibo
  • Case Study: Tweets in London
    • 1.4 million tweets over 3 months, Sep - Dec 2012
  • What’s in a Tweet?
    Diagram labels: First Name, Surname, Unique ID, # Themes, Location, Popularity, Interactions, Time/Date
    Possibilities:
    • Political Affiliation
    • Gender
    • Age
    • Location
  • Data Classification
    • Gender
      – Database of 62,000 names + genders
      – Determined by forename
    • Demographic
      – OAC – Output Area Classification
    • ONOMAP
      – Ethnicity, Religion, Geographical Origin.
      – Determined by forename-surname combination
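The forename-based gender step can be sketched as a simple table lookup. This is a minimal illustration, not the method used in the study: the function `infer_gender`, the two-entry table standing in for the 62,000-name database, and the rule of taking the first whitespace-separated token as the forename are all assumptions.

```python
def infer_gender(display_name, forename_genders):
    """Look up the first token of a display name in a forename -> gender table.

    Returns "unknown" when the name is empty or the forename is not listed.
    """
    tokens = display_name.strip().split()
    if not tokens:
        return "unknown"
    forename = tokens[0].lower()  # assumption: the first token is the forename
    return forename_genders.get(forename, "unknown")


# Tiny stand-in for the names database; entries are illustrative only.
FORENAME_GENDERS = {"alice": "female", "james": "male"}

names = ["Alice Smith", "james", "xX_gamer_Xx", ""]
labels = [infer_gender(name, FORENAME_GENDERS) for name in names]
```

Handles that are not real names fall through to "unknown", which is one reason a lookup of this kind only classifies part of any Twitter sample.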
  • Data Classification
  • Tweets by ONOMAP Religion
  • Tweets by ONOMAP Religion
  • Tweets by ONOMAP Group
  • Challenges of Study
    • Signal from Noise
      – Tweets are not all sent from individuals' homes
        • Day and night demographics
      – Not all located tweets come from real people
    • Data Quality/Sample Size
      – Twitter users are self-selecting
        • Only a small proportion have enabled location services
        • Dataset currently has 92,000 unique users
  • Target Areas of Study
    • Spatio-temporal differentiation of tweets
      – Night
      – Day
      – Travel
    • Expansion of the Methodology for World Names
      – Initially into Europe.
    • Application of new name datasets.
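A first pass at the night/day differentiation listed above could bin tweets by the hour they were sent. The sketch below assumes local timestamps and arbitrary cut-off hours (07:00 and 19:00); the record layout, the cut-offs, and the function `time_band` are hypothetical. It does not attempt the "Travel" category, which would need location as well as time.

```python
from collections import Counter
from datetime import datetime

DAY_START_HOUR = 7   # assumed cut-off, not taken from the presentation
DAY_END_HOUR = 19    # assumed cut-off (exclusive)


def time_band(timestamp):
    """Label a timestamp as "day" or "night" using the assumed cut-off hours."""
    if DAY_START_HOUR <= timestamp.hour < DAY_END_HOUR:
        return "day"
    return "night"


# Made-up tweet records for illustration.
tweets = [
    {"user": "u1", "sent": datetime(2012, 10, 1, 9, 30)},
    {"user": "u1", "sent": datetime(2012, 10, 1, 23, 5)},
    {"user": "u2", "sent": datetime(2012, 10, 2, 3, 45)},
]
band_counts = Counter(time_band(tweet["sent"]) for tweet in tweets)
```

Counting bands per user or per area, rather than over the whole dataset, would be the natural next step for comparing day and night demographics.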
  • References:
    • Dale, M. R. T., and M-J. Fortin. "From graphs to spatial graphs." Annual Review of Ecology, Evolution, and Systematics 41.1 (2010): 21.
    • Fischer, E. (July, 2011). World Map of Flickr and Twitter Locations. In See Something or Say Something. Available at
    • Kozinets, Robert V. Netnography: Doing ethnographic research online. Sage Publications Limited, 2009.
    • R Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL
    • Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010, October). Classifying latent user attributes in twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents (pp. 37-44). ACM.
  • Thank-you
    X Factor Graph
    Produced with R and Gephi