PhD Colloquium Spatial Analysis

Data Mining to Understand International
Dimensions to Online Identity
- a classification of 2+ billion names and
their linkage to virtual identities and
social network traffic.

• Alistair Leak
• UCL SECReT
• a.leak.11@ucl.ac.uk

Who am I?
Education:
Kingston University (BSc) - GIS
UCL (M.Res) - Advanced Spatial Analysis and Visualisation
UCL 3+1 - PhD Security and Crime Science

Supervisors:
1st Supervisor: Professor Paul Longley
2nd Supervisor: Dr James Cheshire

Definitions:
• Netnography
– “A qualitative, interpretive research methodology that uses
internet-optimized ethnographic research techniques to study the
social context in online communities” (Kozinets, 2009)

• Cybergeodemographics
– “The analysis of people by where they live and by whom they
interact with, in real and virtual space” (Longley, 2012)

Uncertainty of Identity: Work Package 4:
Cybergeodemographics
• Use of primary and secondary data to relate virtual Internet traffic to the
probable physical locations from which it emanated; and the development
of typologies of social networks that are robust, generalized and related to
physical locations.

[Diagram of related work packages: Secondary Data; Data Collection Tools (WP1); Text Analytics (WP2); Cybergeodemographics (WP4)]

Working Title:
• “Data Mining to Understand International Dimensions to
Online Identity - a classification of 2+ billion names and
their linkage to virtual identities and social network traffic”

Objectives:
• Develop spatial context of name network classification
• Develop typologies of social networks
• Measure how representative social media is of the
underlying population.
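
One simple way to express the third objective is a ratio of shares: a group's share of social media users divided by its share of the population. The sketch below is only an illustration under that assumption; the function name `representation_index`, the group labels and the counts are all hypothetical and are not taken from the study.

```python
def representation_index(sample_counts, population_counts):
    """Return sample share divided by population share for each group.

    A value of 1.0 means the group appears among users in proportion to
    its share of the population; below 1.0 means under-represented.
    """
    sample_total = sum(sample_counts.values())
    population_total = sum(population_counts.values())
    index = {}
    for group, population in population_counts.items():
        sample_share = sample_counts.get(group, 0) / sample_total
        population_share = population / population_total
        index[group] = sample_share / population_share
    return index


# Made-up counts for illustration only; not results from the study.
users = {"A": 30, "B": 70}
residents = {"A": 50, "B": 50}
index = representation_index(users, residents)
```

With these made-up counts, group A is under-represented (index below 1) and group B is over-represented (index above 1).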

Work Plan
• M.Res (Present – 2013)
– Foundation work
• Assess how representative tweet data are
– Skills Development
• Spatio-Temporal Data Mining
• Database Management

• Ph.D (2013 – 2016)
– Objectives
• Develop spatial component of names networks
• Develop typologies of social networks
• Develop a measure of uncertainty
– Completion in August 2016

Case Study: Tweets in London

• 1.4 million tweets over 3 months (Sep - Dec 2012)

What’s in a Tweet?

• First Name
• Surname
• Unique ID
• # Themes
• Location
• Popularity
• Interactions
• Time/Date

Possibilities:
• Political Affiliation
• Gender
• Age
• Location
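
The attributes named on this slide can be sketched as a single record type. This is a minimal illustration, not the study's actual schema: the class name `TweetRecord` and every field name are hypothetical, "# Themes" is assumed to mean hashtags, and a follower count is assumed as a stand-in for "Popularity".

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class TweetRecord:
    """One tweet reduced to the attributes named on the slide."""
    user_id: int                                       # Unique ID
    first_name: str                                    # First Name
    surname: str                                       # Surname
    sent_at: datetime                                  # Time/Date
    location: Optional[Tuple[float, float]] = None     # (lat, lon) if geotagged
    hashtags: List[str] = field(default_factory=list)  # "# Themes" (assumed)
    mentions: List[str] = field(default_factory=list)  # Interactions
    follower_count: int = 0                            # Popularity (assumed)


# Invented example values, for illustration only.
example = TweetRecord(
    user_id=1,
    first_name="Alex",
    surname="Smith",
    sent_at=datetime(2012, 10, 1, 8, 30),
    location=(51.5246, -0.1340),
    hashtags=["xfactor"],
)
```

Keeping `location` optional reflects the point that only some tweets carry coordinates.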

Data Classification
• Gender
– Database of 62,000 names and genders
– Determined by forename
• Demographic
– OAC (Output Area Classification)
• ONOMAP
– Ethnicity, religion, geographical origin
– Determined by forename and surname combination
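
The forename-based gender step can be sketched as a dictionary lookup. The four-entry table below is a hypothetical stand-in for the 62,000-name database, and `classify_gender` is an illustrative helper, not the study's code. ONOMAP and OAC are not reproduced here.

```python
# Hypothetical stand-in for the 62,000-entry forename/gender database.
FORENAME_GENDER = {
    "james": "male",
    "paul": "male",
    "sarah": "female",
    "emma": "female",
}


def classify_gender(display_name, lookup=FORENAME_GENDER):
    """Guess gender from the first token of a display name.

    Returns "unknown" when the name is empty or the forename is not listed.
    """
    tokens = display_name.strip().split()
    if not tokens:
        return "unknown"
    forename = tokens[0].lower()
    return lookup.get(forename, "unknown")


names = ["Sarah Jones", "  PAUL smith", "xX_bot_Xx", ""]
labels = [classify_gender(name) for name in names]
```

Names that do not begin with a recognisable forename fall through to "unknown" rather than being forced into a category.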

Challenges of Study

• Signal from Noise
– Tweets are not all sent from individuals' homes
• Day and night demographics
– Not all geolocated tweets are from real people
• Data Quality/Sample Size
– Twitter users are self-selecting
• Only a small proportion have enabled location services
• Dataset currently has 92,000 unique users

Target Areas of Study

• Spatio-temporal differentiation of tweets
– Night
– Day
– Travel
• Expansion of the Methodology for World Names
– Initially into Europe.
• Application of new name datasets.
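
The night, day and travel split listed above could be approximated by binning each tweet's timestamp by hour. The slides do not define the boundaries, so the hours below are assumptions chosen for illustration, and `time_band` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed hour boundaries; the slides do not define Night, Day or Travel.
TRAVEL_HOURS = {7, 8, 17, 18}   # rough commuting windows
DAY_HOURS = set(range(9, 17))   # 09:00 to 16:59


def time_band(sent_at):
    """Assign a tweet timestamp to "travel", "day" or "night"."""
    if sent_at.hour in TRAVEL_HOURS:
        return "travel"
    if sent_at.hour in DAY_HOURS:
        return "day"
    return "night"


# Invented timestamps, for illustration only.
bands = [
    time_band(datetime(2012, 10, 1, 8, 15)),
    time_band(datetime(2012, 10, 1, 13, 0)),
    time_band(datetime(2012, 10, 1, 23, 30)),
]
```

A fuller treatment would probably distinguish weekdays from weekends and use local time, which this sketch ignores.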

References:
• Dale, M. R. T., and M-J. Fortin. "From graphs to spatial graphs." Annual Review of Ecology,
Evolution, and Systematics 41.1 (2010): 21.
• Fischer, E. (July, 2011). World Map of Flickr and Twitter Locations. In See Something or Say
Something. Available at http://www.flickr.com/photos/walkingsf/5912169471/in/set-72157627140310742
• http://urbantick.blogspot.co.uk/2010/12/ncl-social-networks.html
• Kozinets, Robert V. Netnography: Doing ethnographic research online. Sage Publications Limited,
2009.
• R Core Team (2012). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
• Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010, October). Classifying latent user attributes
in twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated
contents (pp. 37-44). ACM.

Thank you

X Factor Graph
Produced with R and Gephi
