Using Digital Traces for User Profiling: the Uncertainty 
of Identity Toolset 
Muhammad Adnan1, Antonio Lima2, Luca Rossi2, Suresh Veluru3, Paul 
Longley1, Mirco Musolesi2, Muttukrishnan Rajarajan3 
1 Department of Geography, University College London 
2 School of Computer Science, University of Birmingham 
3 School of Engineering and Mathematical Sciences, City University London 
Web: www.uncertaintyofidentity.com
Introduction 
• Recent years have witnessed rapid growth in the use of 
online services 
• Online shopping, bank transactions, social networking services 
• Issues related to cyber-crime, identity fraud, and hacking 
• This project aims to combine real and virtual world 
datasets to better understand the identity of individuals 
• Identities 
• Real world (Name: Forename & Surname) 
• Virtual world (E-mail addresses, social media accounts, etc.)
Introduction 
• This paper presents a framework for the identification and 
profiling of individuals from their 
• Social media accounts 
• E-mail addresses 
• Twitter Geographic Profiler 
• Maps ethno-cultural communities of a person’s friends 
• E-mail Address Profiler 
• Uses a database of family names to extract probable identities from 
e-mail addresses 
• Could have potential applications in targeted marketing and 
online fraud detection
Outline 
• Onomap 
• A Name (Forename and Surname) classification system 
• Twitter Geographic Profiler 
• Extracting identities of Twitter users 
• Mapping them to probable ethnic origins 
• E-mail Address Profiler 
• Extracting identities from E-mail addresses 
• Geographic distribution
Onomap classification 
• A name is a person’s ethnic, linguistic, and cultural identity 
• A network of forename-surname pairs was created by using 
the data from 26 different countries 
• www.onomap.org 
[Figure: network linking forenames (Pablo, Juan, Rosa, Marta, ...) to surnames (Mateos, Garcia, Pérez, Sánchez, Rodríguez, ...). Example name: Pablo Mateos]
Onomap Classification 
• ONOMAP (www.onomap.org) for forename – surname pairs 
Kevin Hodge (English) 
Pablo Mateos (Spanish) 
… 
… 
… 
…
Twitter Geographic Profiler
Twitter Geographic Profiler 
• Given an individual’s Twitter username or ID 
• Extracts information about the individual’s friends 
• Extracts the forename-surname pairs of the friends 
• Maps forename-surname pairs to Onomap 
• Builds an ethno-cultural profile of the person’s friends 
• Maps the geographic distribution
Data available through the Twitter API 
• User ID 
• User Creation Date 
• Followers 
• Friends 
• Language 
• Location 
• Name 
• Screen Name or User Name 
• Time Zone 
• Geo Enabled 
• Latitude 
• Longitude 
• Tweet date and time 
• Tweet text
Twitter: getting the ids and usernames 
• Given the Twitter username of a person, we use the Twitter 
API to get the list of friends’ ids 
– A maximum of 15 requests every 15 minutes is allowed 
– Each query can return up to 5000 ids 
– Generally enough to download all the ids 
• Using the ids, we fetch the name associated with each id 
– Limited to 180 requests every 15 minutes 
– Returns a single string from which we need to extract the forename 
and surname tokens 
– Not necessarily a valid forename + surname! 
• E.g., “University of Birmingham”, “John1965”, “What is Love”, 
“Mystic_mind”
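The rate limits above translate into a simple request budget. The sketch below is illustrative only: the function name `request_plan` and the default of one name per lookup request (`names_per_request=1`) are assumptions, since the slide does not say how many names a single request returns.

```python
import math

def request_plan(n_friends, ids_per_query=5000, id_queries_per_window=15,
                 name_requests_per_window=180, names_per_request=1):
    """Estimate how many requests and 15-minute windows a profile needs.

    The limits default to the figures quoted on the slide;
    names_per_request is an assumption, not taken from the source.
    """
    id_queries = math.ceil(n_friends / ids_per_query)
    name_requests = math.ceil(n_friends / names_per_request)
    return {
        "id_queries": id_queries,
        "id_windows": math.ceil(id_queries / id_queries_per_window),
        "name_requests": name_requests,
        "name_windows": math.ceil(name_requests / name_requests_per_window),
    }

# A user following 400 accounts: one id query, but three windows of name lookups.
plan = request_plan(400)
print(plan)
```

Under these assumptions the name lookups, not the id queries, dominate the waiting time for typical accounts.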
Twitter: getting forename-surname pairs 
• The name field was divided into tokens 
• Forenames and surnames were detected by matching the 
string tokens against the database of forename-surname 
pairs from 26 countries 
• Users were discarded 
– where tokens did not match a valid forename and 
surname
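A minimal sketch of this tokenise-and-match step follows. The tiny `FORENAMES` and `SURNAMES` sets stand in for the 26-country database and are purely illustrative, and `extract_pair` is a hypothetical helper. Taking the first token as forename and the last as surname is a simple heuristic that the deck itself flags as needing improvement.

```python
import re

# Toy stand-ins for the forename/surname database (illustrative only).
FORENAMES = {"kevin", "pablo", "juan", "rosa", "marta"}
SURNAMES = {"hodge", "mateos", "garcia", "pérez", "sánchez", "rodríguez"}

def extract_pair(name_field):
    """Return (forename, surname) if the name field yields a valid pair, else None."""
    # Keep runs of letters only; digits, underscores and punctuation split tokens.
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", name_field)]
    if len(tokens) < 2:
        return None  # a single token cannot form a forename-surname pair
    forename, surname = tokens[0], tokens[-1]
    if forename in FORENAMES and surname in SURNAMES:
        return (forename, surname)
    return None  # user is discarded

for raw in ["Pablo Mateos", "University of Birmingham", "John1965", "Mystic_mind"]:
    print(repr(raw), "->", extract_pair(raw))
```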
Onomap: from names to ethnicity 
• ONOMAP (www.onomap.org) was applied to forename – 
surname pairs 
Kevin Hodge (English) 
Pablo Mateos (Spanish) 
… 
… 
… 
…
Friends’ Ethnicity Histogram 
[Figure 1: Screenshot of the Twitter Geographic Profiler. The bottom part of the screen shows the histogram of the Twitter user's friends' ethno-cultural groups.] 
Once the entire list of friends' forename + surname pairs has been parsed, we can 
easily estimate the distribution over the set of possible ethno-cultural groups of 
the Twitter user's friends. 
With the list of (surname, forename) pairs to hand, we query 
Onomap to get the ethno-cultural classification associated with 
each (surname, forename) pair, and the 
SearchSurnameTopCountries method to get the list of the 
countries where an instance of a given surname was observed. 
In this work we mark as invalid any string that is composed of a single token. 
If this is the case, we skip the profile of the corresponding friend. 
If the string contains two or more tokens, we take the first one to 
be the forename and the last one to be the surname. 
Friends’ Geographic Origins 
[Figure 2: Map showing the geographic origin of the Twitter user's friends’ surnames as 
assigned by our tool. Below the map the user is shown a list of the top 10 
countries with the respective frequency.]
Twitter Geographic Profiler 
• Potential applications include 
– Measure the level of segregation/integration of a given individual 
(community) as the Shannon entropy of the (average) friends’ 
ethnicity histogram 
– Outlier detection: identify uncommon behaviours, e.g., individuals 
who stand out in terms of the ethno-cultural groups they bond with 
• Limitations 
– Twitter data is very noisy 
– We need a better heuristic to extract forename + surname
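The segregation/integration measure mentioned above can be sketched as the Shannon entropy of a friends' ethnicity histogram. The labels below are made-up examples, and `shannon_entropy` is a hypothetical helper, not the tool's own code.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Made-up friend classifications, for illustration only.
homogeneous = ["English"] * 4
mixed = ["English", "English", "Spanish", "Spanish"]

print(shannon_entropy(homogeneous))  # all friends in one group: lowest entropy
print(shannon_entropy(mixed))        # friends spread over two groups: higher entropy
```

Low entropy means the friends concentrate in few ethno-cultural groups; higher entropy means they are spread across more groups.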
E-mail Address Profiler
E-mail address profiler 
• In many instances, an e-mail address encapsulates some 
kind of identity information 
– Forename or surname 
• This tool 
– Extracts identities of individuals from their e-mail addresses 
– Maps the geographical distribution of a Surname in the UK 
• The tool identifies a surname or forename as a substring of an 
e-mail address 
• The tool builds a suffix tree of an e-mail address and searches 
for probable identities
An example suffix tree 
Suffix tree for the string aamalam$. The surname in this string is alam$, 
shown at a leaf node
Surname matching algorithm 
• The surname matching algorithm constructs a suffix tree for an 
e-mail address 
• It uses a database of surnames and forenames and matches 
them 
– against each substring in the suffix tree 
• A probable identity is a substring that matches a surname or 
forename in the database 
• We use a database of the 10,000 most common surnames 
in the UK
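The matching idea can be sketched without building a real suffix tree: a name occurs as a substring exactly when it is a prefix of some suffix. The code below enumerates suffixes naively, a simplification swapped in for the suffix tree used by the tool. The three-name `NAMES` set is a toy stand-in for the surname database, and `candidate_identities` is a hypothetical helper.

```python
# Toy stand-in for the surname/forename database (illustrative only).
NAMES = {"alam", "singleton", "keay"}

def candidate_identities(email, names=NAMES):
    """Return the database names that occur as substrings of the local part."""
    local = email.split("@", 1)[0].lower()
    found = set()
    for start in range(len(local)):
        suffix = local[start:]          # one suffix of the local part
        for name in names:
            if suffix.startswith(name): # name is a prefix of this suffix
                found.add(name)
    return found

print(candidate_identities("aamalam@example.com"))
print(candidate_identities("a.singleton@ucl.ac.uk"))
```

A real suffix tree avoids rescanning the string for every name, which matters when processing a large corpus of addresses.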
E-mail Address Profiler: geographic distribution 
• 2007 Electoral Register 
– Name and Address of every individual who is eligible to vote in 
the UK 
• Every postcode in the Electoral Register was converted 
to latitude/longitude values 
• The tool maps all the latitude/longitude points for a particular 
surname 
• Onomap is used to identify the probable ethnic origin of 
a surname
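The postcode-to-coordinates step can be sketched as a lookup followed by grouping per surname. Everything in the example data is a placeholder: the postcodes, the coordinates, and the register rows are invented for illustration, and `points_by_surname` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder lookup table and register rows (not real data).
POSTCODE_TO_LATLON = {
    "ZZ1 1AA": (51.5, -0.1),
    "ZZ2 2BB": (53.4, -2.2),
}
REGISTER = [
    ("Singleton", "ZZ1 1AA"),
    ("Singleton", "ZZ2 2BB"),
    ("Keay", "ZZ2 2BB"),
    ("Keay", "ZZ9 9ZZ"),  # postcode missing from the lookup: skipped
]

def points_by_surname(register, lookup):
    """Group latitude/longitude points by (lower-cased) surname."""
    points = defaultdict(list)
    for surname, postcode in register:
        coords = lookup.get(postcode)
        if coords is not None:
            points[surname.lower()].append(coords)
    return dict(points)

surname_points = points_by_surname(REGISTER, POSTCODE_TO_LATLON)
print(surname_points["singleton"])
```

The resulting point lists are what a mapping layer would plot for each surname.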
E-mail Address Profiler 
Email: a.singleton@ucl.ac.uk
Geographic distribution 
Surname: Singleton Surname: Keay
Conclusion 
• A toolkit for identity detection and profiling 
• Identification and profiling of ethno-cultural characteristics of 
individuals 
• From social media accounts and e-mail addresses 
• Future work will include 
• The extension of the Twitter Geographic Profiler to other social media 
services 
• The extension of the E-mail Address Profiler to process a large corpus of 
e-mail addresses 
• Study of privacy implications on social media services
Thanks for Listening 
Any Questions?

More Related Content

Similar to Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset

ancestry-bigdatasummit-april2013
ancestry-bigdatasummit-april2013ancestry-bigdatasummit-april2013
ancestry-bigdatasummit-april2013
Leonid Zhukov
 
Linguistic Cues to Deception: Identifying Political Trolls on Social Media
Linguistic Cues to Deception: Identifying Political Trolls on Social MediaLinguistic Cues to Deception: Identifying Political Trolls on Social Media
Linguistic Cues to Deception: Identifying Political Trolls on Social Media
Aseel Addawood
 
21 New Age Ways To Essa
21 New Age Ways To Essa21 New Age Ways To Essa
21 New Age Ways To Essa
Julie Potts
 
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
Pete Burnap
 
Example Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free EssExample Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free Ess
Leonard Goudy
 
01 Network Data Collection
01 Network Data Collection01 Network Data Collection
01 Network Data Collection
Duke Network Analysis Center
 
Relationships In Wbs Ns (Tin180 Com)
Relationships In Wbs Ns (Tin180 Com)Relationships In Wbs Ns (Tin180 Com)
Relationships In Wbs Ns (Tin180 Com)
Tin180 VietNam
 
Duke talk
Duke talkDuke talk
An Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
An Analytical Survey on Hate Speech Recognition through NLP and Deep LearningAn Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
An Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
IRJET Journal
 
CSE5656 Complex Networks - Dunbar's Number
CSE5656   Complex Networks - Dunbar's NumberCSE5656   Complex Networks - Dunbar's Number
CSE5656 Complex Networks - Dunbar's Number
Marcello Tomasini
 
Measuring Anonymity in Academic Virtual Reference
Measuring Anonymity in Academic Virtual ReferenceMeasuring Anonymity in Academic Virtual Reference
Measuring Anonymity in Academic Virtual Reference
kslovesbooks
 
Our digital traces and how they can be missuseed
Our digital traces and how they can be missuseedOur digital traces and how they can be missuseed
Our digital traces and how they can be missuseed
Institute of Contemporary Sciences
 
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
Julie Roest
 
George Washington (Elementary) Writing Pape
George Washington (Elementary) Writing PapeGeorge Washington (Elementary) Writing Pape
George Washington (Elementary) Writing Pape
Evelyn Donaldson
 
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
1crore projects
 
Data for the Humanities
Data for the HumanitiesData for the Humanities
Data for the Humanities
librarianrafia
 
Matthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptxMatthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptx
reenarocky
 
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
David Graus
 
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docxReading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
sodhi3
 
Social Web 2014: Final Presentations (Part II)
Social Web 2014: Final Presentations (Part II)Social Web 2014: Final Presentations (Part II)
Social Web 2014: Final Presentations (Part II)
Lora Aroyo
 

Similar to Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset (20)

ancestry-bigdatasummit-april2013
ancestry-bigdatasummit-april2013ancestry-bigdatasummit-april2013
ancestry-bigdatasummit-april2013
 
Linguistic Cues to Deception: Identifying Political Trolls on Social Media
Linguistic Cues to Deception: Identifying Political Trolls on Social MediaLinguistic Cues to Deception: Identifying Political Trolls on Social Media
Linguistic Cues to Deception: Identifying Political Trolls on Social Media
 
21 New Age Ways To Essa
21 New Age Ways To Essa21 New Age Ways To Essa
21 New Age Ways To Essa
 
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
 
Example Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free EssExample Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free Ess
 
01 Network Data Collection
01 Network Data Collection01 Network Data Collection
01 Network Data Collection
 
Relationships In Wbs Ns (Tin180 Com)
Relationships In Wbs Ns (Tin180 Com)Relationships In Wbs Ns (Tin180 Com)
Relationships In Wbs Ns (Tin180 Com)
 
Duke talk
Duke talkDuke talk
Duke talk
 
An Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
An Analytical Survey on Hate Speech Recognition through NLP and Deep LearningAn Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
An Analytical Survey on Hate Speech Recognition through NLP and Deep Learning
 
CSE5656 Complex Networks - Dunbar's Number
CSE5656   Complex Networks - Dunbar's NumberCSE5656   Complex Networks - Dunbar's Number
CSE5656 Complex Networks - Dunbar's Number
 
Measuring Anonymity in Academic Virtual Reference
Measuring Anonymity in Academic Virtual ReferenceMeasuring Anonymity in Academic Virtual Reference
Measuring Anonymity in Academic Virtual Reference
 
Our digital traces and how they can be missuseed
Our digital traces and how they can be missuseedOur digital traces and how they can be missuseed
Our digital traces and how they can be missuseed
 
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
After Apple Picking Essay. After Apple Picking.docx - After Apple Picking Mic...
 
George Washington (Elementary) Writing Pape
George Washington (Elementary) Writing PapeGeorge Washington (Elementary) Writing Pape
George Washington (Elementary) Writing Pape
 
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
 
Data for the Humanities
Data for the HumanitiesData for the Humanities
Data for the Humanities
 
Matthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptxMatthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptx
 
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
 
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docxReading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
Reading ResponseBy R.C. Lewontin, Confusions about Human Races.docx
 
Social Web 2014: Final Presentations (Part II)
Social Web 2014: Final Presentations (Part II)Social Web 2014: Final Presentations (Part II)
Social Web 2014: Final Presentations (Part II)
 

More from Dr Muhammad Adnan

Spatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter usersSpatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter users
Dr Muhammad Adnan
 
Analysing the digital traces of Social Media users
Analysing the digital traces of Social Media usersAnalysing the digital traces of Social Media users
Analysing the digital traces of Social Media users
Dr Muhammad Adnan
 
Open Data: Analysis and Visualisation
Open Data: Analysis and VisualisationOpen Data: Analysis and Visualisation
Open Data: Analysis and Visualisation
Dr Muhammad Adnan
 
Geodemographics: Open tools and mehtods
Geodemographics: Open tools and mehtodsGeodemographics: Open tools and mehtods
Geodemographics: Open tools and mehtods
Dr Muhammad Adnan
 
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
Dr Muhammad Adnan
 
Spatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identitySpatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identity
Dr Muhammad Adnan
 
Visualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographicsVisualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographics
Dr Muhammad Adnan
 
Real Time Geodemographics
Real Time GeodemographicsReal Time Geodemographics
Real Time Geodemographics
Dr Muhammad Adnan
 

More from Dr Muhammad Adnan (8)

Spatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter usersSpatio-temporal demographic classification of the Twitter users
Spatio-temporal demographic classification of the Twitter users
 
Analysing the digital traces of Social Media users
Analysing the digital traces of Social Media usersAnalysing the digital traces of Social Media users
Analysing the digital traces of Social Media users
 
Open Data: Analysis and Visualisation
Open Data: Analysis and VisualisationOpen Data: Analysis and Visualisation
Open Data: Analysis and Visualisation
 
Geodemographics: Open tools and mehtods
Geodemographics: Open tools and mehtodsGeodemographics: Open tools and mehtods
Geodemographics: Open tools and mehtods
 
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Great...
 
Spatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identitySpatio-temporal linkage of real and virtual identity
Spatio-temporal linkage of real and virtual identity
 
Visualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographicsVisualising large spatial databases and Building bespoke geodemographics
Visualising large spatial databases and Building bespoke geodemographics
 
Real Time Geodemographics
Real Time GeodemographicsReal Time Geodemographics
Real Time Geodemographics
 

Recently uploaded

4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
y3i0qsdzb
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 

Recently uploaded (20)

4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 

Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset

  • 1. Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset Muhammad Adnan1, Antonio Lima2, Luca Rossi2, Suresh Veluru3, Paul Longley1, Mirco Musolesi2, Muttukrishnan Rajarajan3 1 Department of Geography, University College London 2 School of Computer Science, University of Birmingham 3 School of Engineering and Mathematical Sciences, City University London Web: www.uncertaintyofidentity.com
  • 2. Introduction • Past years have witnessed a rapid growth of the use of online services • Online shopping, bank transactions, social networking services • Issues related to cyber-crimes, identity frauds, and hacking • This project aims to combining real and virtual world datasets to better understand the identity of individuals • Identities • Real world (Name: Forename & Surname) • Virtual world (Email addresses, Social media accounts etc)
  • 3. Introduction • This paper presents a framework for the identification and profiling of individuals from their • Social media accounts • E-mail addresses • Twitter Geographic Profiler • Maps ethno-cultural communities of a person’s friends • E-mail Address Profiler • Used a database of family names to extract probably identities from E-mail addresses • Could have potential applications in targeted marketing and online fraud detection
  • 4. Outline • Onomap • A Name (Forename and Surname) classification system • Twitter Geographic Profiler • Extracting identities of Twitter users • Mapping them to probable ethnic origins • E-mail Address Profiler • Extracting identities from E-mail addresses • Geographic distribution
  • 5. Onomap classification • A name is a person’s ethnic, linguistic, and cultural identity • A network of Forename-Surname pairs was created by using Pablo Forenames Surnames Mateos Garcia Pérez ... Juan Rosa Marta ... Sánchez Rodríguez the data from 26 different countries • www.onomap.org Name: Pablo Mateos
  • 7. Onomap Classification • ONOMAP (www.onomap.org) for forename – surname pairs Kevin Hodge (English) Pablo Mateos (Spanish) … … … …
  • 9. Twitter Geographic Profiler • Given an individual’s Twitter Username or ID • Extracts the information of individual’s friends • Extracts the forename-surname pairs of the friends • Maps forename-surname pairs to Onomap • Builds an ethno-cultural profile person’s friends • Maps the geographic distribution
  • 10. Data available through the Twitter API • User ID • User Creation Date • Followers • Friends • Language • Location • Name • Screen Name or User Name • Time Zone • Geo Enabled • Latitude • Longitude • Tweet date and time • Tweet text
  • 11. Twitter: getting the ids and usernames • Given the Twitter username of a person, we use the Twitter API to get the list of friends’ ids – A maximum of 15 requests every 15 minutes is allowed – Each query can return up to 5000 ids – Generally enough to download all the ids • Using the ids, we fetch the name associated with each id – Limited to 180 requests every 15 minutes – Returns a single string from which we need to extract the forename and surname tokens – Not necessarily a valid forename + surname! • E.g., “University of Birmingham”, “John1965”, “What is Love”, “Mystic_mind”
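The rate limits quoted on this slide translate into a simple download budget. The sketch below is illustrative only: the limits (15 id requests and 180 name requests per 15-minute window, 5000 ids per id request) come from the slide, while `ids_per_lookup` (how many ids one name request can resolve) is an assumption that is not stated in the slides, and the function names are hypothetical.

```python
import math

ID_REQUESTS_PER_WINDOW = 15      # from the slide: 15 requests every 15 minutes
IDS_PER_ID_REQUEST = 5000        # from the slide: up to 5000 ids per query
NAME_REQUESTS_PER_WINDOW = 180   # from the slide: 180 requests every 15 minutes


def windows_needed(n_requests: int, requests_per_window: int) -> int:
    """Number of 15-minute rate-limit windows needed to issue n_requests."""
    return math.ceil(n_requests / requests_per_window)


def download_budget(n_friends: int, ids_per_lookup: int = 100) -> dict:
    """Estimate requests and windows needed to fetch ids and names.

    ids_per_lookup is an ASSUMPTION (ids resolved per name request);
    set it to 1 if each name requires its own request.
    """
    id_requests = math.ceil(n_friends / IDS_PER_ID_REQUEST)
    name_requests = math.ceil(n_friends / ids_per_lookup)
    return {
        "id_requests": id_requests,
        "id_windows": windows_needed(id_requests, ID_REQUESTS_PER_WINDOW),
        "name_requests": name_requests,
        "name_windows": windows_needed(name_requests, NAME_REQUESTS_PER_WINDOW),
    }
```

For a user with 2000 friends, a single id request in one window is enough, which is consistent with the slide's remark that the limit is generally sufficient to download all the ids.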
  • 12. Twitter: getting forename-surname pairs • The Name field was divided into tokens • Forenames and surnames were detected by matching the tokens against the database of forename-surname pairs from 26 countries • Users were discarded where the tokens did not match a valid forename and surname
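A minimal sketch of the token-matching step, assuming the Name field is split on non-letter characters. The `FORENAMES` and `SURNAMES` sets are tiny hypothetical stand-ins for the 26-country database, and `extract_name_pair` is an illustrative name, not the tool's actual function.

```python
import re

# Hypothetical toy lookup tables; the real tool matches against a
# database of forename-surname pairs from 26 countries.
FORENAMES = {"kevin", "pablo", "juan", "marta"}
SURNAMES = {"hodge", "mateos", "garcia", "sanchez"}


def extract_name_pair(name_field):
    """Return (forename, surname), or None if the user should be discarded."""
    # Split into lower-case runs of letters; digits, underscores and
    # punctuation act as separators.
    tokens = re.findall(r"[^\W\d_]+", name_field.lower())
    # First token that is a known forename, last token that is a known surname.
    forename = next((t for t in tokens if t in FORENAMES), None)
    surname = next((t for t in reversed(tokens) if t in SURNAMES), None)
    if forename is None or surname is None:
        return None  # no valid forename + surname: discard this user
    return forename, surname
```

With these toy tables, `extract_name_pair("Pablo Mateos")` yields a pair, while strings such as "University of Birmingham" or "Mystic_mind" are discarded because no token matches.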
  • 13. Onomap: from names to ethnicity • ONOMAP (www.onomap.org) was applied to forename-surname pairs • Kevin Hodge (English) • Pablo Mateos (Spanish) • …
  • 14. Friends’ Ethnicity Histogram • Once the entire list of friends’ name + surname pairs has been parsed, we can easily estimate the distribution over the set of possible ethno-cultural groups of the Twitter user's friends • With the list of (surname, forename) pairs to hand, we query Onomap to get the ethno-cultural classification associated with each (surname, forename) pair, and the SearchSurnameTopCountries method to get the list of the countries where an instance of a given surname was observed • Figure 1: Screenshot of the Twitter Geographic Profiler. The bottom part of the screen shows the histogram of the Twitter user's friends’ ethno-cultural groups.
  • 15. Friends’ Geographic Origins • In this work we mark as invalid any string that is composed of a single token. If this is the case, we skip the profile of the corresponding friend • If the string contains two or more tokens, we take the first one to be the forename and the last one to be the surname • Figure 2: Map showing the geographical origin of the Twitter user's friends’ surnames as assigned by our tool. Below the map the user is shown a list of the top 10 countries with the respective frequency.
  • 16. Twitter Geographic Profiler • Potential applications include – Measuring the level of segregation/integration of a given individual (community) as the Shannon entropy of the (average) friends’ ethnicity histogram – Outlier detection: identifying uncommon behaviors, e.g., individuals that stand out in terms of the ethno-cultural groups they bond with • Limitations – Twitter data is very noisy – We need a better heuristic to extract forename + surname
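The segregation/integration measure can be illustrated with the standard Shannon entropy, H = -Σ p_i log2 p_i, where p_i is the share of friends in ethno-cultural group i. The sketch below is a hedged example rather than the authors' code: the function names and the sample labels are hypothetical, and the choice of base 2 (bits) is an assumption.

```python
import math
from collections import Counter


def ethnicity_histogram(labels):
    """Count friends per ethno-cultural group label."""
    return Counter(labels)


def shannon_entropy(histogram):
    """Shannon entropy (in bits) of a histogram of counts."""
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in histogram.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical friend classifications for two users.
homogeneous = ethnicity_histogram(["English"] * 4)
mixed = ethnicity_histogram(["English", "English", "Spanish", "Spanish"])
```

An entropy of zero means all classified friends fall in a single group; the value grows as friends are spread more evenly across more groups, so a higher value suggests a more integrated friend list under this measure.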
  • 18. E-mail address profiler • In many instances, an e-mail address encapsulates some kind of identity information – Forename or surname • This tool – Extracts identities of individuals from their e-mail addresses – Maps the geographical distribution of a surname in the UK • The tool identifies a surname or forename as a substring of an e-mail address • The tool builds a suffix tree of an e-mail address and searches it for probable identities
  • 19. An example suffix tree • Suffix tree for the name aamalam$. The surname in this name is alam$, and it is shown at a leaf node
  • 20. Surname matching algorithm • The surname matching algorithm constructs a suffix tree for an e-mail address • It matches a database of surnames and forenames against each substring in the suffix tree • A probable identity is a substring that matches a surname or forename • We use a database of the 10,000 most common surnames in the UK
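The matching idea can be sketched with an uncompressed suffix tree (a suffix trie) built from nested dictionaries. This is a simplified illustration, not the authors' implementation: a production suffix tree would compress edges, the sketch indexes only the part of the address before the `@` (an assumption), and the name list passed in stands in for the 10,000-surname database.

```python
def build_suffix_trie(text):
    """Build a suffix trie of text as nested dicts (uncompressed suffix tree)."""
    text = text + "$"  # terminal symbol, as in the aamalam$ example
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """True if pattern is a substring of the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def probable_identities(email, names):
    """Return the known names found in the local part, longest first."""
    local_part = email.split("@", 1)[0].lower()  # assumption: ignore the domain
    trie = build_suffix_trie(local_part)
    matches = [n for n in names if contains(trie, n.lower())]
    return sorted(matches, key=len, reverse=True)
```

Every substring of the text is a prefix of some suffix, so walking the trie from the root answers the substring query for each candidate name. Sorting longest first is a design choice of this sketch, so that "singleton" ranks ahead of a shorter accidental match such as "single".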
  • 21. E-mail Address Profiler: geographic distribution • 2007 Electoral Register – Name and address of every individual who is eligible to vote in the UK • Every postcode in the Electoral Register was converted to latitude/longitude values • The tool maps all the latitude/longitude points for a particular surname • Onomap is used to identify the probable ethnic origin of a surname
  • 22. E-mail Address Profiler Email: a.singleton@ucl.ac.uk
  • 23. Geographic distribution Surname: Singleton Surname: Keay
  • 24. Conclusion • A toolkit for identity detection and profiling • Identification and profiling of ethno-cultural characteristics of individuals • From social media accounts and e-mail addresses • Future work will include • The extension of the Twitter Geographic Profiler to other social media services • The extension of the E-mail Address Profiler to process a large corpus of e-mail addresses • Study of privacy implications on social media services
  • 25. Thanks for Listening Any Questions ?