Spatio-temporal linkage of real and virtual identity
Muhammad Adnan (and Paul Longley)
University College London
This presentation outlines initial work on linking identities in the real and virtual worlds.

Published in: Education, Technology
    1. Spatio-temporal linkage of real and virtual identity
        Muhammad Adnan (and Paul Longley), University College London
    2. Geodemographics
        • “Analysis of people by where they live [places]” (Sleight, 1993:3)
        • Social similarity, not locational proximity
        • Diagram labels: Person, Home Address, Area
    3. Identity of individuals in the real world
        • Name (Forename & Surname)
        • Surnames have geographic concentrations
        • Prospects for linkage with socio-economic data
            • E.g. analysing the socio-economic circumstances of different ethnic groups
    4. An example – gbnames.publicprofiler.org
        Panels: Longley; Cheshire
    5. An example – Output Area Classification
        Panels: Kingston upon Hull; Hereford
    6. A socio-economic and ethnic classification
    7. A socio-economic and ethnic classification
    8. Wu
    9. Source: Cheshire and Longley (2011)
    10. Courtesy: James Cheshire
    11. Wordle.net
    12. The European scale
        • 16 countries
        • 400 million people
        • 5.95 million unique surnames
        Courtesy: James Cheshire
    13. Onomap classification
        • Forename-Surname clustering (based on Hanks and Tucker, 2000)
        • Data: UK Electoral Roll
        • Diagram: forenames (Pablo, Juan, Rosa, Marta, ...) linked to surnames (Mateos, Garcia, Pérez, Sánchez, Rodríguez, ...)
            – Several iterations until self-contained cluster is exhausted
            – Cluster assigned a cultural, ethnic & linguistic Onomap type
            – Probability of ethnicity assigned to each name
        Mateos et al (2007) CASA Working Paper 116
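The iterative expansion described on slide 13 can be illustrated with a minimal sketch. It assumes the input is a list of (forename, surname) pairs and simply grows each cluster until no new names are reachable. The function name `cluster_names` is hypothetical, and this plain connected-components approach is a simplification: it does not reproduce the thresholds, type assignment or probabilities of the actual Onomap method.

```python
from collections import defaultdict, deque


def cluster_names(records):
    """Group forenames and surnames into self-contained clusters.

    records: iterable of (forename, surname) pairs.
    Returns a list of (forenames, surnames) set pairs.
    """
    surnames_of = defaultdict(set)   # forename -> surnames seen with it
    forenames_of = defaultdict(set)  # surname  -> forenames seen with it
    for forename, surname in records:
        surnames_of[forename].add(surname)
        forenames_of[surname].add(forename)

    seen_forenames, seen_surnames = set(), set()
    clusters = []
    for seed in sorted(surnames_of):
        if seed in seen_forenames:
            continue
        cluster_f, cluster_s = {seed}, set()
        seen_forenames.add(seed)
        queue = deque([("forename", seed)])
        # Alternate between forenames and surnames until nothing new appears.
        while queue:
            kind, name = queue.popleft()
            if kind == "forename":
                for surname in surnames_of[name]:
                    if surname not in seen_surnames:
                        seen_surnames.add(surname)
                        cluster_s.add(surname)
                        queue.append(("surname", surname))
            else:
                for forename in forenames_of[name]:
                    if forename not in seen_forenames:
                        seen_forenames.add(forename)
                        cluster_f.add(forename)
                        queue.append(("forename", forename))
        clusters.append((cluster_f, cluster_s))
    return clusters


# Illustrative pairs only; not taken from the Electoral Roll.
pairs = [
    ("Pablo", "Mateos"), ("Pablo", "Garcia"), ("Juan", "Garcia"),
    ("Juan", "Pérez"), ("Rosa", "Pérez"),
    ("John", "Smith"), ("Mary", "Smith"), ("Mary", "Jones"),
]
clusters = cluster_names(pairs)
```

On real data a single rare pairing would merge two otherwise separate clusters, so a practical version would probably need to ignore weak links before expanding.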
    14. WorldNames CEL clusters
        Source: Mateos et al (2011)
    15. Uncertainty and virtual identity
        • Identity increasingly shaped by online activities
            – => value may be leveraged from the fusion of physical and virtual data sources
        • Data fusion and generalisation to relate physical and virtual properties
        • Use of residence alongside activity patterns and social network information
    16. Most of us have virtual identities
        • Email address; social media accounts
        • People use different procedures and providers to establish virtual identities
        • Harvesting these data has interesting potential applications
            • Cyber crime
            • Cyber geodemographics (Facebook has already started this)
    17. Most of us have virtual identities
        • Facebook data mining engine
            • Analyses the words you use and tailors advertisements accordingly
    18. Starting Point
        http://worldnames.publicprofiler.org
        • Worldnames holds data for a population of approximately 1 billion in around 28 countries
        • Approximately 1.6 million unique users have visited the website since 2008
    19. Starting Point
        http://worldnames.publicprofiler.org
        • Worldnames has been archiving 'Surname search', 'Email Address', 'Gender', and 'IP Address' for searches over the past 6 months
            • c. 175,000 records: email validation
            • 150,000 usable 'IP Address' entries
    20. IP Address to Latitude/Longitude conversion
        http://quova.com
        An API to convert IP addresses to their corresponding latitude / longitude values
    21. IP Address to Latitude/Longitude conversion
        http://quova.com
        A search for an IP address at UCL (128.40.214.196)
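The slides do not show how the Quova API is called, so the sketch below does not attempt to reproduce it. It only illustrates the general idea of IP-to-coordinate conversion with a local lookup table, using Python's standard `ipaddress` module. The table `GEO_TABLE`, the function `geolocate`, the network ranges and the coordinates are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical lookup table: (network, latitude, longitude).
# The ranges and coordinates are illustrative assumptions, not Quova data.
GEO_TABLE = [
    (ipaddress.ip_network("128.40.0.0/16"), 51.5246, -0.1340),  # assumed to cover UCL
    (ipaddress.ip_network("192.0.2.0/24"), 0.0, 0.0),           # documentation range, placeholder
]


def geolocate(ip_string):
    """Return (latitude, longitude) for the most specific matching network, or None."""
    ip = ipaddress.ip_address(ip_string)
    best = None
    for network, lat, lon in GEO_TABLE:
        if ip in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, lat, lon)
    return None if best is None else (best[1], best[2])


ucl_location = geolocate("128.40.214.196")
unknown_location = geolocate("8.8.8.8")
```

A commercial service would back this with a far larger and regularly updated table. As the conclusions on slide 41 note, the accuracy of such geocoding is a limitation.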
    22. Top Countries
        Website was searched from 155 countries over the past 6 months (bar chart, top 20 shown)
        • UNITED STATES 76708
        • UNITED KINGDOM 21892
        • CANADA 8154
        • GERMANY 7158
        • ITALY 4058
        • AUSTRALIA 2978
        • BRAZIL 2440
        • FRANCE 2028
        • ARGENTINA 1958
        • SPAIN 1830
        • NEW ZEALAND 1236
        • NETHERLANDS 1074
        • GREECE 1040
        • SWITZERLAND 992
        • BELGIUM 940
        • POLAND 880
        • AUSTRIA 874
        • MEXICO 834
        • IRELAND 710
        • SWEDEN 630
    23. UK and Ireland
    24. Europe
    25. North America
    26. South America
    27. India, China, Japan, Singapore
    28. Popular Surname Searches (bar chart, top 20 shown)
        • SMITH 708
        • JONES 306
        • JOHNSON 258
        • ANDERSON 224
        • WILLIAMS 222
        • MILLER 218
        • MARTIN 202
        • WILSON 194
        • BROWN 194
        • MOORE 188
        • THOMAS 178
        • TAYLOR 170
        • CLARK 164
        • LEE 160
        • ROBERTS 156
        • DAVIS 152
        • CAMPBELL 144
        • LEWIS 138
        • HARRIS 138
        • MITCHELL 136
    29. Popular Email Domains (bar chart, top 20 shown)
        • GMAIL.COM 31842
        • HOTMAIL.COM 22098
        • YAHOO.COM 15542
        • AOL.COM 5550
        • COMCAST.NET 2696
        • HOTMAIL.CO.UK 1948
        • MSN.COM 1624
        • WEB.DE 1522
        • YAHOO.CO.UK 1290
        • GMX.DE 1260
        • SBCGLOBAL.NET 1246
        • BTINTERNET.COM 860
        • HOTMAIL.IT 844
        • VERIZON.NET 798
        • GOOGLEMAIL.COM 742
        • LIVE.COM 742
        • COX.NET 708
        • ATT.NET 632
        • MAILINATOR.COM 616
        • LIBERO.IT 616
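A ranking like the one on slide 29 could be produced by tallying the part of each address after the '@'. The following is a minimal sketch under that assumption; the function name `domain_counts` is hypothetical, and the sample addresses are examples quoted elsewhere in the slides, not the Worldnames archive.

```python
from collections import Counter


def domain_counts(addresses):
    """Count email domains, ignoring entries without a usable '@' split."""
    counts = Counter()
    for address in addresses:
        local, sep, domain = address.strip().rpartition("@")
        if sep and local and domain:
            counts[domain.upper()] += 1
    return counts


sample = [
    "abbie.harper@hotmail.com",
    "BEVERLY.RICHARDS@YAHOO.COM",
    "BJORN.SOBRY@HOTMAIL.COM",
    "not-an-email",
]
top_domains = domain_counts(sample).most_common(2)
```

Grouping the same tally by surname or by geolocated country would give breakdowns of the kind shown on slides 30 and 31.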
    30. Popular Email Domains by Surnames (top five per surname, in rank order)
        • Smith (English): GMAIL.COM, YAHOO.COM, HOTMAIL.COM, AOL.COM, MAILINATOR.COM
        • Jones (Welsh): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, COMCAST.NET, GOOGLEMAIL.COM
        • Johnson (English): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, MSN.COM, VERIZON.NET
        • Perez (Spanish): GMAIL.COM, HOTMAIL.COM, YAHOO.ES, CHARTER.NET, GRANDECOM.NET
        • Gupta (Indian): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, GOOGLEMAIL.COM, INDIATIMES.COM
        • Meyer (German): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, AOL.COM, GMX.DE
    31. Popular Email Domains by Country (top five per country, in rank order)
        • UK: GMAIL.COM, HOTMAIL.COM, HOTMAIL.CO.UK, YAHOO.CO.UK, YAHOO.COM
        • USA: GMAIL.COM, YAHOO.COM, HOTMAIL.COM, AOL.COM, COMCAST.NET
        • France: HOTMAIL.FR, GMAIL.COM, HOTMAIL.COM, YAHOO.FR, LAPOSTE.NET
        • Germany: WEB.DE, GMX.DE, T-ONLINE.DE, YAHOO.DE, GMAIL.COM
        • Brazil: HOTMAIL.COM, GMAIL.COM, YAHOO.COM.BR, IG.COM.BR, BOL.COM.BR
        • Japan: YAHOO.COM, YAHOO.CO.JP, GMAIL.COM, HOTMAIL.COM, MSN.COM
    32. Top GoogleMail.com users
        Top surnames: BINDER, WATKINS, WHITE, WOODS, ROBINSON, SLEEMAN, BENNETT, RITCHIE, SHARP, ROLLINGS
    33. GoogleMail.com users
        • Surname 'Binder'
        Panels: Germany; Switzerland
    34. GoogleMail.com users
        • Surname 'Binder'
        Panels: Germany; Switzerland
    35. GoogleMail.com users
        • Surname 'Blackbourn'
        Panel: New Zealand
    36. Who uses their surname as part of their email address?
        • Approximately 40% of the users have their surname as part of their email address
            • abbie.harper@hotmail.com (Surname: Harper)
            • helmut.kempe@inode.at (Surname: Kempe)
        • Top Countries (bar chart; axis from 0 to 50)
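One plausible way to arrive at a figure like the 40% on slide 36 is a case-insensitive check of whether the searched surname occurs in the part of the address before the '@'. The sketch below assumes that approach; the slides do not state the matching rule actually used, and the function names are hypothetical.

```python
def surname_in_address(address, surname):
    """True if the surname appears (case-insensitively) before the '@'."""
    local = address.partition("@")[0].lower()
    return bool(surname) and surname.lower() in local


def share_with_surname(records):
    """Fraction of (address, surname) records whose address contains the surname."""
    records = list(records)
    if not records:
        return 0.0
    hits = sum(surname_in_address(address, surname) for address, surname in records)
    return hits / len(records)


examples = [
    ("abbie.harper@hotmail.com", "Harper"),
    ("helmut.kempe@inode.at", "Kempe"),
    ("grl@kcs.org.uk", "Smith"),  # surname here is a made-up non-match
]
share = share_with_surname(examples)
```

Plain substring matching will overcount short surnames such as Lee, which can occur inside unrelated words, so the result is best read as an upper estimate.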
    37. Who uses long email addresses?
        • Grand mean email length of 8 characters
            • Number of characters on the left side of '@'
            • United Kingdom, USA, Canada, and other European countries
        • People from South American countries and India have long email addresses (Average length: 13 characters)
            • BRAZIL: ANA.ARAUJO3909@CREASP.ORG.BR (14 characters)
            • CHILE: BYRON.DELGADO.INOSTROZA@HOTMAIL.COM (23 characters)
            • URUGUAY: DIEGOJAVIERZEBALLOS@GMAIL.COM (19 characters)
            • INDIA: GANGULYDEEPANJAN@HOTMAIL.COM (16 characters)
            • ARGENTINA: AGUSTINAREYNOZO@GMAIL.COM (15 characters)
        • South Indians have longer email addresses than North Indians
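Slide 37 measures length as the number of characters to the left of the '@'. A minimal sketch of that measure, averaged per country, might look like the following. The function names are hypothetical, and the input is assumed to be (country, address) pairs from the geolocated records.

```python
from collections import defaultdict
from statistics import mean


def local_part_length(address):
    """Number of characters to the left of the '@'."""
    return len(address.partition("@")[0])


def mean_length_by_country(records):
    """Average local-part length per country for (country, address) pairs."""
    lengths = defaultdict(list)
    for country, address in records:
        lengths[country].append(local_part_length(address))
    return {country: mean(values) for country, values in lengths.items()}


brazil_example = local_part_length("ANA.ARAUJO3909@CREASP.ORG.BR")
```

Dots and digits count towards the length here. Whether the original analysis counted them is not stated in the slides.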
    38. What else we can infer from email addresses
        • Internet service provider
            • A.GOODEVE@AOL.COM
            • BERRYMANL@BTINTERNET.COM
            • CARL@VALLEYWISP.NET (Person lives in a rural area of northeast Oregon)
        • Country of origin
            • A.HAKIM26@YAHOO.FR
            • CBARNES@MEDIAWORKS.CO.NZ
        • Probable temporal aspects
            • ABBY527@OPTONLINE.NET
            • BERZINSKY102@YAHOO.COM
            • C.JOHNSTON2@BTINTERNET.COM
    39. What else we can infer from email addresses
        • Probable forename of a person
            • BEVERLY.RICHARDS@YAHOO.COM
            • BJORN.SOBRY@HOTMAIL.COM
            • BRANDAN.HOLMES@HOTMAIL.COM
        • How up to date someone is with technology
            • ALEXANDER.BREUSCH@GMAIL.COM
            • WILLIAM.NEALON@GOOGLEMAIL.COM
        • Professional Affiliations
            • CHRIS@IEEE.ORG
    40. What else we can infer from email addresses
        • Work Locations
            • DOUG.GOODMAN@FOUNDATION.ORG.UK
            • GRL@KCS.ORG.UK
            • ERM43@CAM.AC.UK
        • Studying
            • RTRIPOLI@STUDENT.UMASS.EDU
            • CBALIN01@STUDENTS.BBK.AC.UK
            • KATHERINE.LITTEN@STUDENT.KIRKWOOD.EDU
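Several of the inferences on slides 38 to 40 can be approximated with simple rules on the address string. The sketch below is one hedged illustration: the function `describe_address`, the small `COUNTRY_TLDS` table and the rule set are assumptions, not the authors' procedure. It only extracts trailing digits and leaves their temporal interpretation open.

```python
import re

# Small illustrative subset of country-code top-level domains.
COUNTRY_TLDS = {
    "uk": "United Kingdom", "fr": "France", "nz": "New Zealand",
    "de": "Germany", "it": "Italy", "at": "Austria", "br": "Brazil",
}


def describe_address(address):
    """Return rough hints (country, role, trailing digits) from an email address."""
    local, _, domain = address.strip().lower().partition("@")
    labels = domain.split(".")
    hints = {}
    country = COUNTRY_TLDS.get(labels[-1])
    if country:
        hints["country"] = country
    if "student" in labels or "students" in labels:
        hints["role"] = "student"
    elif labels[-1] == "edu" or labels[-2:] == ["ac", "uk"]:
        hints["role"] = "academic"
    digits = re.search(r"\d+$", local)
    if digits:
        hints["trailing_digits"] = digits.group()
    return hints


student_hints = describe_address("CBALIN01@STUDENTS.BBK.AC.UK")
```

These hints are weak signals at best. A country-code domain shows where the mail provider is registered, which need not be where the person lives.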
    41. Conclusion and future work
        • There are some interesting patterns found in the study of email addresses
            • some problems (accuracy of geocoding techniques)
        • Prospect of data linkage of data coded to unit postcode level
            • cluster analysis and data mining techniques
        • Future work may involve the data mining of Facebook and Twitter data
            • issues of generalisation
        • Visualisation of the data
    42. Thanks for listening. Any questions?
    43. A research agenda
        1 Acquire relevant real and virtual data sources and devise DBMS
        2 Devise GB-wide classification of NICT usage at neighbourhood scale
        3 Devise GB-wide classification of social network traffic
        4 Develop enhanced worldnames site to harvest real and virtual user data
        5 Undertake text analysis of worldnames user data and use to link classifications (2) and (3)
        6 Devise, implement and analyse social networking application and cybergeodemographic classification