Spatio-temporal linkage of real and virtual identity

This presentation outlines initial work on linking identities in the real and virtual worlds.

Published in: Education, Technology

  1. Spatio-temporal linkage of real and virtual identity. Muhammad Adnan (and Paul Longley), University College London
  2. Geodemographics
      • “Analysis of people by where they live [places]” (Sleight, 1993: 3)
      • Social similarity, not locational proximity
      • Diagram: Home, Person, Address, Area
  3. Identity of individuals in the real world
      • Name (forename and surname)
      • Surnames have geographic concentrations
      • Prospects for linkage with socio-economic data
        ◦ e.g. analysing the socio-economic circumstances of different ethnic groups
  4. An example: gbnames.publicprofiler.org (maps: Longley, Cheshire)
  5. An example: Output Area Classification (maps: Kingston upon Hull, Hereford)
  6. A socio-economic and ethnic classification
  7. A socio-economic and ethnic classification
  8. Wu
  9. Source: Cheshire and Longley (2011)
  10. Courtesy: James Cheshire
  11. Wordle.net
  12. The European scale: 16 countries, 400 million people, 5.95 million unique surnames. Courtesy: James Cheshire
  13. Onomap classification
      • Forename–surname clustering (based on Hanks and Tucker, 2000), using the UK Electoral Roll
      • Diagram: forenames (Pablo, Juan, Rosa, Marta, ...) linked to surnames (Mateos, Garcia, Pérez, Sánchez, Rodríguez, ...)
      • Several iterations until the self-contained cluster is exhausted
      • Each cluster is assigned a cultural, ethnic and linguistic Onomap type
      • A probability of ethnicity is assigned to each name
      • Mateos et al. (2007), CASA Working Paper 116
  14. WorldNames CEL clusters. Source: Mateos et al. (2011)
  15. Uncertainty and virtual identity
      • Identity is increasingly shaped by online activities
        ◦ so value may be leveraged from the fusion of physical and virtual data sources
      • Data fusion and generalisation to relate physical and virtual properties
      • Use of residence alongside activity patterns and social network information
  16. Most of us have virtual identities
      • Email address; social media accounts
      • People use different procedures and providers to establish virtual identities
      • Harvesting these data has interesting potential applications
        ◦ Cyber crime
        ◦ Cyber geodemographics (Facebook has already started this)
  17. Most of us have virtual identities
      • Facebook data mining engine
        ◦ Analyses the words you use and tailors advertisements accordingly
  18. Starting point: http://worldnames.publicprofiler.org
      • Worldnames holds data for approximately 1 billion people in around 28 countries
      • Approximately 1.6 million unique users have visited the website since 2008
  19. Starting point: http://worldnames.publicprofiler.org
      • For the past 6 months, Worldnames has been archiving the ‘Surname search’, ‘Email address’, ‘Gender’ and ‘IP address’ of each search
        ◦ c. 175,000 records: email validation
        ◦ 150,000 usable ‘IP address’ entries
  20. IP address to latitude/longitude conversion: http://quova.com
      • An API to convert IP addresses to their corresponding latitude/longitude values
  21. IP address to latitude/longitude conversion: http://quova.com
      • A search for an IP address at UCL (128.40.214.196)
  22. Top countries: the website was searched from 155 countries over the past 6 months. Top 20 countries (bar chart of counts):
      • United States 76,708; United Kingdom 21,892; Canada 8,154; Germany 7,158; Italy 4,058
      • Australia 2,978; Brazil 2,440; France 2,028; Argentina 1,958; Spain 1,830
      • New Zealand 1,236; Netherlands 1,074; Greece 1,040; Switzerland 992; Belgium 940
      • Poland 880; Austria 874; Mexico 834; Ireland 710; Sweden 630
  23. UK and Ireland
  24. Europe
  25. North America
  26. South America
  27. India, China, Japan, Singapore
  28. Popular surname searches (bar chart of counts)
      • Smith 708; Jones 306; Johnson 258; Anderson 224; Williams 222
      • Miller 218; Martin 202; Wilson 194; Brown 194; Moore 188
      • Thomas 178; Taylor 170; Clark 164; Lee 160; Roberts 156
      • Davis 152; Campbell 144; Lewis 138; Harris 138; Mitchell 136
  29. Popular email domains (bar chart of counts)
      • GMAIL.COM 31,842; HOTMAIL.COM 22,098; YAHOO.COM 15,542; AOL.COM 5,550; COMCAST.NET 2,696
      • HOTMAIL.CO.UK 1,948; MSN.COM 1,624; WEB.DE 1,522; YAHOO.CO.UK 1,290; GMX.DE 1,260
      • SBCGLOBAL.NET 1,246; BTINTERNET.COM 860; HOTMAIL.IT 844; VERIZON.NET 798; GOOGLEMAIL.COM 742
      • LIVE.COM 742; COX.NET 708; ATT.NET 632; MAILINATOR.COM 616; LIBERO.IT 616
  30. Popular email domains by surname
      • Smith (English): GMAIL.COM, YAHOO.COM, HOTMAIL.COM, AOL.COM, MAILINATOR.COM
      • Jones (Welsh): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, COMCAST.NET, GOOGLEMAIL.COM
      • Johnson (English): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, MSN.COM, VERIZON.NET
      • Perez (Spanish): GMAIL.COM, HOTMAIL.COM, YAHOO.ES, CHARTER.NET, GRANDECOM.NET
      • Gupta (Indian): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, GOOGLAMAIL.COM, INDIATIMES.COM
      • Meyer (German): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, AOL.COM, GMX.DE
  31. Popular email domains by country
      • UK: GMAIL.COM, HOTMAIL.COM, HOTMAIL.CO.UK, YAHOO.CO.UK, YAHOO.COM
      • USA: GMAIL.COM, YAHOO.COM, HOTMAIL.COM, AOL.COM, COMCAST.NET
      • France: HOTMAIL.FR, GMAIL.COM, HOTMAIL.COM, YAHOO.FR, LAPOSTE.NET
      • Germany: WEB.DE, GMX.DE, T-ONLINE.DE, YAHOO.DE, GMAIL.COM
      • Brazil: HOTMAIL.COM, GMAIL.COM, YAHOO.COM.BR, IG.COM.BR, BOL.COM.BR
      • Japan: YAHOO.COM, YAHOO.CO.JP, GMAIL.COM, HOTMAIL.COM, MSN.COM
  32. Top GoogleMail.com users: top surnames BINDER, WATKINS, WHITE, WOODS, ROBINSON, SLEEMAN, BENNETT, RITCHIE, SHARP, ROLLINGS
  33. GoogleMail.com users: surname ‘Binder’ (maps: Germany, Switzerland)
  34. GoogleMail.com users: surname ‘Binder’ (maps: Germany, Switzerland)
  35. GoogleMail.com users: surname ‘Blackbourn’ (map: New Zealand)
  36. Who uses their surname as part of their email address?
      • Approximately 40% of users have their surname as part of their email address
        ◦ abbie.harper@hotmail.com (surname: Harper)
        ◦ helmut.kempe@inode.at (surname: Kempe)
      • Top countries (bar chart; axis 0 to 50)
  37. Who uses long email addresses?
      • Grand mean email length of 8 characters
        ◦ Length is the number of characters to the left of ‘@’
        ◦ United Kingdom, USA, Canada and other European countries
      • People from South American countries and India have long email addresses (average length: 13 characters)
        ◦ Brazil: ANA.ARAUJO3909@CREASP.ORG.BR (14 characters)
        ◦ Chile: BYRON.DELGADO.INOSTROZA@HOTMAIL.COM (23 characters)
        ◦ Uruguay: DIEGOJAVIERZEBALLOS@GMAIL.COM (19 characters)
        ◦ India: GANGULYDEEPANJAN@HOTMAIL.COM (16 characters)
        ◦ Argentina: AGUSTINAREYNOZO@GMAIL.COM (15 characters)
      • South Indians have longer email addresses than North Indians
  38. What else can we infer from email addresses?
      • Internet service provider: A.GOODEVE@AOL.COM, BERRYMANL@BTINTERNET.COM, CARL@VALLEYWISP.NET (person lives in a rural area of northeast Oregon)
      • Country of origin: A.HAKIM26@YAHOO.FR, CBARNES@MEDIAWORKS.CO.NZ
      • Probable temporal aspects: ABBY527@OPTONLINE.NET, BERZINSKY102@YAHOO.COM, C.JOHNSTON2@BTINTERNET.COM
  39. What else can we infer from email addresses?
      • Probable forename of a person: BEVERLY.RICHARDS@YAHOO.COM, BJORN.SOBRY@HOTMAIL.COM, BRANDAN.HOLMES@HOTMAIL.COM
      • How up to date someone is with technology: ALEXANDER.BREUSCH@GMAIL.COM, WILLIAM.NEALON@GOOGLEMAIL.COM
      • Professional affiliations: CHRIS@IEEE.ORG
  40. What else can we infer from email addresses?
      • Work locations: DOUG.GOODMAN@FOUNDATION.ORG.UK, GRL@KCS.ORG.UK, ERM43@CAM.AC.UK
      • Studying: RTRIPOLI@STUDENT.UMASS.EDU, CBALIN01@STUDENTS.BBK.AC.UK, KATHERINE.LITTEN@STUDENT.KIRKWOOD.EDU
  41. Conclusion and future work
      • Some interesting patterns emerge from the study of email addresses
        ◦ but there are some problems, e.g. the accuracy of geocoding techniques
      • Prospect of linkage with data coded to unit postcode level
        ◦ cluster analysis and data mining techniques
      • Future work may involve data mining of Facebook and Twitter data
        ◦ issues of generalisation
      • Visualisation of the data
  42. Thanks for listening. Any questions?
  43. A research agenda
      1. Acquire relevant real and virtual data sources and devise a DBMS
      2. Devise a GB-wide classification of NICT usage at neighbourhood scale
      3. Devise a GB-wide classification of social network traffic
      4. Develop an enhanced Worldnames site to harvest real and virtual user data
      5. Undertake text analysis of Worldnames user data and use it to link classifications (2) and (3)
      6. Devise, implement and analyse a social networking application and a cybergeodemographic classification
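Slide 13 describes forename–surname clustering that runs for several iterations until a self-contained cluster is exhausted. One plausible reading of that step is growing a connected component in a bipartite graph of forenames and surnames, starting from a seed name. The sketch below assumes exactly that simplification; it is not the Onomap implementation, it ignores any frequency weights or thresholds, and the function name and sample pairs are hypothetical.

```python
from collections import defaultdict, deque


def expand_cluster(pairs, seed_surname):
    """Hypothetical helper: grow a forename-surname cluster from one seed surname.

    `pairs` is an iterable of (forename, surname) tuples. The loop alternately
    adds every forename seen with a cluster surname and every surname seen with
    a cluster forename, until no new names appear (the cluster is exhausted).
    """
    by_surname = defaultdict(set)
    by_forename = defaultdict(set)
    for forename, surname in pairs:
        by_surname[surname].add(forename)
        by_forename[forename].add(surname)

    surnames, forenames = {seed_surname}, set()
    queue = deque([("surname", seed_surname)])
    while queue:
        kind, name = queue.popleft()
        if kind == "surname":
            neighbours, target, next_kind = by_surname[name], forenames, "forename"
        else:
            neighbours, target, next_kind = by_forename[name], surnames, "surname"
        for other in neighbours:
            if other not in target:
                target.add(other)
                queue.append((next_kind, other))
    return forenames, surnames
```

Because this closure is unweighted, a single shared forename is enough to merge two clusters, so a working classification would need some form of weighting or pruning on top of it.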
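Slides 20 and 21 use a commercial service (Quova) to convert IP addresses to latitude and longitude; its API is not reproduced here. The general idea behind such services, looking an address up in a table of IP ranges annotated with locations, can be sketched with Python's standard `ipaddress` and `bisect` modules. The ranges below are the reserved documentation blocks from RFC 5737 and the coordinates are placeholders, not real geolocation data; the table and function names are hypothetical.

```python
import bisect
import ipaddress

# Hypothetical placeholder table: (network, (latitude, longitude)).
# Documentation-only address blocks; the coordinates are NOT real data.
RANGES = sorted(
    [
        (ipaddress.ip_network("192.0.2.0/24"), (51.5, -0.1)),
        (ipaddress.ip_network("198.51.100.0/24"), (40.7, -74.0)),
        (ipaddress.ip_network("203.0.113.0/24"), (-33.9, 151.2)),
    ],
    key=lambda item: item[0].network_address,
)
# Integer start address of each range, for binary search.
STARTS = [int(net.network_address) for net, _ in RANGES]


def lookup(ip):
    """Return the placeholder (lat, lon) for an IPv4 address, or None.

    Assumes the ranges in RANGES do not overlap.
    """
    address = ipaddress.ip_address(ip)
    i = bisect.bisect_right(STARTS, int(address)) - 1
    if i >= 0 and address in RANGES[i][0]:
        return RANGES[i][1]
    return None
```

Sorting by start address and using `bisect` keeps each lookup logarithmic in the number of ranges, which matters when the table holds millions of entries.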
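Slides 29 to 31 tabulate the most common email domains overall, by surname and by country. Assuming each archived search is a record with `email`, `surname` and `country` fields (hypothetical field names; the archive schema is not described), these tallies reduce to taking the text after the last '@' and counting with `collections.Counter`:

```python
from collections import Counter, defaultdict


def email_domain(address):
    """Hypothetical helper: the part after the last '@', upper-cased as on the slides."""
    return address.rsplit("@", 1)[-1].strip().upper()


def top_domains(records, n=5):
    """Most common domains across all records."""
    return Counter(email_domain(r["email"]) for r in records).most_common(n)


def top_domains_by(records, key, n=5):
    """Most common domains within each group, e.g. key='surname' or key='country'."""
    groups = defaultdict(Counter)
    for r in records:
        groups[r[key]][email_domain(r["email"])] += 1
    return {group: counts.most_common(n) for group, counts in groups.items()}
```

For example, `top_domains_by(records, "country")` would yield one ranked domain list per country, in the shape of slide 31.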
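Slide 36 reports that roughly 40% of users include their surname in their email address, but does not say how a match was decided. The sketch below assumes a simple case-insensitive substring test on the part before '@', after stripping accents with `unicodedata`; the helper names are hypothetical. Note that a plain substring test will also count incidental matches, such as a short surname like ‘Lee’ inside an unrelated word.

```python
import unicodedata


def _fold(text):
    """Lower-case and strip accents, so 'Pérez' matches 'perez'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def surname_in_email(surname, address):
    """Hypothetical test: does the local part (before '@') contain the surname?"""
    local_part = address.rsplit("@", 1)[0]
    return _fold(surname) in _fold(local_part)


def share_with_surname(records):
    """Fraction of (surname, email) pairs whose local part contains the surname."""
    if not records:
        return 0.0
    hits = sum(surname_in_email(s, e) for s, e in records)
    return hits / len(records)
```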
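Slide 37 defines email length as the number of characters to the left of ‘@’ and compares average lengths across countries. A minimal sketch of that calculation, assuming the input is a list of (country, address) pairs; the function names are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def local_part_length(address):
    """Number of characters before the last '@'."""
    return len(address.rsplit("@", 1)[0])


def mean_length_by_country(records):
    """Average local-part length per country, from (country, address) pairs."""
    lengths = defaultdict(list)
    for country, address in records:
        lengths[country].append(local_part_length(address))
    return {country: mean(values) for country, values in lengths.items()}
```

Splitting on the last ‘@’ rather than the first keeps the measure correct even for unusual addresses whose local part contains a quoted ‘@’.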
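Slides 38 to 40 list attributes that may be read off an address: provider, country, numbers with possible temporal meaning, probable forename, and academic or student affiliation. The slides give examples rather than rules, so the heuristics below are assumptions for illustration only: the country-code table is a tiny hypothetical subset, the function name is hypothetical, and a trailing number is merely flagged, not interpreted as a year or age.

```python
import re

# Hypothetical, deliberately tiny lookup of country-code top-level domains.
CCTLD_COUNTRY = {"uk": "United Kingdom", "fr": "France", "nz": "New Zealand",
                 "de": "Germany", "at": "Austria", "br": "Brazil", "it": "Italy"}


def infer_from_email(address):
    """Apply simple, illustrative heuristics to one email address."""
    local, _, domain = address.lower().rpartition("@")
    labels = domain.split(".")
    clues = {"domain": domain}  # provider or organisation
    country = CCTLD_COUNTRY.get(labels[-1])
    if country:
        clues["country"] = country
    # A "first.last" local part suggests a probable forename.
    match = re.fullmatch(r"([a-z]+)\.([a-z]+)", local)
    if match:
        clues["forename"] = match.group(1).capitalize()
    # Trailing digits may (or may not) encode a year, age or date.
    digits = re.search(r"(\d+)$", local)
    if digits:
        clues["trailing_number"] = digits.group(1)
    # ".ac.<cc>" or ".edu" domains suggest an academic institution.
    if "ac" in labels[:-1] or labels[-1] == "edu":
        clues["academic"] = True
    if any(label.startswith("student") for label in labels[:-1]):
        clues["student"] = True
    return clues
```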
