Uncertainty of Identity: Classifying Twitter Data

This presentation proposes methods for classifying Twitter data. Online social networks have grown rapidly all over the world in recent years. Here we present an analysis of Twitter data that aims to identify aspects of cultural and ethnic identity.

Published in: Education

  1. Uncertainty of Identity: Classifying Twitter Data
     Muhammad Adnan (and Prof. Paul Longley)
     University College London
  2. Uncertainty of Identity: Project Aims
       • A joint project between UCL, City University, and the University of Birmingham
       • Combining real and virtual world datasets to better understand the identity of individuals
           • Real world datasets (surname data, socio-economic datasets)
           • Virtual world datasets (email addresses, social media accounts)
     My research interests
       • Data mining
       • Analysis of Twitter data
       • Visualisation of the data
  3. Twitter (www.twitter.com)
       • Online social-networking and microblogging service
       • Launched in 2006; six years on, Twitter has 500 million active users
       • Generates 350 million tweets daily
       • One of the top 10 most visited websites on the internet
       • The Twitter API can be used to download live tweets
  4. Twitter API data
       • User Creation Date, Followers, Friends, User ID, Language, Location, Name, Screen Name, Time Zone
       • Geo Enabled, Latitude, Longitude, Tweet date and time, Tweet text
  5. Classifying Twitter Data to ethnic origins
       • User Creation Date, Followers, Friends, User ID, Language, Location, Name, Screen Name, Time Zone
       • Geo Enabled, Latitude, Longitude, Tweet date and time, Tweet text
  6. Classifying Twitter Data to ethnic origins
       • Some examples of NAME variations on Twitter
           • Real names: Kevin Hodge; Andre Alves; Jose de Franco; Carolina Thomas, Dr.; Prof. Martha Del Val; Fabíola Sanchez Fernandes
           • Fake names: Castor 5.; WHAT IS LOVE?; MysticMind; KIRILL_aka_KID; Vanessa; Petuna
  7. Top Twitter Users
  8. Where they tweet from:
  9. Where they tweet from:
  10. Where they tweet from:
  11. Classifying Twitter Data to ethnic origins
       • Applied ONOMAP (www.onomap.org) to FORENAME + SURNAME pairs
           • Kevin Hodge (ENGLISH)
           • Andre de Franco (ITALIAN)
           • … … … …
  12. Twitter Ethnicity Maps
  13. Twitter Ethnicity Maps
  14. Twitter Ethnicity Maps
  15. Twitter Ethnicity Maps
  16. Twitter Ethnicity Maps
  17. Twitter Ethnicity Maps
  18. Twitter Ethnicity Maps (http://www.guardian.co.uk/news/datablog/)
  19. Which places are they talking about? London
       • Tweets containing ‘London’ in their text string
       • Applying text-matching algorithms to remove tweets that refer to places other than London, e.g. London Road or London, Ontario
  20. Which places are they talking about? New York
  21. Which places are they talking about? Madrid
  22. Twitter Language Maps
  23. Twitter Language Maps
  24. Twitter Language Maps
  25. Conclusion
       • Use of social media is increasing day by day
       • Social-media datasets can give an insight into people’s behaviour in virtual worlds
       • Investigation of ethnic origins in other countries to establish inferences on migration trends in developed and developing countries
       • Future work will involve the investigation of Foursquare and Facebook data
  26. Thank you for listening. Any questions?
       • Web: http://www.uncertaintyofidentity.com
       • Email: m.adnan@ucl.ac.uk
       • Twitter: @gisandtech
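
The slides contrast real and fake entries in the Twitter Name field and then apply ONOMAP to FORENAME + SURNAME pairs, but they do not say how usable names were separated from the rest. The following is only a rough sketch of one possible pre-filter, written in Python. The function `split_name`, the `TITLES` set and the token rule are all assumptions made for illustration; none of them comes from the presentation or from ONOMAP.

```python
import re
from typing import Optional, Tuple

# Hypothetical list of honorifics to ignore before splitting a name.
TITLES = {"dr", "prof", "mr", "mrs", "ms"}

# A name token: letters (accented ones included), optionally joined by
# internal hyphens or apostrophes. Digits, underscores and other
# punctuation make a token invalid.
TOKEN = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def split_name(display_name: str) -> Optional[Tuple[str, str]]:
    """Return (forename, surname) if the Name field looks like a personal
    name, otherwise None. This is a heuristic, not the authors' method."""
    tokens = []
    for raw in display_name.replace(",", " ").split():
        if raw.rstrip(".").lower() in TITLES:
            continue  # drop honorifics such as "Dr." or "Prof."
        if not TOKEN.fullmatch(raw):
            return None  # digits, underscores or punctuation: reject
        tokens.append(raw)
    if len(tokens) < 2:
        return None  # a forename + surname pair needs at least two tokens
    # First token is taken as the forename, the remainder as the surname.
    return tokens[0], " ".join(tokens[1:])


if __name__ == "__main__":
    for name in ["Kevin Hodge", "Prof. Martha Del Val", "MysticMind", "WHAT IS LOVE?"]:
        print(repr(name), "->", split_name(name))
```

A rule this simple will still let through phrases made only of letters, so in practice the surviving pairs would need a further check, for example against a reference list of known forenames and surnames.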

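The place-name step on the "Which places are they talking about?" slides (keep tweets containing ‘London’, then remove those that mean London Road or London, Ontario) can be sketched in the same hedged way. The slides only say that text-matching algorithms were applied, so the regular expressions and the function name `mentions_london` below are assumptions for illustration, and the exclusion list covers only the two examples given in the slides.

```python
import re

# Phrases where "London" does not refer to the city itself. Only the two
# examples from the slides are covered; a real list would be much longer.
NOT_LONDON = re.compile(
    r"\blondon\s+(?:road|rd)\b"      # street names such as "London Road"
    r"|\blondon\s*,\s*ontario\b",    # London, Ontario
    re.IGNORECASE,
)

# "London" as a whole word, in any letter case.
LONDON = re.compile(r"\blondon\b", re.IGNORECASE)


def mentions_london(text: str) -> bool:
    """Return True if the tweet text mentions London outside the excluded
    phrases. Hypothetical helper, not the authors' algorithm."""
    # Blank out the excluded phrases, then look for any remaining mention.
    remaining = NOT_LONDON.sub(" ", text)
    return LONDON.search(remaining) is not None


if __name__ == "__main__":
    tweets = [
        "Off to London tomorrow!",
        "Stuck in traffic on London Road",
        "Greetings from London, Ontario",
    ]
    for tweet in tweets:
        print(mentions_london(tweet), "-", tweet)
```

Removing the excluded phrases before searching, rather than rejecting any tweet that contains one, keeps tweets that mention both London Road and London itself.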