User Behaviour Pattern Recognition On Twitter Social Network


For my final year project I used data analysis techniques to investigate user behaviour patterns on Twitter, asking whether similar interests and culture matter more than offline geographical location. I chose this unconventional topic because of my love of data analysis, and of social network analysis in particular, in the Internet era.

Published in: Education, Technology
1 Comment
  • Many thanks to putting me on to this, George. We should talk on the motivations for clicking on a link in a hashtagged tweet, retweeting due to hashtag(s), and so on.

User Behaviour Pattern Recognition On Twitter Social Network

  1. User Behavior Pattern Recognition Using Data Analysis Techniques On Twitter Social Network. George Konstantakopoulos. Supervisor: George Siogkas
  2. Abstract • Network analysis - online social networks • Demonstrate data analysis techniques • Twitter as a broadcast ‘area’ • Analyze advertised ‘packets’ in the broadcast area • Investigate affected nodes by description & location. ‘Are similar interests & culture more important than geographical location in the context of the internet era?’
  3. Introduction: Internet Era Facts • More than 2.4B Internet users, 34.3% of the world population (June ‘12) • During 2012 the digital world generated almost 2.9 ZB of data • Directly connected to and affected by web 2.0 & 3.0 technologies. *Zettabyte = Gigabyte × 10¹²
  4. Introduction: Social Web Statistics • 67% of those users use a social networking site • Facebook, Google+, YouTube and Twitter are currently the leaders. *In millions of users
  5. Introduction • By acquiring a Twitter network dataset • By creating a graph based on the dataset • By clustering based on recognized patterns. What: • @username / description / location • Follow (directed graph) • Update (hashtag/link/photo/video) • Reply or Mention (@username). Focus on: • Updates - hashtags • Location / description. Why: • Directed broadcast network topology • Fastest-growing social platform from 2012 Q3 to Q4, by 40% • Reflected in 288m active users
  6. Literature Review: Combining Ideas. 1. ‘Learning to Discover Social Circles in Ego Networks’ influenced the way an ego-network can be explored & clustered. 2. ‘Socio-semantic Query Expansion Using Twitter Hashtags’ influenced the way a hashtag may be used. #tag: the hash symbol followed by a word or concatenated phrase, e.g. #truestory
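The hashtag definition above (a hash symbol followed by a word or concatenated phrase) can be illustrated with a minimal Python sketch. The regular expression, the function names and the sample tweets are assumptions made for illustration; the slides do not state how hashtags were actually extracted.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the hashtags in a tweet, lowercased and without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def hashtag_counts(tweets):
    """Count how often each hashtag appears across a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_hashtags(tweet))
    return counts


# Hypothetical sample tweets, not taken from the project's dataset.
sample_tweets = [
    "Plotting my ego-network in Gephi #dataviz #SNA",
    "Cleaning location strings again... #truestory #dataviz",
]
counts = hashtag_counts(sample_tweets)
```

Lowercasing before counting treats `#SNA` and `#sna` as the same tag, which is one possible design choice for building a hashtag cloud.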
  7. Literature Review. 3. ‘Geographic Dissection of the Twitter Network’: ‘Does offline geography still matter in online social networks?’ Influenced the research on users' location. Breakdown of the Twitter research: • Users' geo-location • Their connections to others • Information they exchange with them. Concluded: ‘Our in-depth analysis reveals that geography crucially impacts all aspects of the Twitter social network’.
  8. Methodology: Data Mining & Analysis • Twitter API • Data mining procedure • All data are publicly shared • Cluster by hashtags, description & location. Physician John Snow, during the ‘Broad Street cholera outbreak’ of 1854, recognized patterns and created clusters. Water pumps were the source of the disease; he convinced the city authorities to close the pumps, which solved the problem. *API stands for Application Programming Interface
  9. Methodology: Impact & Expansion. Measure impact by: • Repetitive hashtags in the author's updates • Followers' description analysis (160-character descriptive ‘biography’) • Investigating repetitive words. Measure expansion by: • Followers' location cloud • Difference in the number of followers
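One way to "investigate repetitive words" in follower descriptions is a simple word count. The sketch below is an assumption about how this could be done, not the project's actual procedure; the stop-word list, the `repeated_words` helper and the sample descriptions are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "i", "my", "is", "at"}


def repeated_words(descriptions, min_count=2):
    """Return words used by at least `min_count` different followers."""
    counts = Counter()
    for description in descriptions:
        # A set counts each word once per follower, so one verbose
        # biography cannot dominate the result.
        words = set(re.findall(r"[a-z']+", description.lower()))
        counts.update(words - STOPWORDS)
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical follower biographies.
sample = [
    "Data analyst and coffee lover",
    "Lover of music, data and travel",
    "Music producer in Atlanta",
]
repeated = repeated_words(sample)
```

The resulting counts could serve as the weights of a description word cloud, where word size reflects how many followers share the word.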
  10. Design: Social Network Exploration. Created: • Graphs • Ego-networks. [Figure: ego-network with overlapping social circles. Legend: A = Academia, C = Career, F = Family, H = Hobby; vertices = people; overlaps shown: H C, A C, A H.] Appendix
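An ego-network consists of one user (the ego), the users directly connected to them, and the connections among those users. The sketch below builds one from a directed follow list; the `(follower, followed)` edge format, the `ego_network` helper and the sample names are assumptions, since the slides do not describe the data layout.

```python
def ego_network(follow_edges, ego):
    """Return (nodes, edges) of the ego-network around `ego`.

    `follow_edges` is an iterable of directed (follower, followed) pairs.
    """
    edges = set(follow_edges)
    # Alters: everyone the ego follows or is followed by.
    alters = {v for u, v in edges if u == ego} | {u for u, v in edges if v == ego}
    nodes = alters | {ego}
    # Keep only edges whose endpoints are both inside the ego-network.
    sub_edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return nodes, sub_edges


# Hypothetical follow relationships.
follows = [
    ("ann", "george"), ("bob", "george"), ("george", "ann"),
    ("ann", "bob"), ("bob", "zoe"), ("zoe", "kim"),
]
nodes, edges = ego_network(follows, "george")
```

Here `zoe` and `kim` fall outside the ego-network because neither is directly connected to the ego.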
  11. Design: Data Elaboration • The ‘Atlanta Problem’ • Data cleaning needed. ATTRIBUTE Location - REGION: Atlanta. The table shows 55 different inputs of the location Atlanta (separated here by ‘;’): ATL; ATL? NY ? FL ? WORLDWIDE; ATLby way of West Philly; Atl shawty; Atl.; ATL/NY/NJ/; Atlanta; atlanta; ATLANTA; Atlanta -- DC; Atlanta - London - Tokyo; Atlanta - sometimes Houston; Atlanta & New York City; Atlanta (Soufside); ATLanta , ga; Atlanta , GA; Atlanta /Global; Atlanta | Brasil | NYC; Atlanta | LA; Atlanta and Fort Lauderdale; Atlanta GA; Atlanta Ga.; Atlanta Georgia; Atlanta Georgia Area; Atlanta Headquarters; Atlanta Nightlife!; Atlanta via Hampton Roads; Atlanta, DC, & International; Atlanta, GA; Atlanta, Ga; ATLANTA, GA; atlanta, ga; Atlanta, GA and Sarasota, FL; Atlanta, GA Area; Atlanta, GA USA; Atlanta, GA, U.S.A.; Atlanta, GA, USA; Atlanta, Ga.; ATLANTA, GA.; Atlanta, GA.; Atlanta, Ga. ?; Atlanta, Georgia; Atlanta, Georgia (Gwinnett); Atlanta, Georgia, USA; Atlanta, Los Angeles; Atlanta, New York; Atlanta, New York, Los Angeles; Atlanta,GA; Atlanta,Ga; ATLANTA,GA; Atlanta,Georgia; atlanta. georgia. u.s.a; Atlanta/Ghana/Africa/Worldwide; Atlanta/New York USA; Atlantas West Midtown
  12. Design: The ‘Atlanta Problem’. Different people have different writing habits. To tackle the problem, one cluster was created for the region ‘USA’ and all other location data were clustered by country. The same problem applies to the followers' descriptions.
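The clustering of free-text locations by country can be sketched as a small normalisation function. This is only an illustration under stated assumptions: the token table, the `atl` prefix rule and the "first recognised token wins" policy are hypothetical and are not the cleaning procedure used in the project.

```python
import re

# Hypothetical lookup from a lowercase token to a country cluster.
TOKEN_TO_COUNTRY = {
    "usa": "USA", "ga": "USA", "ny": "USA",
    "greece": "Greece", "athens": "Greece",
    "uk": "UK", "london": "UK",
}


def country_cluster(location):
    """Map a free-text location to a country cluster, or 'Unknown'."""
    tokens = re.findall(r"[a-z]+", location.lower())
    for token in tokens:
        # Assumption: any token starting with 'atl' means Atlanta.
        if token.startswith("atl"):
            return "USA"
        if token in TOKEN_TO_COUNTRY:
            return TOKEN_TO_COUNTRY[token]
    # Empty or unrecognised values stay in their own bucket.
    return "Unknown"


# A few of the Atlanta variants from the table, plus two other cases.
variants = ["ATLby way of West Philly", "atlanta. georgia. u.s.a", "Atlanta,GA", "Atl shawty"]
clusters = [country_cluster(v) for v in variants]
```

Multi-city strings such as "Atlanta - London - Tokyo" resolve to the first recognised place under this policy, which is a simplification; a real cleaning pass would need a rule for such cases.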
  13. Implementation: Implementation Cycles. Based on ‘The Spiral Model of Software Development’, the project went through three different cycles: 1. The Twitter Project (1.9 billion lines) 2. The Ego-network Project (already cleaned & clustered) 3. The Data Mining Project (author's ego-network analysis). [Figure labels: data pre-processing on an Intel Core 2 Duo 2.66 GHz, CPU load 100%, kernel peak 98%; user IDs relational graph; total lines; author's ego-network.]
  14. Implementation: Implementation Tools • NodeXL • MATLAB • Gephi • Microsoft Excel
  15. Implementation: Cleaning / Clustering. From chaos (author's entire hashtag cloud) to meaningful hashtag clustering (author's Level 2.0 data mining). *Word size reflects weight
  16. Test, Results & Evaluation: Experimental Clustering & Visualization. Cluster by ‘Clauset-Newman-Moore’. Level 2.0 data mining. Cluster by ‘Louvain method’. Graph of 540 nodes & 2570 edges. Level 1.0 data mining.
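Both the Clauset-Newman-Moore and the Louvain method search for a partition of the graph that maximises modularity. The sketch below does not implement either algorithm; it only computes the modularity score of a given partition, assuming the graph is treated as undirected and unweighted. The function name and the toy graph are illustrative.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (L_c / m) - (d_c / (2 * m)) ** 2, where
    m is the number of edges, L_c the number of edges inside c and d_c the
    sum of the degrees of the nodes in c.
    """
    m = len(edges)
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Toy graph: two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
q_split = modularity(toy_edges, [{"a", "b", "c"}, {"d", "e", "f"}])
q_single = modularity(toy_edges, [{"a", "b", "c", "d", "e", "f"}])
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one cluster, which is the kind of comparison a modularity-based clustering method relies on.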
  17. Test, Results & Evaluation: Results. Author's updates hashtag cloud. Followers' description word cloud. Note that: A) word size reflects weight; B) line thickness reflects connection weight.
  18. Test, Results & Evaluation: Results. 1. Parent node is based in Greece. 2. Most ‘affected’ nodes are in the USA, followed by the UK, Canada, Greece, etc. 3. Empty values can affect the diffusion on the small returns. *Dot size reflects the number of incoming edges
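The number of incoming edges that sets the dot size is the in-degree of a node in the directed graph. A minimal sketch, assuming edges are stored as `(source, target)` pairs; the helper name and the sample edges are hypothetical.

```python
from collections import Counter


def incoming_edges(edges):
    """Count incoming edges per node for directed (source, target) pairs."""
    return Counter(target for _source, target in edges)


# Hypothetical directed edges between four users.
sample_edges = [("u1", "u4"), ("u2", "u4"), ("u1", "u2"), ("u3", "u4")]
sizes = incoming_edges(sample_edges)
```

Nodes that never appear as a target, such as `u1` here, simply count as zero.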
  19. Test, Results & Evaluation: Evaluation - SEOmoz. Geographical expansion - January 2013
  20. Test, Results & Evaluation: Evaluation - SEOmoz. Geographical expansion results are verified - May 2013
  21. Test, Results & Evaluation: Evaluation - SEOmoz & Tweetstats. From January 2013 to May 2013. Days: 119. Tweets: 333. Followers growth: 163%.
  22. Test, Results & Evaluation: Evaluation - SEOmoz & Tweetstats. From January to May. Days: 119. Tweets: 333. Followers growth: 163%.
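The evaluation figures can be turned into simple rates. The sketch below derives the average tweets per day from the reported 333 tweets over 119 days and shows how a growth percentage is computed. The slides do not give the raw follower counts, so the 100 and 263 used here are hypothetical values chosen only to reproduce a 163% growth.

```python
def tweets_per_day(tweets, days):
    """Average number of tweets per day over the period."""
    return tweets / days


def growth_percent(old, new):
    """Relative growth from `old` to `new`, as a percentage."""
    return (new - old) * 100 / old


# Reported on the slides: 333 tweets in 119 days.
rate = tweets_per_day(333, 119)

# Hypothetical follower counts; the real ones are not in the slides.
growth = growth_percent(100, 263)
```

With these inputs the average works out to roughly 2.8 tweets per day.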
  23. Conclusion. Research on: • Network analysis • Online social networks • Specific Twitter characteristics. Raised the issue: ‘Are similar interests & culture more important than geographical location?’ Based on the analysis undertaken: ‘Shared interests & culture play a greater role in connecting people via the Twitter medium than their geographic location.’ ! Small dataset
  24. Conclusion. Personal Reflection: • Network analysis • Social network analysis • Project management • Research skills • Data analysis • Visualization skills. Future Work: • Project in ongoing state • User categorization through created metrics • Evaluate results based on the same analysis but with different accounts. THANK YOU FOR YOUR TIME
  25. Final Year Project in numbers • 6,286 files • In 135 folders or… • 59.5GB of data and counting… • Explored over 10 different software programs in the data analysis, processing and visualization fields. User Behavior Pattern Recognition Using Data Analysis Techniques On Twitter Social Network
  26. References
      Anagnostopoulos, I., Kolias, V. and Mylonas, P. (2012). Socio-semantic Query Expansion Using Twitter Hashtags. In: 2012 Seventh International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), 2012. [Online]. Available at: doi:10.1109/SMAP.2012.15.
      Bastian, M., Heymann, S. and Jacomy, M. (2009). Gephi: An Open Source Software for Exploring and Manipulating Networks. In: Third International AAAI Conference on Weblogs and Social Media, 19 March 2009. [Online]. Available at:
      , M. and Brenner, J. (n.d.). The Demographics of Social Media Users - 2012. Pew Internet & American Life Project. [Online]. Available at: [Accessed: 6 March 2013].
      GlobalWebIndex. (2012). SOCIAL PLATFORMS GWI.8 UPDATE: Decline of Local Social Media Platforms. GlobalWebIndex. [Online]. Available at: [Accessed: 15 March 2013].
      Kulshrestha, J., Kooti, F., Nikravesh, A. and Gummadi, K. P. (2012). Geographic Dissection of the Twitter Network. Dublin, Ireland: Max Planck Institute for Software Systems.
      McAuley, J. and Leskovec, J. (2012). Learning to Discover Social Circles in Ego Networks. In: NIPS, 2012.
      Miniwatts Marketing Group. (n.d.). World Internet Users Statistics | Usage and World Population Stats. Internet World Stats. [Online]. Available at: [Accessed: 6 February 2013].
      Smith, M. A., Shneiderman, B., Milic-Frayling, N., Mendes Rodrigues, E., Barash, V., Dunne, C., Capone, T., Perer, A. and Gleave, E. (2009). Analyzing (social media) networks with NodeXL. In: Proceedings of the fourth international conference on Communities and technologies, 2009, p.255–264. [Online]. Available at: [Accessed: 30 March 2013].
      Snow, J. (n.d.). Mode of Communication of Cholera (John Snow, 1855). [Online]. Available at: [Accessed: 1 February 2013].
      Twitter Help Center. (n.d.). The Twitter glossary. [Online]. Available at: [Accessed: 2 January 2013].