
Big social data analytics - social network analysis

Social Network Analysis - Inforte course on Big Social Data Analytics, Tampere, 2017

Published in: Business

  1. Social Network Analysis. Inforte course on Big Social Data Analytics 2017. Dr. Jari Jussila; Twitter: @jjussila; Email: jari.j.jussila@tut.fi; GitHub: https://github.com/jjussila/BigSocialDataAnalytics
  2. From transactions to interactions (figure). Figure labels: Web, Mobile and Social Media, ERP, CRM; Purchase & Transaction Records, Offers and Quotations, Customer Engagements, Customer Segmenting, A/B Testing, Dynamic Pricing, Search Engine Marketing and Optimization, Target Marketing, Images and Videos, Speech to Text, Sensor Data, Application Log Data, SMS/MMS, Location Data, Social Media Posts, Social Network Analysis
  3. Network Analysis (NA) & Social Network Analysis (SNA)
  4. Graph and Matrix Representation of Networks (figure): star, circle, and chain networks of seven nodes, each drawn as a graph next to its 7 × 7 adjacency matrix of 0/1 entries
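
The three matrices on this slide can be regenerated rather than transcribed. The following is a minimal sketch in plain Python; the node numbering (node 0 as the hub of the star, nodes numbered consecutively along the circle and the chain) is an assumption, since the slide's node labels did not survive.

```python
def adjacency_matrix(n, edges):
    """Return an n x n 0/1 matrix for an undirected graph given as (i, j) pairs."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: every tie is mirrored
    return matrix


n = 7  # the slide's matrices have seven rows and columns

# Assumed numbering: node 0 is the hub of the star.
star = adjacency_matrix(n, [(0, j) for j in range(1, n)])
circle = adjacency_matrix(n, [(i, (i + 1) % n) for i in range(n)])
chain = adjacency_matrix(n, [(i, i + 1) for i in range(n - 1)])

for name, matrix in [("Star", star), ("Circle", circle), ("Chain", chain)]:
    print(name)
    for row in matrix:
        print(" ".join(str(value) for value in row))
```

In the star only the hub row is full, in the circle every row sums to 2, and in the chain the two end nodes have a single tie.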
  5. Directed and Undirected Networks

     Directed network (nodes A, B, C):
           A  B  C
        A  0  0  1
        B  1  0  0
        C  0  1  0

     Undirected network (nodes A, B, C):
           A  B  C
        A  0  1  1
        B  1  0  1
        C  1  1  0
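
A quick way to tell the two cases apart is to check whether the matrix equals its own transpose. The sketch below uses the two 3 × 3 matrices from this slide; reading a 1 in row X, column Y as a tie from X to Y is an assumed convention, not something the slide states.

```python
LABELS = ["A", "B", "C"]

directed = [
    [0, 0, 1],  # A
    [1, 0, 0],  # B
    [0, 1, 0],  # C
]
undirected = [
    [0, 1, 1],  # A
    [1, 0, 1],  # B
    [1, 1, 0],  # C
]


def is_symmetric(matrix):
    """True when every tie i -> j is matched by a tie j -> i."""
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(size) for j in range(size))


def edge_list(matrix, labels):
    """List (row label, column label) for every non-zero cell."""
    return [(labels[i], labels[j])
            for i, row in enumerate(matrix)
            for j, value in enumerate(row) if value]


print(is_symmetric(directed), edge_list(directed, LABELS))
print(is_symmetric(undirected), edge_list(undirected, LABELS))
```

Under the assumed convention the directed matrix describes a three-step cycle, while the undirected one lists each tie twice, once per direction.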
  6. Sociomatrix
     Relationship: is friend of

              Jim  Bob  Alex  Tom
        Jim    -    0    1     0
        Bob    1    -    1     1
        Alex   1    1    -     1
        Tom    0    1    1     -

     Source: Hoffman 2000; Moreno 1953
     “the mathematical study of psychological properties of populations, the experimental technique of and the results obtained by application of quantitative methods” (Moreno, 1953, pp. 15-16).
  7. Direct and Indirect Paths (Friends/Connections/etc.)
  8. Nodes and Edges (Gephi, NodeXL)
  9. Anatomy of Networks
  10. Network Metrics: Prominence
      • Centrality
        – Degree Centrality
        – Closeness Centrality
        – Betweenness Centrality
        – Information Centrality
      • Prestige
        – Degree Prestige
        – Proximity Prestige
        – Status or Rank Prestige
      Source: Wasserman & Faust 1994
  11. Degree Centrality
      • Degree: how many direct links a node has to other nodes
      • In a directed network it is possible to calculate both indegree (incoming connections) and outdegree (outgoing connections)
      Source: Wasserman & Faust 1994
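
As an illustration, indegree and outdegree can be read straight off the "is friend of" sociomatrix from slide 6. This is a sketch that assumes rows are the people naming a friend and columns are the people being named; the slide does not say which way round the matrix is read.

```python
names = ["Jim", "Bob", "Alex", "Tom"]

# Sociomatrix from slide 6; None marks the undefined diagonal ("-").
sociomatrix = [
    [None, 0, 1, 0],  # Jim
    [1, None, 1, 1],  # Bob
    [1, 1, None, 1],  # Alex
    [0, 1, 1, None],  # Tom
]


def out_degree(matrix, i):
    """Outgoing connections: sum of row i (assumed to be the sender)."""
    return sum(value for value in matrix[i] if value is not None)


def in_degree(matrix, j):
    """Incoming connections: sum of column j (assumed to be the receiver)."""
    return sum(row[j] for row in matrix if row[j] is not None)


degrees = {name: (in_degree(sociomatrix, k), out_degree(sociomatrix, k))
           for k, name in enumerate(names)}

for name, (indeg, outdeg) in degrees.items():
    print(f"{name}: indegree={indeg}, outdegree={outdeg}")
```

Under that assumption Alex receives the most nominations, while Bob and Alex both name all three others.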
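  12. Closeness Centrality
      • Closeness is based on the sum of the shortest-path lengths from a node to the other nodes in the network: $c_i = \sum_{j=1}^{n} d_{ij}$, where $n$ is the number of nodes
      • $d_{ij}$ is the length of the shortest path between nodes $i$ and $j$
      • Closeness centrality indicates how quickly a node can interact with other nodes: the smaller the sum, the closer the node is to the rest of the network (the centrality score is commonly reported as the inverse of this sum)
      Source: Wasserman & Faust 1994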
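
The distance sum can be computed with a breadth-first search when every edge counts as one step. The sketch below does this for a seven-node star and a seven-node chain; the node numbering is an assumption, and nodes that cannot be reached are simply left out of the sum.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def distance_sum(adjacency, node):
    """Sum of shortest-path lengths from node to all reachable nodes."""
    return sum(shortest_path_lengths(adjacency, node).values())


# Assumed numbering: node 0 is the hub of the star.
star = {0: list(range(1, 7))}
star.update({leaf: [0] for leaf in range(1, 7)})
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}

print("star hub:", distance_sum(star, 0), "star leaf:", distance_sum(star, 1))
print("chain end:", distance_sum(chain, 0), "chain middle:", distance_sum(chain, 3))
```

The hub of the star and the middle of the chain have the smallest sums, so they can reach the rest of their networks in the fewest steps.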
  13. Betweenness Centrality
      • Betweenness measures the degree to which a node lies on the shortest paths between pairs of other nodes
      • Betweenness centrality indicates the ability of a node to control information flowing between other nodes (gatekeeper)
      • A node may not be locally central, but may still have a high betweenness centrality
      Source: Wasserman & Faust 1994
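
The gatekeeper effect is easy to reproduce on a toy network. The following sketch counts, for every pair of nodes, the share of their shortest paths that pass through each other node (unnormalised, undirected, unweighted). The example graph, two groups of four joined through a single bridge node, is made up for illustration.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adjacency, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                sigma[neighbour] = 0
                queue.append(neighbour)
            if dist[neighbour] == dist[node] + 1:
                sigma[neighbour] += sigma[node]
    return dist, sigma


def betweenness(adjacency):
    """Unnormalised betweenness: one count per unordered pair of end nodes."""
    info = {s: bfs_counts(adjacency, s) for s in adjacency}
    score = {v: 0.0 for v in adjacency}
    for s, t in combinations(adjacency, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # no path between s and t
        for v in adjacency:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical example: nodes 0-3 and 5-8 form two tight groups, node 4 bridges them.
edges = (list(combinations(range(0, 4), 2)) + [(3, 4), (4, 5)]
         + list(combinations(range(5, 9), 2)))
adjacency = {v: set() for v in range(9)}
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

scores = betweenness(adjacency)
for node in sorted(adjacency):
    print(node, "degree:", len(adjacency[node]), "betweenness:", scores[node])
```

Node 4 has the lowest degree in the network but the highest betweenness, because every path between the two groups runs through it.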
  14. Network Analysis Process in Practice
      • The network analysis process usually consists of the following four phases:
        1. Interpreting the phenomena under investigation as a network
        2. Collecting data
        3. Cleaning and refining the data
        4. Network layout and fine-tuning
      Source: Huhtamäki & Parviainen 2015
  15. A process for visualization. Source: Card et al. 1999
  16. Visualization Stages (figure). Figure labels: Visual and Cognitive Processing, Physical Environment, Social Environment, Data gathering, Data Preprocessing and transformation, Visualization Tool, Data manipulation, Data exploration. Source: Ware 2004
  17. OSTINATO Process Model for Visual Network Analysis. Source: Huhtamäki 2016
  18. Entity Recognition?
      • Twitter provides natural identifiers for nodes (however, some nodes may be fake accounts or bots)
      • In some other application areas, such as bibliographic data analysis, entity recognition is more problematic
      • Entity recognition can be done in network visualization tools (e.g. Gephi Data Laboratory) or with third-party applications (e.g. OpenRefine)
  19. Entity Recognition in Gephi Data Laboratory (screenshot; columns shown: Source, Target)
  20. Node and Edge Creation. DiGraph: directed graphs with self loops. Each user mention creates an edge between users. For Twitter mentions see: https://support.twitter.com/articles/14023#
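
A minimal sketch of this step with NetworkX follows. The sample tweets, the `@(\w+)` pattern, and the `build_mention_graph` helper are all hypothetical and not taken from the course script. Because a `DiGraph` holds at most one edge per ordered pair, repeated mentions are accumulated in a `weight` attribute here, which is one possible design rather than the course's confirmed approach.

```python
import re

import networkx as nx

MENTION = re.compile(r"@(\w+)")  # simplified pattern, an assumption

# Hypothetical sample data: (author screen name, tweet text)
tweets = [
    ("alice", "Great talk by @bob and @carol!"),
    ("bob", "Thanks @alice"),
    ("carol", "Note to self: @carol check the slides"),
    ("alice", "@bob see you tomorrow"),
]


def build_mention_graph(tweets):
    """Create a directed graph with one edge per (author, mentioned user) pair."""
    graph = nx.DiGraph()
    for author, text in tweets:
        source = author.lower()
        graph.add_node(source)
        for mentioned in MENTION.findall(text):
            target = mentioned.lower()
            if graph.has_edge(source, target):
                graph[source][target]["weight"] += 1  # repeated mention
            else:
                graph.add_edge(source, target, weight=1)
    return graph


graph = build_mention_graph(tweets)
for source, target, data in graph.edges(data=True):
    print(source, "->", target, "weight:", data["weight"])
```

A user who mentions themselves produces a self loop, which is why a graph type that permits self loops is needed.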
  21. Visual Properties Configuration: Node Partition by Modularity Class
  22. Layout Processing: Force-driven layout
      • Layout refers to the act of placing the nodes on the canvas
      • Force-driven layout is a straightforward option:
        – Nodes repel each other
        – Connections act as springs pulling the nodes back together
        – The center of a gravitational field is placed in the middle of the canvas
        – The process is run and configured iteratively until the visualizer is happy with the result
      Source: Huhtamäki 2015
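
The three forces listed above can be sketched in a few lines of plain Python. This is a toy illustration, not Gephi's ForceAtlas2: the force formulas, the parameter values, and the step cap are all assumptions chosen only to keep the example short and stable.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, repulsion=0.02, spring=0.1,
                 gravity=0.05, max_step=0.05, seed=1):
    """Return {node: (x, y)} after a fixed number of force-driven iterations."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Nodes repel each other (strength falls off with distance).
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist_sq = dx * dx + dy * dy + 1e-9  # avoid division by zero
                force[a][0] += repulsion * dx / dist_sq
                force[a][1] += repulsion * dy / dist_sq
        # Connections act as springs pulling their endpoints together.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += spring * dx
            force[a][1] += spring * dy
            force[b][0] -= spring * dx
            force[b][1] -= spring * dy
        # Gravity pulls every node toward the centre of the canvas at (0, 0).
        for n in nodes:
            force[n][0] -= gravity * pos[n][0]
            force[n][1] -= gravity * pos[n][1]
        # Move each node, capping the step length to keep the run stable.
        for n in nodes:
            fx, fy = force[n]
            length = math.hypot(fx, fy)
            scale = min(1.0, max_step / length) if length > 0 else 0.0
            pos[n][0] += fx * scale
            pos[n][1] += fy * scale
    return {n: tuple(p) for n, p in pos.items()}


# Hypothetical example: two triangles joined by one connection.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
layout = force_layout(nodes, edges)
for node, (x, y) in layout.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

In practice the "run and configure until happy" loop means adjusting parameters such as `repulsion` and `gravity` and rerunning, which is what the layout settings in a tool like Gephi expose interactively.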
  23. Example. The list of startups participating in the Tekes YIC program was scraped from the Tekes homepage. The IEN Dataset was used to gather data on companies, investors, key individuals, and acquisitions. Moreover, the Twitter usernames of the YIC companies were compiled in a spreadsheet in a semi-manual manner, and a tailored script was implemented to crawl the Twitter REST API and collect the list of followers of each YIC company with a Twitter account. Source: Huhtamäki et al. 2012
  24. Interactive Network Visualization: http://www.tut.fi/novi/case/2015-cbh-cmadfi2014-informallearning/twomode/network/ Source: Aramo-Immonen et al. 2016; Aramo-Immonen et al. 2015
  25. Hashtag Co-Occurrence Matrix: http://www.tut.fi/novi/case/2015-cbh-cmadfi2014-informallearning/hashtags/matrix/ Source: Aramo-Immonen et al. 2016; Aramo-Immonen et al. 2015
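
A co-occurrence matrix of this kind counts how often two hashtags appear in the same tweet. The sketch below shows the counting step in plain Python; the sample tweets and the `#(\w+)` pattern are hypothetical, and hashtags are lower-cased on the assumption that case differences should be merged.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")  # simplified pattern, an assumption

# Hypothetical sample tweets
tweets = [
    "Learning about #SNA and #Gephi today",
    "#gephi makes #networks easy to explore",
    "#SNA #networks #Gephi",
]


def cooccurrence(tweets):
    """Count, for each pair of hashtags, the tweets in which both appear."""
    counts = Counter()
    for text in tweets:
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for first, second in combinations(tags, 2):
            counts[(first, second)] += 1
    return counts


counts = cooccurrence(tweets)
tags = sorted({tag for pair in counts for tag in pair})

# Print the symmetric matrix, one row per hashtag.
print(" " * 10 + "".join(f"{tag:>10}" for tag in tags))
for row in tags:
    cells = [counts[tuple(sorted((row, col)))] if row != col else 0 for col in tags]
    print(f"{row:>10}" + "".join(f"{cell:>10}" for cell in cells))
```

Each pair is stored once in alphabetical order, so the printed matrix is symmetric by construction.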
  26. Extraction of Twitter data and Network Visualization with Gephi
  27. Steps
      • Collect the Twitter data
        – Download the following script for extracting tweets: https://github.com/jjussila/BigSocialDataAnalytics/blob/master/scripts/search_trump.py
        – Create a Twitter account, or borrow one from a friend, if you do not already have one
        – Create a Twitter App: https://apps.twitter.com/
        – Create a keychain.json file (which holds the keys and tokens needed for accessing the data)
      • Start running Python code online
        – https://www.pythonanywhere.com/
      • Install the following software
        – Gephi https://gephi.org/ (for network visualization)
  28. Original Twitter-api script. Source: https://github.com/jukkahuhtamaki/pcm-demo/tree/master/twitter-api
  29. Modified script for extracting Twitter data. Source: https://github.com/jjussila/BigSocialDataAnalytics
  30. Become a Twitter Developer
  31. Create your first Twitter App
  32. Get the keys and tokens needed to access Twitter data
  33. Create keychain.json using the template file. Copy the necessary keys and tokens from the Twitter App, paste them into the template, and save the file as keychain.json
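
Loading the file and checking it before calling the API could look like the sketch below. The four JSON field names are hypothetical: they follow the usual names of Twitter's OAuth credentials, but the course's template file may spell them differently, so check the template before reusing this.

```python
import json
import os
import tempfile

# Hypothetical field names: check the template file for the real ones.
REQUIRED_KEYS = ("consumer_key", "consumer_secret",
                 "access_token", "access_token_secret")


def load_keychain(path="keychain.json"):
    """Read the credentials file and fail early if a value is missing."""
    with open(path, encoding="utf-8") as handle:
        keychain = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if not keychain.get(key)]
    if missing:
        raise ValueError("keychain.json is missing: " + ", ".join(missing))
    return keychain


# Demo with placeholder values written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "keychain.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump({key: "PLACEHOLDER" for key in REQUIRED_KEYS}, handle)
    demo = load_keychain(demo_path)

print(sorted(demo))
```

Keeping the credentials in a separate file means the script itself can be shared or committed without exposing them.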
  34. Example of extracting tweet data
  35. Modifying the script. Note: %40 = ‘@’ and %23 = ‘#’. For more details see: w3schools.com ASCII Encoding Reference
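
The two codes in the note are standard percent-encodings, so they do not have to be typed by hand. A small sketch with Python's standard library follows; the `encode_query` helper and the example search terms are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_query(term):
    """Percent-encode a search term so it is safe inside a URL query string."""
    return quote(term, safe="")


for term in ["@jjussila", "#gephi", "#big data"]:
    print(term, "->", encode_query(term))

# Decoding goes the other way.
print(unquote("%40"), unquote("%23"))
```

Encoding the term programmatically avoids mistakes when the search string contains spaces or other reserved characters.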
  36. Network creation with NetworkX library. Source: NetworkX
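
Once the graph exists in NetworkX, it can be saved in GEXF, the format Gephi opens. The sketch below writes a tiny hypothetical graph and reads it back to confirm the round trip. The file name `network.gexf` comes from the slides; the nodes, the weights, and the use of a temporary folder are assumptions for the demo.

```python
import os
import tempfile

import networkx as nx

# Hypothetical mention network
graph = nx.DiGraph()
graph.add_edge("alice", "bob", weight=2)
graph.add_edge("bob", "carol", weight=1)
graph.add_edge("carol", "carol", weight=1)  # self loop

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "network.gexf")
    nx.write_gexf(graph, path)      # export for Gephi
    restored = nx.read_gexf(path)   # read back to check the file

print(restored.is_directed(), sorted(restored.nodes()))
print(sorted(restored.edges()))
```

Writing to the working directory instead (for example `nx.write_gexf(graph, "network.gexf")`) would leave a file that can be downloaded and opened in Gephi.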
  37. Using PythonAnywhere. Upload the following files:
      - search_twitter.py
      - keychain.json
  38. Running Python code on PythonAnywhere. Start a new console: Bash
  39. Execute Python script in Bash console
  40. Using PythonAnywhere. Download the following files:
      - network.gexf
  41. Open gexf (Graph Exchange XML Format) with Gephi
  42. Calculate the Network Metrics and Visualize the Network. Modularity Report (Community Detection Algorithm)
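
For readers who want a comparable result outside Gephi, the sketch below detects communities with NetworkX. Note the swap: Gephi's modularity report is based on the Louvain method, whereas this example uses NetworkX's greedy modularity maximisation, a different algorithm that optimises the same quality score. The example graph (two fully connected groups of four joined by one edge) is made up.

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    modularity,
)

# Hypothetical graph: nodes 0-3 and 4-7 form two cliques joined by edge 3-4.
graph = nx.barbell_graph(4, 0)

communities = greedy_modularity_communities(graph)
score = modularity(graph, communities)

# Map each node to a community index, similar in spirit to a modularity class column.
modularity_class = {node: index
                    for index, members in enumerate(communities)
                    for node in members}

print("communities:", [sorted(members) for members in communities])
print("modularity:", round(score, 3))
```

The resulting mapping can be used to colour nodes by community, as in the "Node Partition by Modularity Class" step.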
  43. References
      • Aramo-Immonen, H., Kärkkäinen, H., Jussila, J. J., Joel-Edgar, S., & Huhtamäki, J. (2016). Visualizing informal learning behavior from conference participants' Twitter data with the Ostinato Model. Computers in Human Behavior, 55, 584-595.
      • Aramo-Immonen, H., Jussila, J., & Huhtamäki, J. (2015). Exploring co-learning behavior of conference participants with visual network analysis of Twitter data. Computers in Human Behavior, 51, 1154-1162.
      • Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. ICWSM, 8, 361-362.
      • Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). Readings in information visualization: using vision to think. Morgan Kaufmann.
      • Huhtamäki, J. (2016). Ostinato Process Model for Visual Network Analytics: Experiments in Innovation Ecosystems. (Tampere University of Technology. Publication; Vol. 1425). Tampere University of Technology.
      • Huhtamäki, J., Still, K., Isomursu, M., Russell, M., & Rubens, N. (2012, September). Networks of Growth: The Case of Young Innovative Companies in Finland. In Proceedings of the 7th European Conference on Innovation and Entrepreneurship: ECIE (p. 307). Academic Conferences Limited.
      • Huhtamäki, J., & Parviainen, O. (2013). Verkostoanalyysi sosiaalisen median tutkimuksessa. Otteita verkosta-Verkon ja sosiaalisen median tutkimusmenetelmät. Vastapaino, Tampere.
      • Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS ONE, 9(6), e98679.
      • McSweeney, P. J. (2009). Gephi Network Statistics. Presented at Google Summer of Code. Retrieved from http://gephi.org/google-soc/gephi-netalgo.pdf
      • Ware, C. (2013). Information visualization: perception for design (3rd ed.). Elsevier.
      • Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications (Vol. 8). Cambridge University Press.
