Extracting Social Network Data and Multimedia Communications from Social Media Platforms for Analysis and Decision-making

This presentation provides an overview of some of the data extractions that may be achieved on social media platforms using their respective APIs and a free open-source tool (NodeXL).


Usage Rights

CC Attribution-NonCommercial-NoDerivs License

Presentation Transcript

  • EXTRACTING SOCIAL NETWORK DATA AND MULTIMEDIA COMMUNICATIONS FROM SOCIAL MEDIA PLATFORMS FOR ANALYSIS AND DECISION-MAKING. Shalin Hai-Jew. 2014 Big XII Teaching and Learning Conference, Oklahoma State University, Stillwater, Oklahoma, Aug. 4 – 5, 2014
  • PRESENTATION OVERVIEW: Electronic Commons; Academic Environment Analysis and Decision-making (from E-SNA); Examples of Social Network Data Graphs; Electronic Social Network Analysis (E-SNA) / Social Physics; Social Media Platform Types (Microblogging: Twitter; Content-Based Social Platforms: YouTube, Flickr; Web Networks); NodeXL (Network Overview, Discovery and Exploration for Excel); Review; Tools 2
  • WELCOMES AND SELF-INTROS Please introduce yourself as your digital alter-ego. What does your electronic alter-ego look like on, say, Twitter? Facebook? Flickr? YouTube? How accurate is your digital doppelganger to your real-world self? Why? If analyst(s) were to conduct an “inference attack” on your electronic presence, what could they find out? What could they infer in terms of data leakage and unintended communications (latent information)? If electronic presence is a kind of social performance, how is it best performed, and why? What are your experiences with social media platforms? Which do you prefer, and why? Have your preferences changed over time? What would you like to learn about electronic social network analysis? 4
  • THE CONTEXT To provide a rationale for the use of electronic social network analysis to benefit the (teaching and learning, and other) work of universities. Note: This presentation was designed to introduce some basic electronic social network analysis capabilities, not to teach the audience directly how to do the work, which is beyond the purview of the presentation. 5
  • THE ELECTRONIC COMMONS A “chokepoint” for social issues as a commons A way to reach many technologically and socially A way to trigger mass actions (attitudes, beliefs, actions), potentially in a viral or cascading way…as an influence agent A fantasy space where “egos” may assume audiences (that may be non-existent) A fantasy space where “egos” may assume non-audiences (the assumption of narrow-casting) when it may be broadcasting (unintended audiences along with the intended ones) Re-creation of social power structures from the real-world into the virtual In-group and out-groups Social performances, posing Social codes and meanings Mixed interests and motives Low cost of indulging curiosities, particularly in an automated and 6
  • THE ELECTRONIC COMMONS (CONT.) Certain individuals (demographics) in certain social media platforms Limited big data sharing (value to the data and the identities) Application programming interfaces (APIs) to access shadow databases Importance of maintaining trust with clients Private accounts (vs. public ones) 7
  • AFFORDANCES AND ENABLEMENTS FOR INSTITUTIONS OF HIGHER EDUCATION What are ways that universities have benefitted from the Web? Social media? How can universities continue building on these affordances? What innovations can people use to build on these effects? What are some ways that universities can harness electronic social network analysis (e-SNA) for their various professional / formal and professional / informal objectives? 8
  • ACADEMIC ENVIRONMENT ANALYSIS AND DECISION-MAKING (FROM E-SNA) What is the social media presence of the university? Who are its closest partners in terms of exchanging messages or sharing social media contents? What are the contents of the messages? What are the main expressed sentiments? If the university is considering partnering with an organization, what may be learned about this organization based on its social media presence? Who are the most active participants in a #hashtag conversation about some aspect of the university? Who is the “mayor of the hashtag” (per Marc A. Smith’s term)? Why? What conversations are occurring around the events being hosted on or around campus? 9
  • ACADEMIC ENVIRONMENT ANALYSIS AND DECISION-MAKING (FROM E-SNA) (CONT.) If there is a controversial or trending issue, what are the main sentiments being expressed? Who and which ad hoc groups are expressing what sentiments? How may the university take part constructively? If a flash mob action is being planned around campus, how can campus administrators and law enforcement personnel know about what is happening? If there is a university-related issue that may be inspired, organized, and maintained using social media, how can universities harness social media to constructive ends? Is there misuse of the university name and brand? Are there fraudulently created social media accounts linked to the university? (After de-aliasing, who is actually behind such accounts?) How can social media platform information be used to geolocate events to physical spaces, and aliases to actual people? 10
  • ACADEMIC ENVIRONMENT ANALYSIS AND DECISION-MAKING (FROM E-SNA) (CONT.) What sorts of images and video are being shared (that are associated with the university) on microblogging sites? On content sharing sites? In terms of digital content tagging, what are the most common words linked to the university (or its student groups, colleges, public figures, and other associated groups and individuals)? If there is a desire to change public perceptions, how may social media platforms be used constructively? What are the ethical rules of engagement? How may a university maintain relationships with its various constituencies through social media? Its political partners? Its corporate partners? Its alumni? Its donors? Its current learners? Its current learners’ families? And then, further, how can e-SNA be used to maintain understandings of these interchanges and interrelationships? 11
  • GRAPH 1A: A #HASHTAG CONVERSATION ON TWITTER (FLU) Note: Please click on the various graphs to link to them on the NodeXL Graph Gallery. Datasets may be downloaded there for many of these data extractions. The data structures can be depicted in a variety of ways based on a number of layout algorithms. 13
  • A NOTE ABOUT WEB NETWORK GRAPHS Third-party VOSON (Virtual Observatory for the Study of Online Networks) tool out of the Australian National University (with an add-in to NodeXL); Maltego Tungsten 26
  • (E-) SOCIAL NETWORK ANALYSIS AND SOCIAL PHYSICS To summarize some of the basic concepts of social network analysis as applied to electronic spaces 27
  • “SOCIAL PHYSICS” Identifying the latent “laws” of human interactions with each other at macro and micro levels Laws of affiliation and association (over time): homophily, heterophily Laws of attraction and aversion Laws of human patterning socially (and others) Laws of human uses of physical spaces Laws of systemic change Laws of social frictions and large-scale combat 29
  • STATISTICAL MEASURES Global Network Measures. Betweenness centrality: The number of shortest paths (geodesics) between pairs of other nodes that pass through a given node (information tends to move along the shortest paths and closest ties); how much of a bridge a node is for network connectivity. Closeness centrality: Based on the geodesic path distance between a node and every other node (farness as the sum of all distances to all other nodes; closeness as the inverse of farness). Node-Level (Local) Measures. Degree centrality: In-degree and out-degree (relative popularity). Clustering coefficient: Embeddedness of single nodes in cliques or ego neighborhoods with their alters. 30
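The four measures on this slide can be sketched in plain Python. This is a minimal illustration on a hypothetical five-node undirected network (the `EDGES` list and all function names are assumptions for the example, not NodeXL's implementation); betweenness uses Brandes' algorithm and is left unnormalized.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network: undirected ties between five accounts.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Geodesic (shortest-path) distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(n - 1) / farness, where farness is the sum of distances (connected graph assumed)."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of discovery
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


def clustering_coefficient(adj):
    """Fraction of a node's neighbor pairs that are themselves tied."""
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0
            continue
        links = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
        result[v] = 2 * links / (k * (k - 1))
    return result


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print(betweenness_centrality(adj))
    print(closeness_centrality(adj))
```

In this toy network, node "c" is the bridge between the a-b-c triangle and the d-e tail, so it has the highest betweenness and closeness, while "a" and "b" have the highest clustering coefficient.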
  • STATISTICAL MEASURES (CONT.) Global Network Measures. Eigenvector centrality (diversity): A node scores higher when it is connected to higher-value or popular nodes (values between 0 and 1), as a measure of relative influence. Clustering coefficient: Aggregation of multiple nodes based on similarity (like co-occurrence) or connectivity, and expressed as proximity or closeness visually; may be a measure of transitivity. Motif Structures. Dyads, triads, and other structured sub-groupings; local and experiential for the nodes in terms of structured connections; may (fractals) / may not be reflective of the overall structure; global motif censuses (counts of occurrences of various types of motif structures in a whole network); structural holes as indicators of potential openings for nodes and links (to build resilience). 31
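Eigenvector centrality can be approximated with power iteration. The sketch below is an assumption-laden illustration (hypothetical toy network, scores rescaled so the top node is 1.0); it iterates on the adjacency matrix plus the identity, a common trick that keeps the same ranking while avoiding oscillation on bipartite graphs. It is not how any particular tool computes the measure.

```python
# Hypothetical toy network: undirected ties between five accounts.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eigenvector_centrality(adj, iterations=500, tol=1e-12):
    """Power iteration; scores are rescaled so the largest equals 1.0."""
    scores = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbors' scores,
        # so being tied to high-scoring nodes raises a node's score.
        new = {v: scores[v] + sum(scores[n] for n in adj[v]) for v in adj}
        top = max(new.values())
        new = {v: s / top for v, s in new.items()}
        if max(abs(new[v] - scores[v]) for v in adj) < tol:
            return new
        scores = new
    return scores


if __name__ == "__main__":
    for node, score in sorted(eigenvector_centrality(build_adjacency(EDGES)).items()):
        print(node, round(score, 3))
```

Here "e" has the same degree-one position as any leaf, but its score depends on how well connected its single neighbor "d" is, which is the point of the measure.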
  • STRUCTURE MINING Structure of social relationships as an indicator of… Type of social organization An embedded power structure An expression of interdependent and intermixed personalities Network diffusion of information, power, and other transmissible phenomena Geodesic structures and distances and paths Static slice-in-time representations but actual dynamical (changing) realities (“A Brief Overview of Social Network Analysis”) 32
  • NODES AND LINKS (IN TERMS OF SOCIAL MEDIA PLATFORMS) Entities Individuals, organizations, governments, non-profits, political groups, and others People, robots, and cyborgs Relationships Follower, following Tweets, re-tweets, replies-to, mentions Comments on videos and response videos Co-occurrence of related tags networks 33
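As a rough sketch of how the relationships above become links, the following turns a few hypothetical Tweet records into a directed edge list. The record fields, account names, and classification rules are assumptions for illustration; real extraction tools use richer metadata than the message text. Representing a plain Tweet as a self-loop is one possible convention, assumed here.

```python
import re

MENTION = re.compile(r"@(\w+)")

# Hypothetical records; real exports carry many more fields.
TWEETS = [
    {"user": "campus_lib", "text": "RT @univ_news: Welcome to the conference"},
    {"user": "univ_news", "text": "@campus_lib thanks for coming!"},
    {"user": "prof_x", "text": "Great session with @campus_lib and @univ_news"},
    {"user": "prof_x", "text": "Slides are up."},
]


def classify(text):
    """Guess the relationship type from the start of the message text."""
    if text.startswith("RT @"):
        return "retweet"
    if text.startswith("@"):
        return "reply-to"
    return "mention"


def tweets_to_edges(tweets):
    """Return (source, target, relationship) tuples, one per link."""
    edges = []
    for tweet in tweets:
        source = tweet["user"].lower()
        targets = MENTION.findall(tweet["text"])
        if not targets:
            # A Tweet that names nobody becomes a self-loop on its author.
            edges.append((source, source, "tweet"))
            continue
        kind = classify(tweet["text"])
        for position, target in enumerate(targets):
            # Only the first named account gets the retweet / reply-to label.
            relationship = kind if position == 0 else "mention"
            edges.append((source, target.lower(), relationship))
    return edges


if __name__ == "__main__":
    for edge in tweets_to_edges(TWEETS):
        print(edge)
```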
  • ON TWITTER To give a sense of the various network graphs possible from the Twitter microblogging site (with multimedia scraping) 34
  • ABOUT TWITTER 255 million monthly active users 500 million Tweets (140-character microblogging messages) a day Nearly 80% of active users on mobile 77% of accounts outside U.S. Support for over 35 languages Vine (looping video sharing on mobile) with more than 40 million users Verified accounts [Twitter created by a four-man team in 2006 and incorporated in 2007 (About Twitter FactSheet)] 35
  • TYPES OF INFORMATION AVAILABLE #Hashtag conversations (tagged conversations) #Hashtag eventgraphs (event-based) Keyword networks (multi-topic) User networks (ego-based) List networks (topic-based) 36
  • SOME E-SNA CHALLENGES WITH THIS SOCIAL MEDIA PLATFORM Word disambiguation 1/100 with geolocation data (which is often noisy data) Rate-limiting Goes back a week only (no deep historical searches without paying for a third-party company with access) Enables extractions of Tweet streams as datasets Limits for some languages (requiring URL Decoder / Encoder for readability, such as at the following) 37
  • ON FLICKR To provide a sense of what network data may be extracted from the Yahoo Flickr imagery and video repository 38
  • ABOUT FLICKR Hosts imagery and video Over 90 million registered members 3.5 million new images uploaded daily Hosting over 6 billion images as of 2011 Free accounts offering a terabyte of storage per individual Enables public and private accounts Enables Creative Commons licensure of contents and CC-Search access [Created by Ludicorp in 2004 and sold to Yahoo in 2005] 39
  • TYPES OF INFORMATION AVAILABLE Related Tags Networks on Flickr: (Multi-lingual) tags as a form of metadata describing the imagery and videos; related tags (networks of tags that co-occur and may be expressed as clustered text-based graphs); graphs may be partitioned for more visual clarity; scraped imagery may be embedded in the graphs. User Networks / Groups on Flickr: Ego neighborhoods of individual or group contributors to Flickr; “alters” (nodes with direct ties) to the user network in Flickr; follower / following; reply-to 40
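The idea of a related tags network (tags linked when they co-occur on the same item) can be sketched as follows. The tag lists are hypothetical, and the platform's own related-tags service may weight or filter tags differently; this only illustrates the co-occurrence counting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag sets, one list per photo.
PHOTO_TAGS = [
    ["campus", "library", "autumn"],
    ["Campus", "football", "autumn"],
    ["library", "campus"],
]


def related_tag_edges(photo_tags):
    """Count how often each pair of tags appears on the same photo."""
    weights = Counter()
    for tags in photo_tags:
        # Normalize case and drop duplicates within a single photo.
        unique = sorted({tag.lower() for tag in tags})
        for first, second in combinations(unique, 2):
            weights[(first, second)] += 1
    return weights


if __name__ == "__main__":
    for (first, second), weight in related_tag_edges(PHOTO_TAGS).most_common():
        print(first, second, weight)
```

The resulting weighted pairs can be pasted into an edge list, with the weight column used for edge width or for clustering.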
  • SOME E-SNA CHALLENGES WITH THIS SOCIAL MEDIA PLATFORM Disambiguation of terms Reliance on informal tagging and folksonomies Dealing with metadata and not the multimedia directly Limits for some languages (requiring URL decoder / encoder for some languages, namely Cyrillic and Arabic) 41
  • ON YOUTUBE To give a sense of the content networks available on Google’s YouTube video collection 42
  • ABOUT YOUTUBE Over a billion unique users each month on YouTube Six billion hours of video watched monthly 100 hours of video uploaded each minute Localized in 61 countries and as many languages 80% of traffic from outside the U.S. (YouTube Statistics) Adobe Flash video format and HTML 5 format [Founded in 2005 by a three-man development team and purchased by Google in 2006] 43
  • TYPES OF INFORMATION AVAILABLE User networks (user accounts and connections with other user accounts) Thumbnail screengrabs possible Video networks (videos about a particular topic) Thumbnail screengrabs possible 44
  • SOME E-SNA CHALLENGES WITH THIS SOCIAL MEDIA PLATFORM Based on metadata, not the direct videos Would be richer if drawn from the scripts of the video contents 45
  • ON THE WEB To provide a sense of what may be captured in terms of Web networks 46
  • TYPES OF INFORMATION AVAILABLE Ties between websites URLs linked to a geographical location (and vice versa) Technological understructure of websites Relatedness ties between various types of electronic information (and the enablement of transforms or the changing of one type of electronic information to another) Scraping of files (PDF) and imagery (with EXIF data) Re-identification of aliases 47
  • SOME E-SNA CHALLENGES WITH THIS INFORMATION SOURCE High levels of ambiguity Past data leaving trails (even if the information may not be current) Involves the public web only, not the hidden Web Requires a commercial tool for efficiency and coherence 48
  • NODEXL: NETWORK OVERVIEW, DISCOVERY AND EXPLORATION FOR EXCEL To introduce the freeware and open-source tool that is an add-in to Excel 49
  • GENERAL SEQUENCE 1. Define a research question (that is answerable with this type of data query). 2. Formulate a strategy to use the tool to extract information from a particular social media platform. 3. Start NodeXL. Ensure that there is Internet connectivity. Set up the data extraction parameters. Run the data extraction. 4. Process the data. Create the graph visualization. 5. Analyze the graph metrics. Analyze the graph visualization. Analyze complementary information from other sources. 6. Use the information to make a decision or create a strategy. 51
  • TOOL CAPABILITIES Data extraction from a range of social media platforms Graph visualization using a dozen different grouping (clustering) visualizations and overall graph visualizations 52
  • LAYOUT ALGORITHMS Fruchterman-Reingold (force-based); Harel-Koren Fast Multiscale; Circle (lattice); Spiral; Horizontal sine wave / vertical sine wave; Grid; Polar / polar absolute; Sugiyama; Random 53
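Two of the simpler layouts in this list can be sketched directly, to show that a layout algorithm is just a rule for assigning coordinates to vertices. The function names and coordinate conventions below are assumptions for illustration; the force-based layouts (such as Fruchterman-Reingold) are iterative and considerably more involved.

```python
import math


def circle_layout(nodes, radius=1.0):
    """Place nodes at equal angles around a circle centered at the origin."""
    n = len(nodes)
    return {
        node: (
            radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n),
        )
        for i, node in enumerate(nodes)
    }


def grid_layout(nodes):
    """Place nodes row by row on a roughly square integer grid."""
    columns = math.ceil(math.sqrt(len(nodes)))
    return {node: (i % columns, i // columns) for i, node in enumerate(nodes)}


if __name__ == "__main__":
    names = ["a", "b", "c", "d", "e"]
    print(circle_layout(names))
    print(grid_layout(names))
```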
  • LAYOUT OPTIONS Affects layout of the groups or connected components Treemap Packed rectangles Force-directed 54
  • LAYERS OF DEPENDENCIES From near-to-far Local computer and its processing Connectivity speed to the Internet NodeXL Access to the social media platform Whitelisting Rate limiting (and time-of-day for access) Particular search terms “forbidden” Data processing with NodeXL Data visualization (with NodeXL or another tool) Data analysis Re-run? Additional data extractions? 55
  • GRAPH METRICS Overall graph metrics Vertex degree / in-degree and out-degree Betweenness and closeness centralities Vertex eigenvector centrality Vertex PageRank Vertex clustering coefficient Vertex reciprocated Edge reciprocation Group metrics Word and word pairs Top items Twitter search network top items 57
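A few of the directed-graph metrics named here can be sketched in plain Python: in-degree and out-degree, the share of edges that are reciprocated, and PageRank by power iteration. The edge list, function names, and the damping factor of 0.85 are assumptions for the example, and the exact definitions used by any given tool may differ.

```python
# Hypothetical directed edges (for example, "mentions").
EDGES = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]


def in_out_degree(edges):
    """Return {node: (in_degree, out_degree)}."""
    degree = {}
    for source, target in edges:
        inn, out = degree.get(source, (0, 0))
        degree[source] = (inn, out + 1)
        inn, out = degree.get(target, (0, 0))
        degree[target] = (inn + 1, out)
    return degree


def reciprocated_edge_ratio(edges):
    """Share of distinct edges whose reverse edge is also present."""
    edge_set = set(edges)
    reciprocated = sum(
        1 for u, v in edge_set if u != v and (v, u) in edge_set
    )
    return reciprocated / len(edge_set)


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; rank from dangling nodes is spread evenly."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    n = len(nodes)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for source in nodes:
            for target in out_links[source]:
                new[target] += damping * rank[source] / len(out_links[source])
        rank = new
    return rank


if __name__ == "__main__":
    print(in_out_degree(EDGES))
    print(reciprocated_edge_ratio(EDGES))
    print(pagerank(EDGES))
```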
  • GROUPS Group by vertex attribute Group by connected component Group by cluster Group by motif 58
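Grouping by connected component is the easiest of these to illustrate: two vertices share a group if some path of ties connects them. The sketch below uses a breadth-first search over a hypothetical undirected edge list; the function name and the `isolates` parameter are assumptions for the example.

```python
from collections import deque


def connected_components(edges, isolates=()):
    """Return groups of mutually reachable nodes, largest group first."""
    adjacency = {node: set() for node in isolates}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    seen = set()
    groups = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(group))
    return sorted(groups, key=len, reverse=True)


if __name__ == "__main__":
    ties = [("a", "b"), ("b", "c"), ("d", "e")]
    print(connected_components(ties, isolates=["f"]))
```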
  • REVIEW To highlight some of the main ideas 59
  • A BRIEF REVIEW OF THE AFFORDANCES OF E-SNA Surfacing Hidden or Latent Information Who (which nodes) is most active in an event or conversation or other phenomena? What is he/she/they/it asserting (as an influence agent) via text? via imagery? via video? Scalability This scalable approach enables analysis of both small-scale and (relatively) large-scale data, and everything in between. At some point, the human has to come in to analyze what’s found and to advance the work…but computers can do all the heavy lifting. 60
  • A BRIEF REVIEW OF THE AFFORDANCES OF E-SNA (CONT.) Machine-Enhanced Sentiment Analysis Gist of a Tweetstream related to a user account or related accounts, a hashtag conversation, an eventgraph, a photostream, a videostream Embedded meanings and sentiments (the meaning, the direction and the strength of that emotion, the cultural and social-based valence whether positive or negative) Fine-tuning the automated analysis of texts Machine reading of imagery Human-informed processes (at virtually every step) 61
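The direction and strength of sentiment can be approximated, very crudely, with a word lexicon. The sketch below is only an illustration: the mini-lexicon, its weights, and the one-word negation rule are all hypothetical, and real sentiment tools rely on much larger, validated lexicons or trained models, with human review of the results.

```python
import re

# Hypothetical mini-lexicon: positive and negative weights per word.
LEXICON = {"great": 2, "love": 3, "good": 1, "bad": -1, "awful": -3, "hate": -3}
NEGATORS = {"not", "no", "never"}


def sentiment_score(text):
    """Sum word weights; a negator directly before a word flips its sign."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        if value and i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def label(score):
    """Map a numeric score to a coarse valence label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for message in ["I love the new library, great job", "Parking is not good"]:
        score = sentiment_score(message)
        print(message, score, label(score))
```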
  • OTHER DATA EXTRACTION AND GRAPH VISUALIZATION TOOLS NCapture on Chrome and Internet Explorer (NVivo 10 on Windows) CEMap on AutoMap with ORA NetScenes Maltego Tungsten™ * All the above have other purposes and capabilities beyond the limited use cases shown here. 62
  • REFERENCES Hansen, D.L., Shneiderman, B., & Smith, M.A. (2011). Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Boston: Elsevier. (available digitally on ScienceDirect) NodeXL on CodePlex (downloadables) 63
  • LIVE DEMO? QUESTIONS? COMMENTS? Audience suggestions for targets? Any questions about this presentation? About e-social network analysis? The software tools? The social media platforms? Questions about research you might want to embark on using this methodology and these tools? 64
  • CONCLUSION AND CONTACT Dr. Shalin Hai-Jew Instructional Designer Information Technology Assistance Center (iTAC) Kansas State University 212 Hale Library Manhattan, KS 66506-1200 785-532-5262 (work phone) shalin@k-state.edu 65