Social Network Analysis and Visualization

Vermont Code Camp 7 #VTCC7 Slide Deck
Author: Al Ramirez @mirezez

Published in: Data & Analytics

1. Social Network Analysis & Visualization. #Beginner #Twitter #SNA #DataViz #VTCC7 #Gephi #MongoDB #JavaScript #JVM #Node.js. Presenter: Alberto Ramirez (@mirezez), r@mirez.us
2. About the Presenter. Currently: Systems Architect, iSystems. Previously: [not shown]. Happy ITLAPD 2015!
3. Graph Basics. Nodes & Edges: un/directed, un/weighted. Degree: in-degree, out-degree. Topology: Graph Density measures how many edges are in the graph compared to the maximum possible number of edges (complete graph). Diameter: the longest shortest path between any two nodes in the network. Radius: the minimum eccentricity over all nodes in the graph, where a node's eccentricity is its greatest distance to any other node.
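The measures on this slide can be sketched in a few lines of plain JavaScript. This is a minimal illustration, not code from the deck: the toy graph, the function names, and the restriction to a connected, undirected, unweighted graph are all assumptions.

```javascript
// Hypothetical toy graph: undirected, unweighted, stored as an adjacency list.
const graph = {
  a: ["b", "c"],
  b: ["a", "c"],
  c: ["a", "b", "d"],
  d: ["c", "e"],
  e: ["d"],
};

// Degree of a node: the number of edges touching it.
function degree(g, node) {
  return g[node].length;
}

// Density: edges present divided by the n(n-1)/2 edges of a complete graph.
function density(g) {
  const n = Object.keys(g).length;
  // Every undirected edge appears in two adjacency lists, hence the / 2.
  const edges = Object.values(g).reduce((sum, nbrs) => sum + nbrs.length, 0) / 2;
  return edges / ((n * (n - 1)) / 2);
}

// Eccentricity: greatest shortest-path distance from `source`, found by BFS.
function eccentricity(g, source) {
  const dist = { [source]: 0 };
  const queue = [source];
  let max = 0;
  while (queue.length > 0) {
    const v = queue.shift();
    for (const w of g[v]) {
      if (!(w in dist)) {
        dist[w] = dist[v] + 1;
        max = Math.max(max, dist[w]);
        queue.push(w);
      }
    }
  }
  return max;
}

const eccentricities = Object.keys(graph).map((v) => eccentricity(graph, v));
const diameter = Math.max(...eccentricities); // largest eccentricity
const radius = Math.min(...eccentricities);   // smallest eccentricity
```

For a directed graph such as a follower network, in-degree and out-degree would be counted separately and the density denominator would be n(n-1).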
4. Real Networks Properties. 1. Growth: networks are assembled one node at a time and increase in size. 2. Preferential attachment: as a new node joins the network, the probability that it links to a given existing node is proportional to the number of links that node already has. “Rich Get Richer”
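Growth and preferential attachment can be simulated with a short sketch. The function names, the two-node seed graph, and the rule that each new node adds exactly one link are illustrative assumptions, not details from the slides.

```javascript
// Grow a network one node at a time. Each new node links to one existing
// node, chosen with probability proportional to that node's current degree.
// `random` is injectable so the behaviour can be made deterministic.
function growNetwork(totalNodes, random = Math.random) {
  const edges = [[0, 1]];   // assumed seed: two connected nodes
  const endpoints = [0, 1]; // each node is listed once per incident edge
  for (let newNode = 2; newNode < totalNodes; newNode++) {
    // A uniform pick from `endpoints` is a degree-proportional pick of a node.
    const target = endpoints[Math.floor(random() * endpoints.length)];
    edges.push([newNode, target]);
    endpoints.push(newNode, target);
  }
  return edges;
}

// Count the degree of every node in an edge list.
function degrees(edges) {
  const deg = {};
  for (const [u, v] of edges) {
    deg[u] = (deg[u] || 0) + 1;
    deg[v] = (deg[v] || 0) + 1;
  }
  return deg;
}
```

Running `degrees(growNetwork(1000))` a few times should typically show a handful of early nodes with far more links than the rest, which is the "rich get richer" effect.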
5. Social Network Examples: Undirected, Unweighted; Directed, Unweighted
6. Finding Cliques. • k-clique: a subgraph of k nodes in which all nodes are adjacent to each other. • n-clique, where n is a positive integer, is a collection C of vertices in which any two vertices u,v ∈ C have distance ≤n. • p-clique, where p is a real number between 0 and 1, is a collection C of vertices in which any vertex has ≥p|C| neighbors in C. Trouble with Clique Targeting: 1. Cliques are not resilient networks. 2. Uniformity in the way cliques are defined can lead to little or no insight into that subgraph. 3. The clique might be a narrowing of a larger, more legitimate community to be evaluated.
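Two of the definitions above translate directly into membership checks. The sketch below is illustrative only; the toy graph and the function names `isClique` and `isPClique` are assumptions.

```javascript
// Hypothetical undirected toy graph, adjacency stored as Sets.
const adjacency = {
  a: new Set(["b", "c"]),
  b: new Set(["a", "c"]),
  c: new Set(["a", "b", "d"]),
  d: new Set(["c", "e"]),
  e: new Set(["d"]),
};

// Clique: every pair of members is directly connected.
function isClique(adj, members) {
  return members.every((u, i) =>
    members.slice(i + 1).every((v) => adj[u].has(v))
  );
}

// p-clique: every member has at least p * |C| neighbors inside C.
function isPClique(adj, members, p) {
  const inGroup = new Set(members);
  return members.every((u) => {
    let neighborsInGroup = 0;
    for (const v of adj[u]) {
      if (inGroup.has(v)) neighborsInGroup++;
    }
    return neighborsInGroup >= p * members.length;
  });
}
```

Here `isClique(adjacency, ["a", "b", "c"])` holds, while adding `d` breaks it because `d` is adjacent only to `c`; the looser p-clique test still accepts the four-node set for a small enough p.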
7. Finding K-Cores. K-core: the maximal subgraph with minimum degree at least k. In graph G, as k increases, the subgraph becomes more exclusive.
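A k-core can be found by repeatedly peeling away nodes whose degree falls below k. The sketch below assumes an undirected graph given as an adjacency list; the function name `kCore` and the toy data are hypothetical.

```javascript
// Return the nodes of the k-core by repeatedly deleting nodes of degree < k.
function kCore(adj, k) {
  // Work on a copy so the caller's graph is left untouched.
  const remaining = new Map();
  for (const [node, nbrs] of Object.entries(adj)) {
    remaining.set(node, new Set(nbrs));
  }
  let changed = true;
  while (changed) {
    changed = false;
    for (const [node, nbrs] of remaining) {
      if (nbrs.size < k) {
        // Detach the node from its neighbors, then drop it.
        for (const nbr of nbrs) remaining.get(nbr).delete(node);
        remaining.delete(node);
        changed = true;
      }
    }
  }
  return [...remaining.keys()].sort();
}

// Hypothetical toy graph: a triangle (a, b, c) with a tail c-d-e.
const toyGraph = {
  a: ["b", "c"],
  b: ["a", "c"],
  c: ["a", "b", "d"],
  d: ["c", "e"],
  e: ["d"],
};
```

With this data the 1-core is the whole graph, the 2-core is the triangle, and the 3-core is empty, which illustrates how the subgraph becomes more exclusive as k grows.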
8. Node Centrality: identifying important nodes. Betweenness Centrality: measures how often a node appears on the shortest paths between other nodes. Closeness Centrality: average distance from a given node to all other nodes in the graph.
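Both measures can be computed with breadth-first search on an unweighted graph. The sketch below uses Brandes' accumulation scheme for betweenness, which is one common approach and not necessarily what Gephi does internally; the function names and toy graph are assumptions, and the graph is assumed to be connected and undirected.

```javascript
// Closeness as defined on the slide: average distance to all other nodes.
function averageDistance(adj, source) {
  const dist = { [source]: 0 };
  const queue = [source];
  let total = 0;
  while (queue.length > 0) {
    const v = queue.shift();
    for (const w of adj[v]) {
      if (!(w in dist)) {
        dist[w] = dist[v] + 1;
        total += dist[w];
        queue.push(w);
      }
    }
  }
  return total / (Object.keys(dist).length - 1);
}

// Betweenness: for each node, the number of shortest paths between other
// node pairs that pass through it (fractional when several paths tie).
function betweenness(adj) {
  const nodes = Object.keys(adj);
  const score = Object.fromEntries(nodes.map((v) => [v, 0]));
  for (const s of nodes) {
    const order = []; // nodes in the order BFS reaches them
    const preds = {}, sigma = {}, dist = {}, delta = {};
    for (const v of nodes) {
      preds[v] = []; sigma[v] = 0; dist[v] = -1; delta[v] = 0;
    }
    sigma[s] = 1; // number of shortest paths from s
    dist[s] = 0;
    const queue = [s];
    while (queue.length > 0) {
      const v = queue.shift();
      order.push(v);
      for (const w of adj[v]) {
        if (dist[w] < 0) { dist[w] = dist[v] + 1; queue.push(w); }
        if (dist[w] === dist[v] + 1) { sigma[w] += sigma[v]; preds[w].push(v); }
      }
    }
    // Walk back from the farthest nodes, accumulating path dependencies.
    for (const w of order.reverse()) {
      for (const v of preds[w]) {
        delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w]);
      }
      if (w !== s) score[w] += delta[w];
    }
  }
  // In an undirected graph each pair was counted from both ends.
  for (const v of nodes) score[v] /= 2;
  return score;
}

// Hypothetical toy graph: a triangle (a, b, c) with a tail c-d-e.
const network = {
  a: ["b", "c"],
  b: ["a", "c"],
  c: ["a", "b", "d"],
  d: ["c", "e"],
  e: ["d"],
};
```

In this toy graph node `c` bridges the triangle and the tail, so it has both the highest betweenness and the lowest average distance.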
9. Edge-Betweenness (Hierarchical) Clustering. Girvan–Newman algorithm, O(N^3): 1. Calculate the betweenness of all edges in the graph. 2. Remove the edge with the highest betweenness. 3. The betweenness of all edges affected by the removal is recalculated. 4. Rinse and repeat until no edges remain. Expensive, yet intuitive decomposition of the graph.
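The four steps can be sketched as follows. This is a simplified illustration under stated assumptions: it recomputes every edge's betweenness after each removal (rather than only the affected edges), and it stops at the first split instead of running until no edges remain. All function names and the toy graph are hypothetical, and node names must not contain the `|` character used in edge keys.

```javascript
const edgeKey = (u, v) => (u < v ? `${u}|${v}` : `${v}|${u}`);

// Step 1: betweenness of every edge, via BFS from each node.
function edgeBetweenness(adj) {
  const nodes = Object.keys(adj);
  const score = {};
  for (const s of nodes) {
    const order = [], preds = {}, sigma = {}, dist = {}, delta = {};
    for (const v of nodes) { preds[v] = []; sigma[v] = 0; dist[v] = -1; delta[v] = 0; }
    sigma[s] = 1;
    dist[s] = 0;
    const queue = [s];
    while (queue.length > 0) {
      const v = queue.shift();
      order.push(v);
      for (const w of adj[v]) {
        if (dist[w] < 0) { dist[w] = dist[v] + 1; queue.push(w); }
        if (dist[w] === dist[v] + 1) { sigma[w] += sigma[v]; preds[w].push(v); }
      }
    }
    for (const w of order.reverse()) {
      for (const v of preds[w]) {
        const share = (sigma[v] / sigma[w]) * (1 + delta[w]);
        score[edgeKey(v, w)] = (score[edgeKey(v, w)] || 0) + share;
        delta[v] += share;
      }
    }
  }
  return score; // doubled for undirected graphs, which does not affect ranking
}

// Connected components, each returned as a sorted list of node names.
function components(adj) {
  const seen = new Set();
  const result = [];
  for (const start of Object.keys(adj)) {
    if (seen.has(start)) continue;
    const group = [];
    const stack = [start];
    seen.add(start);
    while (stack.length > 0) {
      const v = stack.pop();
      group.push(v);
      for (const w of adj[v]) {
        if (!seen.has(w)) { seen.add(w); stack.push(w); }
      }
    }
    result.push(group.sort());
  }
  return result;
}

// Steps 2-4, stopping as soon as the graph splits into more components.
function splitOnce(adj) {
  const g = {};
  for (const [v, nbrs] of Object.entries(adj)) g[v] = new Set(nbrs);
  const startCount = components(g).length;
  const removed = [];
  while (components(g).length === startCount) {
    const scores = edgeBetweenness(g);
    const keys = Object.keys(scores);
    if (keys.length === 0) break; // no edges left to remove
    const best = keys.reduce((a, b) => (scores[b] > scores[a] ? b : a));
    const [u, v] = best.split("|");
    g[u].delete(v);
    g[v].delete(u);
    removed.push(best);
  }
  return { removed, communities: components(g) };
}

// Hypothetical toy graph: two triangles joined by the bridge c-d.
const twoTriangles = {
  a: ["b", "c"], b: ["a", "c"], c: ["a", "b", "d"],
  d: ["c", "e", "f"], e: ["d", "f"], f: ["d", "e"],
};
```

Calling `splitOnce` again on each resulting community would continue the hierarchical decomposition described on the slide.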
10. Case Study: Finding @VTCodeCamp Twitter Communities. Methodology. Network: Twitter users as nodes, follows as directed edges. 1. Find all followers of @VTCodeCamp, then recursively find the next level of users. 2. Remove @VTCodeCamp from the final datasets. 3. Twitter RESTful Search API v1.1 4. Node.js client 5. MongoDB 3.0 Aggregation Framework 6. GEXF (Graph Exchange XML Format) 7. Gephi & Gephi Toolkit (JVM) for analysis & viz
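Steps 1 and 2 of the methodology amount to a level-by-level crawl followed by a filter. The sketch below is an assumption about how that could look, not the presenter's client: `followersOf` is a made-up in-memory stand-in for the Twitter API, and the lookup is synchronous for brevity, whereas a real client would make asynchronous, rate-limited HTTP calls.

```javascript
// Hypothetical stand-in for the follower lookup: user -> list of followers.
const followersOf = {
  vtcodecamp: ["alice", "bob"],
  alice: ["bob", "carol"],
  bob: ["vtcodecamp"],
  carol: [],
};

function lookupFollowers(user) {
  return followersOf[user] || [];
}

// Crawl followers breadth-first for `maxDepth` levels starting at `seed`.
// Returns directed edges [follower, followed] with the seed account removed.
function crawlFollowers(seed, maxDepth, lookup = lookupFollowers) {
  const edges = [];
  const visited = new Set([seed]);
  let frontier = [seed];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const user of frontier) {
      for (const follower of lookup(user)) {
        edges.push([follower, user]);
        if (!visited.has(follower)) {
          visited.add(follower);
          next.push(follower);
        }
      }
    }
    frontier = next;
  }
  // Step 2: drop every edge that touches the seed account.
  return edges.filter(([from, to]) => from !== seed && to !== seed);
}
```

Removing the seed matters because every first-level user follows it by construction, so it would otherwise dominate any centrality measure.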
11. DataFlow: Twitter API -> Node.js Client -> MongoDB -> .gexf Format -> Gephi / Toolkit
12. Throttling. GET followers/ids: https://api.twitter.com/1.1/followers/ids.json?cursor=-1&screen_name=vtcodecamp&count=5000 (15 resource requests per 15-minute window, 5000 max per response, paged by cursor). Solution: throttling. Twitter Search API GET users/lookup: 180 resource requests per 15-minute window.
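One way to stay inside these limits is a fixed-window counter on the client. The class below is a hedged sketch, not the presenter's implementation: `WindowLimiter` and `acquire` are hypothetical names, and the client-side window only approximates the window Twitter tracks on its side.

```javascript
// Fixed-window limiter: allows `limit` requests per `windowMs` milliseconds.
class WindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = null; // timestamp of the first request in the window
    this.used = 0;
  }

  // Returns 0 and records the request if it may be sent at time `now` (ms).
  // Otherwise returns how many milliseconds to wait before trying again.
  acquire(now) {
    if (this.windowStart === null || now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.used = 0;
    }
    if (this.used < this.limit) {
      this.used++;
      return 0;
    }
    return this.windowStart + this.windowMs - now;
  }
}

const FIFTEEN_MINUTES = 15 * 60 * 1000;
// Limits taken from the slide: 15 and 180 requests per 15-minute window.
const followersIdsLimiter = new WindowLimiter(15, FIFTEEN_MINUTES);
const usersLookupLimiter = new WindowLimiter(180, FIFTEEN_MINUTES);
```

A crawler would call `acquire(Date.now())` before each request and, on a non-zero result, sleep for that many milliseconds (for example with `setTimeout`) before retrying.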
13. Aggregation Framework. db.usergraph.aggregate([ { $project: { _id: 0, user_id: "$user_id_str", twitterFollowers: "$followers.ids_str" } }, { $unwind: "$twitterFollowers" } ]); This yields KV pairs of user/follower. Rank followers of @VTCodeCamp by their in-degree. Aggregate k-cliques.
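To make the effect of the `$project` and `$unwind` stages concrete, the sketch below emulates them in plain JavaScript on made-up documents and then ranks users by in-degree. The sample data and function names are assumptions; only the field names `user_id_str` and `followers.ids_str` come from the pipeline on the slide. In MongoDB itself the ranking would more likely be extra `$group` and `$sort` stages.

```javascript
// Hypothetical documents shaped like the `usergraph` collection.
const usergraph = [
  { user_id_str: "1", followers: { ids_str: ["2", "3"] } },
  { user_id_str: "2", followers: { ids_str: ["3"] } },
  { user_id_str: "3", followers: { ids_str: [] } },
];

// Emulates $project + $unwind: one { user_id, twitterFollowers } pair per
// follower. Like $unwind by default, users with no followers are dropped.
function userFollowerPairs(docs) {
  return docs.flatMap((doc) =>
    doc.followers.ids_str.map((followerId) => ({
      user_id: doc.user_id_str,
      twitterFollowers: followerId,
    }))
  );
}

// In-degree of a user = number of follower pairs pointing at that user.
function rankByInDegree(pairs) {
  const counts = new Map();
  for (const { user_id } of pairs) {
    counts.set(user_id, (counts.get(user_id) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([user_id, inDegree]) => ({ user_id, inDegree }))
    .sort((a, b) => b.inDegree - a.inDegree);
}
```

Each pair is exactly one directed edge of the follow graph, which is why this flattened form is convenient for export to a graph format.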
14. GEXF Format
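A GEXF file is an XML document with a `nodes` list and an `edges` list inside a `graph` element. The sketch below writes a minimal directed graph; it assumes the GEXF 1.2 draft namespace (the slides do not say which version was used), and `toGexf` and `escapeXml` are hypothetical helper names.

```javascript
// Escape characters that are not allowed inside XML attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize nodes ({ id, label }) and edges ({ source, target }) as GEXF.
function toGexf(nodes, edges) {
  const lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">',
    '  <graph mode="static" defaultedgetype="directed">',
    "    <nodes>",
    ...nodes.map(
      (n) => `      <node id="${escapeXml(n.id)}" label="${escapeXml(n.label)}" />`
    ),
    "    </nodes>",
    "    <edges>",
    ...edges.map(
      (e, i) =>
        `      <edge id="${i}" source="${escapeXml(e.source)}" target="${escapeXml(e.target)}" />`
    ),
    "    </edges>",
    "  </graph>",
    "</gexf>",
  ];
  return lines.join("\n");
}

// Made-up example: bob follows alice, so the edge runs from bob to alice.
const gexf = toGexf(
  [{ id: "alice", label: "Alice" }, { id: "bob", label: "Bob" }],
  [{ source: "bob", target: "alice" }]
);
```

The resulting string can be saved with a `.gexf` extension and opened in Gephi.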
15. Graph Stats (vtcodecamp): 557 users, 11,058 follows; 364,951 users, 1,045,606 follows
16. Gephi Toolkit Demo
17. Gephi Application Demo
18. Iteration #1: MCL Clustering, Betweenness Node Size
19. Iteration #1: MCL Clustering, Betweenness Node Size, Diameter = 7
20. Software/Data Resources. ◦ Applications: Gephi, Tulip, Pajek (Windows). ◦ Packages: NetworkX (Python), igraph (R or Python), Statnet (R), Sigma.js, Gephi Toolkit (JVM). ◦ Data Sets: Gephi Datasets https://github.com/gephi/gephi/wiki/Datasets, UC Irvine http://networkdata.ics.uci.edu/index.php
21. Presentation Resources: GitHub https://github.com/mirez/snadataviz-vtcc7.git