Vermont Code Camp 7 #VTCC7 Slide Deck

Author: Al Ramirez @mirezez

- 1. Social Network Analysis & Visualization #Beginner #Twitter #SNA #DataViz #VTCC7 #Gephi #MongoDB #JavaScript #JVM #Node.js @mirezez Alberto Ramirez r@mirez.us
- 2. About the Presenter: currently Systems Architect, iSystems; previously Happy ITLAPD 2015!
- 3. Graph Basics: Nodes & Edges (un/directed, un/weighted). Degree, In-Degree, Out-Degree. Topology. Graph Density: measures how many edges are in the graph compared to the maximum possible number of edges (a complete graph). Diameter: the longest shortest path between any two nodes in the network, i.e. the maximum eccentricity. Radius: the minimum eccentricity over all nodes in the graph.
- 4. Properties of Real Networks 1. Growth: networks are assembled one node at a time and increase in size. 2. Preferential attachment: as new nodes join the network, the probability that a new node links to a given node is proportional to the number of links (the degree) that node already has. “Rich Get Richer”
- 5. Social Network Examples: undirected, unweighted; directed, unweighted
- 6. Cliques • k-clique: a subgraph of k nodes in which every node is adjacent to every other node. • n-clique, where n is a positive integer: a collection C of vertices in which any two vertices u, v ∈ C are at distance ≤ n. • p-clique, where p is a real number between 0 and 1: a collection C of vertices in which every vertex has ≥ p|C| neighbors in C. Trouble with Clique Targeting: 1. Cliques are not resilient: removing a single edge breaks one. 2. Uniformity in the way cliques are defined can yield little or no insight into the subgraph. 3. A clique might be only a narrow slice of a larger, more legitimate community that should be evaluated. Finding Cliques
- 7. K-Cores: the maximal subgraph in which every node has degree at least k. In graph G, as k increases, the k-core becomes more exclusive. Finding K-Cores
- 8. Node Centrality: identifying important nodes. Betweenness Centrality: measures how often a node appears on the shortest paths between other pairs of nodes. Closeness Centrality: based on the average distance from a given node to all other nodes in the graph; a smaller average distance means higher closeness.
- 9. Edge-Betweenness (Hierarchical) Clustering: Girvan–Newman algorithm, O(N^3) on sparse graphs. 1. Calculate the betweenness of all edges in the graph. 2. Remove the edge with the highest betweenness. 3. Recalculate the betweenness of all edges affected by the removal. 4. Rinse and repeat until no edges remain. Expensive, yet intuitive decomposition of the graph.
- 10. Case Study: Finding @VTCodeCamp Twitter Communities. Methodology: model Twitter users as nodes and follows as directed edges. 1. Find all followers of @VTCodeCamp, then recursively find the next level of users. 2. Remove @VTCodeCamp from the final datasets. Tools: 3. Twitter RESTful Search API v1.1 4. Node.js client 5. MongoDB 3.0 Aggregation Framework 6. GEXF (Graph Exchange XML Format) 7. Gephi & Gephi Toolkit (JVM) for analysis and visualization
- 11. Data Flow: Twitter API → Node.js client → MongoDB → .gexf format → Gephi / Gephi Toolkit
- 12. Throttling the Twitter Search API. GET followers/ids (https://api.twitter.com/1.1/followers/ids.json?cursor=-1&screen_name=vtcodecamp&count=5000): 15 resource requests per 15-minute window, 5,000 IDs max per response (paged with cursor). GET users/lookup: 180 resource requests per 15-minute window. Solution: throttling.
- 13. Aggregation Framework. KV pairs of user/follower: db.usergraph.aggregate([{ $project: { _id: 0, user_id: "$user_id_str", twitterFollowers: "$followers.ids_str" } }, { $unwind: "$twitterFollowers" }]); Rank followers of @VTCodeCamp by their in-degree. Aggregate k-cliques.
- 14. GEXF Format
- 15. Graph Stats: vtcodecamp: 557 users, 11,058 follows; 364,951 users, 1,045,606 follows
- 16. Gephi Toolkit Demo
- 17. Gephi Application Demo
- 18. Iteration #1 MCL Clustering Betweenness Node Size
- 19. Iteration #1 MCL Clustering Betweenness Node Size Diameter = 7
- 20. Software/Data Resources ◦ Applications: Gephi, Tulip, Pajek (Windows) ◦ Packages: NetworkX (Python), igraph (R or Python), Statnet (R), Sigma.js, Gephi Toolkit (JVM) ◦ Data Sets: Gephi Datasets (https://github.com/gephi/gephi/wiki/Datasets), UC Irvine (http://networkdata.ics.uci.edu/index.php)
- 21. Presentation Resources GitHub https://github.com/mirez/snadataviz-vtcc7.git
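Example for slide 3 (Graph Basics): density, diameter, and radius are defined there in words. The sketch below computes them on a small hypothetical undirected graph; the graph and the function names are illustrative, not from the deck. Shortest paths come from breadth-first search, so it assumes an unweighted, connected graph.

```javascript
// Hypothetical undirected, unweighted, connected graph as an adjacency list.
// Each undirected edge appears in both endpoints' lists.
const graph = {
  A: ["B", "C"],
  B: ["A", "C", "D"],
  C: ["A", "B"],
  D: ["B", "E"],
  E: ["D"],
};

// Density: actual edges divided by the n(n-1)/2 edges of a complete undirected graph.
function density(adj) {
  const n = Object.keys(adj).length;
  const endpoints = Object.values(adj).reduce((sum, nbrs) => sum + nbrs.length, 0);
  const m = endpoints / 2; // every edge was counted once from each end
  return m / ((n * (n - 1)) / 2);
}

// Breadth-first search: hop distance from `source` to every reachable node.
function bfsDistances(adj, source) {
  const dist = { [source]: 0 };
  const queue = [source];
  while (queue.length > 0) {
    const u = queue.shift();
    for (const v of adj[u]) {
      if (!(v in dist)) {
        dist[v] = dist[u] + 1;
        queue.push(v);
      }
    }
  }
  return dist;
}

// Eccentricity: the largest shortest-path distance from `node` to any other node.
function eccentricity(adj, node) {
  return Math.max(...Object.values(bfsDistances(adj, node)));
}

// Diameter = maximum eccentricity; radius = minimum eccentricity.
function diameterAndRadius(adj) {
  const ecc = Object.keys(adj).map((node) => eccentricity(adj, node));
  return { diameter: Math.max(...ecc), radius: Math.min(...ecc) };
}

console.log(density(graph)); // 0.5
console.log(diameterAndRadius(graph)); // { diameter: 3, radius: 2 }
```

Here 5 of the 10 possible edges exist, A, C, and E each sit 3 hops from the far end of the graph, and B and D reach every node within 2 hops.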
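Example for slide 4 (Properties of Real Networks): growth plus preferential attachment can be illustrated with a Barabási–Albert-style generator. This is a minimal sketch under assumptions the slide does not state: an undirected graph, a fully connected seed of `m0` nodes, `m` new edges per arriving node, and a small linear congruential generator so runs are repeatable. Sampling uniformly from a list that holds one entry per edge endpoint picks each existing node with probability proportional to its degree.

```javascript
// Small linear congruential generator so runs are repeatable (illustrative choice).
function lcg(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // uniform in [0, 1)
  };
}

// Grow a graph to `n` nodes. Requires m0 >= 2 and 1 <= m <= m0.
function preferentialAttachment(n, m0, m, seed = 42) {
  const rand = lcg(seed);
  const edges = [];
  // One entry per edge endpoint: a node of degree d appears d times,
  // so a uniform pick from `stubs` is proportional to degree.
  const stubs = [];
  for (let i = 0; i < m0; i++) {
    for (let j = i + 1; j < m0; j++) {
      edges.push([i, j]);
      stubs.push(i, j);
    }
  }
  for (let node = m0; node < n; node++) {
    const targets = new Set(); // m distinct existing nodes
    while (targets.size < m) {
      targets.add(stubs[Math.floor(rand() * stubs.length)]);
    }
    for (const t of targets) {
      edges.push([node, t]);
      stubs.push(node, t);
    }
  }
  return edges;
}

const edges = preferentialAttachment(100, 3, 2);
const degree = new Map();
for (const [a, b] of edges) {
  degree.set(a, (degree.get(a) ?? 0) + 1);
  degree.set(b, (degree.get(b) ?? 0) + 1);
}
console.log(edges.length); // 197: 3 seed edges + 97 new nodes x 2 edges
console.log(Math.max(...degree.values())); // early, well-linked nodes tend to collect the most links
```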
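Example for slide 6 (Cliques): the slide ends with "Finding Cliques" but does not name an algorithm. One standard option is Bron–Kerbosch with pivoting, which enumerates maximal cliques; cliques of at least k nodes can then be filtered from the result. The graph and helper names below are hypothetical.

```javascript
// Build an undirected adjacency map: node -> Set of neighbours.
function buildUndirected(edgeList) {
  const adj = new Map();
  for (const [a, b] of edgeList) {
    if (!adj.has(a)) adj.set(a, new Set());
    if (!adj.has(b)) adj.set(b, new Set());
    adj.get(a).add(b);
    adj.get(b).add(a);
  }
  return adj;
}

// Bron–Kerbosch with pivoting: R = current clique, P = candidates, X = already processed.
function maximalCliques(adj) {
  const cliques = [];
  function expand(R, P, X) {
    if (P.size === 0 && X.size === 0) {
      cliques.push([...R]); // R cannot be extended: it is a maximal clique
      return;
    }
    // Pivot on the vertex with the most neighbours in P to skip redundant branches.
    let pivot = null;
    let best = -1;
    for (const u of [...P, ...X]) {
      let count = 0;
      for (const v of adj.get(u)) if (P.has(v)) count++;
      if (count > best) {
        best = count;
        pivot = u;
      }
    }
    for (const v of [...P]) {
      if (adj.get(pivot).has(v)) continue; // only branch on P minus N(pivot)
      const nbrs = adj.get(v);
      expand(
        new Set([...R, v]),
        new Set([...P].filter((w) => nbrs.has(w))),
        new Set([...X].filter((w) => nbrs.has(w))),
      );
      P.delete(v);
      X.add(v);
    }
  }
  expand(new Set(), new Set(adj.keys()), new Set());
  return cliques;
}

// Hypothetical graph: triangles A-B-C and B-C-D, plus a pendant edge D-E.
const cliqueGraph = buildUndirected([
  ["A", "B"], ["A", "C"], ["B", "C"], ["B", "D"], ["C", "D"], ["D", "E"],
]);
const all = maximalCliques(cliqueGraph);
const triangles = all.filter((c) => c.length >= 3); // keep cliques of at least k = 3 nodes
console.log(all.length); // 3
```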
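Example for slide 7 (K-Cores): a common way to find a k-core is to peel repeatedly, deleting any node whose degree is below k, updating its neighbours, and continuing until no such node remains. A minimal sketch on a hypothetical graph (a 4-node clique with a short tail); `kCore` and `buildUndirected` are illustrative names.

```javascript
// Build an undirected adjacency map: node -> Set of neighbours.
function buildUndirected(edgeList) {
  const adj = new Map();
  for (const [a, b] of edgeList) {
    if (!adj.has(a)) adj.set(a, new Set());
    if (!adj.has(b)) adj.set(b, new Set());
    adj.get(a).add(b);
    adj.get(b).add(a);
  }
  return adj;
}

// Peel nodes of degree < k until none remain; the survivors form the k-core.
function kCore(adj, k) {
  // Work on a copy so the caller's graph is unchanged.
  const g = new Map([...adj].map(([v, nbrs]) => [v, new Set(nbrs)]));
  const queue = [...g.keys()].filter((v) => g.get(v).size < k);
  while (queue.length > 0) {
    const v = queue.pop();
    if (!g.has(v)) continue; // already peeled
    for (const u of g.get(v)) {
      const nbrsOfU = g.get(u);
      nbrsOfU.delete(v);
      if (nbrsOfU.size < k) queue.push(u); // u may now fall below k
    }
    g.delete(v);
  }
  return [...g.keys()];
}

// Hypothetical graph: a 4-clique A-B-C-D with a tail A-E-F.
const coreGraph = buildUndirected([
  ["A", "B"], ["A", "C"], ["A", "D"], ["B", "C"], ["B", "D"], ["C", "D"], ["A", "E"], ["E", "F"],
]);
console.log(kCore(coreGraph, 1).length); // 6
console.log(kCore(coreGraph, 3)); // [ 'A', 'B', 'C', 'D' ]
console.log(kCore(coreGraph, 4)); // []
```

Raising k from 1 to 3 strips the tail, and at k = 4 nothing survives, which is the "more exclusive as k increases" behaviour the slide describes.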
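Example for slide 8 (Node Centrality): both measures can be sketched for an unweighted, undirected, connected graph. Betweenness here uses Brandes' algorithm (one BFS per source plus a backward dependency pass); closeness is reported both as the average distance the slide describes and as its reciprocal, the usual closeness score. The path graph and function names are hypothetical.

```javascript
// Build an undirected adjacency map: node -> Set of neighbours.
function buildUndirected(edgeList) {
  const adj = new Map();
  for (const [a, b] of edgeList) {
    if (!adj.has(a)) adj.set(a, new Set());
    if (!adj.has(b)) adj.set(b, new Set());
    adj.get(a).add(b);
    adj.get(b).add(a);
  }
  return adj;
}

// Brandes' algorithm for unweighted graphs: BFS from every source, then a backward
// pass accumulating each node's dependency on the shortest paths through it.
function betweenness(adj) {
  const nodes = [...adj.keys()];
  const cb = new Map(nodes.map((v) => [v, 0]));
  for (const s of nodes) {
    const stack = [];
    const preds = new Map(nodes.map((v) => [v, []]));
    const sigma = new Map(nodes.map((v) => [v, 0])); // number of shortest s-v paths
    const dist = new Map(nodes.map((v) => [v, -1]));
    sigma.set(s, 1);
    dist.set(s, 0);
    const queue = [s];
    while (queue.length > 0) {
      const v = queue.shift();
      stack.push(v);
      for (const w of adj.get(v)) {
        if (dist.get(w) < 0) {
          dist.set(w, dist.get(v) + 1);
          queue.push(w);
        }
        if (dist.get(w) === dist.get(v) + 1) {
          sigma.set(w, sigma.get(w) + sigma.get(v));
          preds.get(w).push(v);
        }
      }
    }
    const delta = new Map(nodes.map((v) => [v, 0]));
    while (stack.length > 0) {
      const w = stack.pop(); // nodes come off in order of decreasing distance
      for (const v of preds.get(w)) {
        delta.set(v, delta.get(v) + (sigma.get(v) / sigma.get(w)) * (1 + delta.get(w)));
      }
      if (w !== s) cb.set(w, cb.get(w) + delta.get(w));
    }
  }
  // Undirected graph: each pair was counted once from each endpoint.
  for (const [v, score] of cb) cb.set(v, score / 2);
  return cb;
}

// Average shortest-path distance from `node` (as the slide describes) and its reciprocal.
function closeness(adj, node) {
  const dist = new Map([[node, 0]]);
  const queue = [node];
  while (queue.length > 0) {
    const v = queue.shift();
    for (const w of adj.get(v)) {
      if (!dist.has(w)) {
        dist.set(w, dist.get(v) + 1);
        queue.push(w);
      }
    }
  }
  let total = 0;
  for (const d of dist.values()) total += d;
  const averageDistance = total / (dist.size - 1);
  return { averageDistance, closeness: 1 / averageDistance };
}

// Hypothetical path graph A-B-C-D-E.
const pathGraph = buildUndirected([["A", "B"], ["B", "C"], ["C", "D"], ["D", "E"]]);
console.log(betweenness(pathGraph)); // A: 0, B: 3, C: 4, D: 3, E: 0
console.log(closeness(pathGraph, "C").averageDistance); // 1.5
```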
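Example for slide 9 (Edge-Betweenness Clustering): the Girvan–Newman loop can be sketched as follows. Edge betweenness is computed Brandes-style, and for simplicity every edge's betweenness is recomputed after each removal rather than only the affected edges. The hypothetical `girvanNewmanSplit` stops at the first split into more components; applying it again to each component continues the hierarchy down to isolated nodes. It assumes string node ids that do not contain "|".

```javascript
// Build an undirected adjacency map: node -> Set of neighbours.
function buildUndirected(edgeList) {
  const adj = new Map();
  for (const [a, b] of edgeList) {
    if (!adj.has(a)) adj.set(a, new Set());
    if (!adj.has(b)) adj.set(b, new Set());
    adj.get(a).add(b);
    adj.get(b).add(a);
  }
  return adj;
}

const edgeKey = (a, b) => (a < b ? `${a}|${b}` : `${b}|${a}`);

// Step 1: betweenness of every edge (BFS from each source, then dependency accumulation).
function edgeBetweenness(adj) {
  const eb = new Map();
  for (const [v, nbrs] of adj) for (const w of nbrs) eb.set(edgeKey(v, w), 0);
  for (const s of adj.keys()) {
    const order = [];
    const preds = new Map([[s, []]]);
    const sigma = new Map([[s, 1]]);
    const dist = new Map([[s, 0]]);
    const queue = [s];
    while (queue.length > 0) {
      const v = queue.shift();
      order.push(v);
      for (const w of adj.get(v)) {
        if (!dist.has(w)) {
          dist.set(w, dist.get(v) + 1);
          sigma.set(w, 0);
          preds.set(w, []);
          queue.push(w);
        }
        if (dist.get(w) === dist.get(v) + 1) {
          sigma.set(w, sigma.get(w) + sigma.get(v));
          preds.get(w).push(v);
        }
      }
    }
    const delta = new Map(order.map((v) => [v, 0]));
    for (const w of order.reverse()) {
      for (const v of preds.get(w)) {
        const c = (sigma.get(v) / sigma.get(w)) * (1 + delta.get(w));
        eb.set(edgeKey(v, w), eb.get(edgeKey(v, w)) + c);
        delta.set(v, delta.get(v) + c);
      }
    }
  }
  for (const [k, val] of eb) eb.set(k, val / 2); // each pair was counted from both ends
  return eb;
}

// Connected components via depth-first search; each component is sorted.
function components(adj) {
  const seen = new Set();
  const comps = [];
  for (const start of adj.keys()) {
    if (seen.has(start)) continue;
    const comp = [];
    const stack = [start];
    seen.add(start);
    while (stack.length > 0) {
      const v = stack.pop();
      comp.push(v);
      for (const w of adj.get(v)) if (!seen.has(w)) { seen.add(w); stack.push(w); }
    }
    comps.push(comp.sort());
  }
  return comps;
}

// Steps 2-4: remove the top edge and recompute until the graph splits.
function girvanNewmanSplit(adj) {
  const g = new Map([...adj].map(([v, nbrs]) => [v, new Set(nbrs)]));
  const startCount = components(g).length;
  for (;;) {
    const eb = edgeBetweenness(g);
    if (eb.size === 0) return components(g); // no edges left
    let bestKey = null;
    let best = -Infinity;
    for (const [k, val] of eb) if (val > best) { best = val; bestKey = k; }
    const [a, b] = bestKey.split("|");
    g.get(a).delete(b);
    g.get(b).delete(a);
    const comps = components(g);
    if (comps.length > startCount) return comps;
  }
}

// Hypothetical graph: triangles A-B-C and D-E-F joined by the bridge C-D.
const bridged = buildUndirected([
  ["A", "B"], ["B", "C"], ["A", "C"], ["C", "D"], ["D", "E"], ["E", "F"], ["D", "F"],
]);
console.log(girvanNewmanSplit(bridged)); // [ [ 'A', 'B', 'C' ], [ 'D', 'E', 'F' ] ]
```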
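Example for slide 10 (Case Study methodology): steps 1 and 2 amount to a breadth-first crawl outward from the seed account followed by removing the seed. This sketch is written against a hypothetical async function `getFollowerIds(id)` standing in for the Twitter client, so it runs without network access; edges point from follower to followed, matching "follows as directed edges".

```javascript
// Breadth-first crawl to `depth` levels of followers, then drop the seed account.
// `getFollowerIds` is a hypothetical async stand-in for the Twitter client call.
async function crawlFollowers(seed, getFollowerIds, depth = 2) {
  const edges = []; // [follower, followed]
  const visited = new Set([seed]);
  let frontier = [seed];
  for (let level = 0; level < depth; level++) {
    const next = [];
    for (const user of frontier) {
      for (const follower of await getFollowerIds(user)) {
        edges.push([follower, user]);
        if (!visited.has(follower)) {
          visited.add(follower);
          next.push(follower);
        }
      }
    }
    frontier = next;
  }
  // Step 2: remove the seed and every edge that touches it.
  return {
    nodes: [...visited].filter((id) => id !== seed),
    edges: edges.filter(([a, b]) => a !== seed && b !== seed),
  };
}

// Hypothetical follower lists keyed by user id.
const fakeFollowers = { vt: ["a", "b"], a: ["b", "c"], b: ["a"], c: ["vt"] };
crawlFollowers("vt", async (id) => fakeFollowers[id] ?? []).then((result) => {
  console.log(result.nodes); // [ 'a', 'b', 'c' ]
  console.log(result.edges.length); // 3
});
```

In a real crawl each `getFollowerIds` call would go through the throttled, cursor-paged request described on slide 12.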
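Example for slide 12 (Throttling): cursor paging with a fixed gap between requests is one way to stay inside the quoted limits; 15 requests per 15-minute window works out to one request every 60 seconds. `requestPage` is a hypothetical wrapper around GET followers/ids that resolves to the parsed JSON page. The sketch follows `next_cursor_str` (the string form of the cursor, since cursor values can exceed JavaScript's safe integer range) from "-1" until it reaches "0". The `sleep` and `now` parameters are injectable so the throttle can be tested without waiting.

```javascript
const WINDOW_MS = 15 * 60 * 1000; // 15-minute rate-limit window
const REQUESTS_PER_WINDOW = 15; // GET followers/ids limit quoted on the slide

// Returns an async function that spaces successive calls at least `intervalMs` apart.
function makeThrottle(
  intervalMs,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  now = () => Date.now(),
) {
  let nextAllowed = 0;
  return async function throttle() {
    const wait = nextAllowed - now();
    if (wait > 0) await sleep(wait);
    nextAllowed = Math.max(now(), nextAllowed) + intervalMs;
  };
}

// Follow the cursor from "-1" (first page) until next_cursor_str is "0" (last page).
// `requestPage(screenName, cursor)` is a hypothetical wrapper around GET followers/ids.
async function fetchAllFollowerIds(screenName, requestPage, throttle) {
  const ids = [];
  let cursor = "-1";
  do {
    await throttle();
    const page = await requestPage(screenName, cursor);
    ids.push(...page.ids);
    cursor = page.next_cursor_str;
  } while (cursor !== "0");
  return ids;
}

// One request every 60 s keeps within 15 requests per 15 minutes.
const followersThrottle = makeThrottle(WINDOW_MS / REQUESTS_PER_WINDOW);
```

The same helper covers GET users/lookup: 180 requests per 15 minutes corresponds to `makeThrottle(WINDOW_MS / 180)`, one request every 5 seconds.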
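Example for slide 13 (Aggregation Framework): the slide's pipeline yields one key-value pair per user/follower edge. To rank users by in-degree, a `$group` stage counting pairs per `user_id` followed by a `$sort` is one option; those two stages are an assumption, not shown on the slide. Since no MongoDB server is assumed here, the sketch emulates all four stages in plain JavaScript on hypothetical documents shaped like the slide's `usergraph` collection, with the Mongo shell form in a comment.

```javascript
// Mongo shell form (the $group and $sort stages are an assumption, not from the slide):
//   db.usergraph.aggregate([
//     { $project: { _id: 0, user_id: "$user_id_str", twitterFollowers: "$followers.ids_str" } },
//     { $unwind: "$twitterFollowers" },
//     { $group: { _id: "$user_id", inDegree: { $sum: 1 } } },
//     { $sort: { inDegree: -1 } }
//   ]);

// Hypothetical documents: each user with the ids of the accounts following them.
const usergraph = [
  { user_id_str: "1", followers: { ids_str: ["2", "3", "4"] } },
  { user_id_str: "2", followers: { ids_str: ["1"] } },
  { user_id_str: "3", followers: { ids_str: ["1", "2"] } },
];

// $project + $unwind: one { user_id, twitterFollowers } pair per follow edge.
const pairs = usergraph.flatMap((doc) =>
  doc.followers.ids_str.map((f) => ({ user_id: doc.user_id_str, twitterFollowers: f })),
);

// $group + $sort: in-degree = number of followers per user, highest first.
const counts = new Map();
for (const { user_id } of pairs) counts.set(user_id, (counts.get(user_id) ?? 0) + 1);
const ranking = [...counts]
  .map(([user_id, inDegree]) => ({ user_id, inDegree }))
  .sort((a, b) => b.inDegree - a.inDegree);

console.log(pairs.length); // 6
console.log(ranking); // user "1" (3 followers), then "3" (2), then "2" (1)
```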
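Example for slide 14 (GEXF Format): a minimal writer for a static, directed GEXF 1.2 graph might look like this. It emits only node ids, labels, and edges (no attributes), and escapes the characters that matter inside double-quoted XML attributes. `toGexf` and the sample users are hypothetical.

```javascript
// Escape characters that are unsafe inside double-quoted XML attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// nodes: [{ id, label }], edges: [[sourceId, targetId]] (follower -> followed).
function toGexf(nodes, edges) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">',
    '  <graph mode="static" defaultedgetype="directed">',
    "    <nodes>",
    ...nodes.map((n) => `      <node id="${escapeXml(n.id)}" label="${escapeXml(n.label)}" />`),
    "    </nodes>",
    "    <edges>",
    ...edges.map(
      ([source, target], i) =>
        `      <edge id="${i}" source="${escapeXml(source)}" target="${escapeXml(target)}" />`,
    ),
    "    </edges>",
    "  </graph>",
    "</gexf>",
  ].join("\n");
}

// Hypothetical users and one follow edge: user 2 follows user 1.
console.log(toGexf([{ id: "1", label: "alice" }, { id: "2", label: "R&D bot" }], [["2", "1"]]));
```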
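Example for slide 15 (Graph Stats): applying slide 3's density definition to the two graphs, under the assumption that each follow is one directed edge and there are no self-loops (subscripts are the node counts):

```latex
\begin{aligned}
D &= \frac{|E|}{|V|\,(|V|-1)} \\
D_{557} &= \frac{11\,058}{557 \times 556} = \frac{11\,058}{309\,692} \approx 0.0357 \\
D_{364\,951} &= \frac{1\,045\,606}{364\,951 \times 364\,950} = \frac{1\,045\,606}{133\,188\,867\,450} \approx 7.85 \times 10^{-6}
\end{aligned}
```

So about 3.6% of the possible follows exist in the smaller graph, versus roughly 0.0008% in the larger one: the number of possible edges grows with the square of the node count, while the follows actually observed grew far more slowly.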
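Example for slides 18 and 19 (MCL Clustering): the slides show Markov Clustering results but not the algorithm or its settings. The sketch alternates expansion (squaring the column-stochastic matrix) and inflation (raising entries to a power and renormalizing columns) until the matrix stops changing. Inflation 2, added self-loops, and the convergence tolerance are common illustrative choices, not the deck's settings; `mcl` and the test graph are hypothetical.

```javascript
// Scale each column to sum to 1 (in place); all-zero columns are left alone.
function normalizeColumns(M) {
  const n = M.length;
  for (let j = 0; j < n; j++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += M[i][j];
    if (sum > 0) for (let i = 0; i < n; i++) M[i][j] /= sum;
  }
  return M;
}

// Dense square matrix product.
function multiply(A, B) {
  const n = A.length;
  const C = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let i = 0; i < n; i++)
    for (let k = 0; k < n; k++)
      if (A[i][k] !== 0) for (let j = 0; j < n; j++) C[i][j] += A[i][k] * B[k][j];
  return C;
}

function mcl(adjMatrix, inflation = 2, maxIter = 100, eps = 1e-9) {
  const n = adjMatrix.length;
  // Add self-loops, then make the matrix column-stochastic.
  let M = normalizeColumns(adjMatrix.map((row, i) => row.map((v, j) => (i === j ? 1 : v))));
  for (let iter = 0; iter < maxIter; iter++) {
    const expanded = multiply(M, M); // expansion: two random-walk steps
    const inflated = normalizeColumns(expanded.map((row) => row.map((v) => v ** inflation)));
    let change = 0;
    for (let i = 0; i < n; i++)
      for (let j = 0; j < n; j++) change = Math.max(change, Math.abs(inflated[i][j] - M[i][j]));
    M = inflated;
    if (change < eps) break;
  }
  // Each row that still holds mass is an attractor; the columns it holds mass in
  // form its cluster. Identical clusters are merged by key.
  const clusters = new Map();
  for (let i = 0; i < n; i++) {
    const members = [];
    for (let j = 0; j < n; j++) if (M[i][j] > 1e-6) members.push(j);
    if (members.length > 0) clusters.set(members.join(","), members);
  }
  return [...clusters.values()];
}

// Hypothetical graph: two disjoint triangles, nodes 0-2 and 3-5.
const twoTriangles = [
  [0, 1, 1, 0, 0, 0],
  [1, 0, 1, 0, 0, 0],
  [1, 1, 0, 0, 0, 0],
  [0, 0, 0, 0, 1, 1],
  [0, 0, 0, 1, 0, 1],
  [0, 0, 0, 1, 1, 0],
];
console.log(mcl(twoTriangles)); // [ [ 0, 1, 2 ], [ 3, 4, 5 ] ]
```

A higher inflation value sharpens the contrast between strong and weak random-walk flows and tends to produce more, smaller clusters.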
