Networkx & Gephi Tutorial #Pydata NYC



Slide deck from my presentation at NYC's #Pydata 2012 conference.

Talk abstract:
Are you interested in working with social data to map out communities and connections between friends, fans and followers? In this session I'll show how we use the Python NetworkX library along with the open source Gephi visualization tool to make sense of social network data. We'll take a few examples from Twitter, look at how a hashtag spreads through the network, and then analyze the connections between users posting to the hashtag. We'll be constructing graphs, running stats on them and then visualizing the output.

Published in: Technology

  • Homophily
  • Endogenous Trend – information spread
  • Exogenous information spread
  • Hashtags have emerged as a way for people to gather around topics or events.
  • Hashtag co-occurrence:
      - Mitt Romney: #gayrights, #lgbt, #jesus, #flipflop, #jobs, #economy
      - Newt Gingrich: #palestine, #OWS, #immigration, #abortion (he famously said: “Stop whining, take a bath and get a job!”)
      - Equal: #republican, #dems, #economics, #amnesty
  • Networkx supports
  • Zachary's Karate Club Graph describes the friendships between the members of a US karate club in the 1970s. The significant feature of this social network is that the club president and the instructor were involved in a dispute (some might say: a fight) over the issue of how much to charge for lessons. This split the club into two factions, one centred around the president, and the other centred around the instructor.
  • Betweenness – number of shortest paths between all pairs of vertices that pass through the node (positioning)
  • Closeness – how quickly information can spread from node s to all other nodes sequentially (distance of s from all other actors in the network)
  • Eigenvector – measure of the influence of a node (as in PageRank, connections to high-scoring nodes contribute more to the score)
  • Clustering Coefficient – measure of the degree to which nodes in a graph tend to cluster together (1 = the node's neighborhood is a clique)
  • NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and function of complex networks. NetworkX was born in May 2002. The original version was designed and written by Aric Hagberg, Dan Schult, and Pieter Swart in 2002 and 2003. The first public release was in April 2005.
  • Python – user description; 2 days of Twitter data
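
The measures described in the notes above can be tried on Zachary's karate club graph, which ships with NetworkX. This is a minimal sketch rather than code from the talk; the helper name `top` is hypothetical and introduced only for illustration.

```python
import networkx as nx

# Zachary's karate club: 34 members, edges are friendships.
G = nx.karate_club_graph()

# Centrality measures named in the notes (all unweighted by default).
betweenness = nx.betweenness_centrality(G)   # share of shortest paths through a node
closeness = nx.closeness_centrality(G)       # inverse of average distance to other nodes
eigenvector = nx.eigenvector_centrality(G, max_iter=1000)  # influence via well-connected neighbors
degree = nx.degree_centrality(G)             # fraction of other nodes a node touches

# Local clustering coefficient: 1.0 means a node's neighbors form a clique.
clustering = nx.clustering(G)


def top(scores, n=3):
    """Hypothetical helper: the n node ids with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


for name, scores in [
    ("betweenness", betweenness),
    ("closeness", closeness),
    ("eigenvector", eigenvector),
    ("degree", degree),
]:
    print(name, top(scores))

print("average clustering:", nx.average_clustering(G))
```

Comparing the top-ranked nodes across the four measures is a quick way to see whether the two faction leaders also dominate the graph structurally.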

    1. Networkx & Gephi Tutorial #pydata. Gilad Lotan | @gilgul
    2. link
    3. #gayrights, #lgbt, #jesus, #palestine, #OWS, #immigration, #flipflop, #jobs, #economy, #abortion, #republican, #dems, #economics, #amnesty
    4. #Debates / Ohio
    5. #Debates / Ohio: Politicos, Ohio-based Media, OSU Students
    6. Node network properties
       – from immediate connections
         • indegree: how many directed edges (arcs) are incident on a node (example: indegree=3)
         • outdegree: how many directed edges (arcs) originate at a node (example: outdegree=2)
         • degree (in or out): number of edges incident on a node (example: degree=5)
       – from the entire graph
         • centrality (betweenness, closeness)
       Source: Lada Adamic (SI508-F08)
    7. Example Graph Types
       • Complete Graph
       • Bipartite Graph
         – Vertices can be divided into two disjoint sets
         – Ex: students & schools
    8. Social Network Attributes
       • Scale Free
         – Degree distribution follows a power law
         – Barabasi et al (‘99): mapped the topology of a portion of the web
       • Small World
         – Most nodes are not neighbors, but can be reached by a small number of hops
         – Watts & Strogatz (’98)
         – Properties: cliques, sub networks with high clustering coefficient, most pairs of nodes connected by at least one short path
    9. (Zachary) Karate club graph: social network of friendships between 34 members of a karate club at a US university in the 1970s. Standard test network for clustering algorithms: during the observation period the club broke up into two separate clubs over a conflict.
    10. Graph Measures
       • Centrality
         – Betweenness
         – Closeness
         – Eigenvector
         – Degree
       • Clustering Coefficient (clique)
       • Modularity
    11. Graph Layout
       • OpenOrd
         – Better distinguishes clusters
       • Yifan Hu
       • Force Atlas
       • Fruchterman Reingold
         – Graph as a system of mass particles (nodes: particles, edges: springs)
    12. Networkx
    13. Graph Generators
    14. Generate Twitter Graph
    15. graphml file: nodes, edges
    16. Twitter Users with Python in their Bios
       • 2 days of Twitter data (Oct 24th and 25th)
       • Total: 4246 users (62k tweets)
       • @mikanyan1 tweeted 795 times
    17. Pythonistas on Twitter
    18. Pythonistas on Twitter: Spanish Speakers; English / European; Chinese; Python (the snake); Japanese; Musicians, Artists
    19. Twitter User Community: Data Science
       • Grepped from Twitter bios over 1 week: "data science|data scientist|machine learning|data strateg"
       • 1053 Users
       • 14k Tweets
       • Most tweeting users:
         – @data_nerd (659)
         – @Chantel_Esworth (562)
         – @Da5_12 (253)
    20. Dataists on Twitter
    21. Thank You. Gilad Lotan. Twitter: @gilgul. GitHub: giladlotan
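
The workflow outlined in slides 6, 14 and 15 (build a directed Twitter graph, read off node degrees, export a graphml file for Gephi) might look roughly like the sketch below. The `mentions` list, the user names and the file name `mentions.graphml` are made-up placeholders, not data or code from the talk; a real pipeline would derive the pairs from collected tweets.

```python
import os
import tempfile

import networkx as nx

# Hypothetical (author, mentioned_user) pairs standing in for real tweet data.
mentions = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
    ("carol", "dave"),
    ("alice", "bob"),  # a repeated mention raises the edge weight
]

# Directed graph: an edge points from the author to the mentioned user.
G = nx.DiGraph()
for source, target in mentions:
    if G.has_edge(source, target):
        G[source][target]["weight"] += 1
    else:
        G.add_edge(source, target, weight=1)

# Node properties that come from immediate connections.
in_deg = dict(G.in_degree())    # arcs arriving at a node
out_deg = dict(G.out_degree())  # arcs leaving a node
deg = dict(G.degree())          # in-degree plus out-degree
print("carol:", in_deg["carol"], out_deg["carol"], deg["carol"])

# Write nodes and edges to a graphml file, a format Gephi can open.
path = os.path.join(tempfile.mkdtemp(), "mentions.graphml")
nx.write_graphml(G, path)

# Read the file back to confirm the round trip preserved the graph.
H = nx.read_graphml(path)
print(H.number_of_nodes(), "nodes,", H.number_of_edges(), "edges")
```

Storing repeated mentions as an edge `weight` instead of parallel edges keeps the graph a plain `DiGraph`, and the attribute is carried into the graphml file, where Gephi can use it for edge thickness.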