Networkx & Gephi Tutorial #Pydata NYC


Slide deck from my presentation at NYC's #Pydata 2012 conference - http://nyc2012.pydata.org/abstracts/#gephi

Talk abstract:
Are you interested in working with social data to map out communities and connections between friends, fans and followers? In this session I'll show ways in which we use the Python NetworkX library along with the open-source Gephi visualization tool to make sense of social network data. We'll take a few examples from Twitter, look at how a hashtag spreads through the network, and then analyze the connections between users posting to the hashtag. We'll be constructing graphs, running stats on them and then visualizing the output.

  • Homophily
  • Endogenous Trend – information spread
  • Exogenous information spread
  • Hashtags have emerged as a way for people to gather around topics or events.
  • Hashtag co-occurrence:
      – Mitt Romney: #gayrights, #lgbt, #jesus, #flipflop, #jobs, #economy
      – Newt Gingrich: #palestine, #OWS, #immigration, #abortion (he famously said, “Stop whining, take a bath and get a job!”)
      – Equal: #republican, #dems, #economics, #amnesty
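One way to count hashtag co-occurrence is sketched below using only the Python standard library. The tweets are invented for illustration and are not data from the talk; the function name `cooccurrence_counts` is likewise hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tweets, each reduced to the set of hashtags it contains.
tweets = [
    {"#jobs", "#economy", "#republican"},
    {"#jobs", "#economy"},
    {"#immigration", "#amnesty"},
    {"#economy", "#dems"},
]


def cooccurrence_counts(tagged_tweets):
    """Count how often each unordered pair of hashtags shares a tweet."""
    counts = Counter()
    for tags in tagged_tweets:
        # Sort first so a given pair is always stored in the same order.
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence_counts(tweets)
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Each pair and its count could then become a weighted edge in a hashtag graph.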
  • Networkx supports
  • Zachary's Karate Club Graph describes the friendships between the members of a US karate club in the 1970s. The significant feature of this social network is that the club president and the instructor were involved in a dispute (some might say: a fight) over the issue of how much to charge for lessons. This split the club into two factions, one centred around the president, and the other centred around the instructor.
  • Betweenness – number of shortest paths between all pairs of vertices that pass through the node (positioning).
    Closeness – how long it takes to spread information from s to all other nodes sequentially (distance of s from all other actors in the network).
    Eigenvector – measure of the influence of a node (PageRank is a related measure; connections to high-scoring nodes contribute more to the score).
    Clustering Coefficient – degree to which nodes in a graph tend to cluster together (a value of 1 means the node's neighborhood is a clique).
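The four measures in this note can be tried on the karate club graph that ships with NetworkX. This is a minimal sketch rather than the code from the talk; the helper `top` is a hypothetical name, and `max_iter=1000` is just a generous choice for the power iteration.

```python
import networkx as nx

G = nx.karate_club_graph()

# Each call returns a dict mapping node -> score.
betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)
eigenvector = nx.eigenvector_centrality(G, max_iter=1000)
clustering = nx.clustering(G)


def top(scores, n=3):
    """Return the n nodes with the highest score (hypothetical helper)."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


print("betweenness:", top(betweenness))
print("closeness:  ", top(closeness))
print("eigenvector:", top(eigenvector))
print("clustering: ", top(clustering))
```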
  • NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and function of complex networks. NetworkX was born in May 2002. The original version was designed and written by Aric Hagberg, Dan Schult, and Pieter Swart in 2002 and 2003. The first public release was in April 2005.
  • Python – user description. 2 days of Twitter data.

Transcript

  • 1. Networkx & Gephi Tutorial #pydata Gilad Lotan | @gilgul
  • 2. link
  • 3. #gayrights, #lgbt, #jesus, #flipflop, #jobs, #economy; #palestine, #OWS, #immigration, #abortion; #republican, #dems, #economics, #amnesty
  • 4. #Debates / Ohio
  • 5. #Debates / Ohio: Politicos, Ohio-based Media, OSU Students
  • 6. Node network properties
      • From immediate connections
          – indegree: how many directed edges (arcs) are incident on a node (example: indegree=3)
          – outdegree: how many directed edges (arcs) originate at a node (example: outdegree=2)
          – degree (in or out): number of edges incident on a node (example: degree=5)
      • From the entire graph
          – centrality (betweenness, closeness)
      Source: Lada Adamic (SI508-F08)
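The degree definitions on this slide map onto NetworkX's `DiGraph` methods. The sketch below rebuilds the slide's example numbers (indegree 3, outdegree 2, degree 5) with made-up node names.

```python
import networkx as nx

G = nx.DiGraph()
# Three arcs point at "a" and two arcs leave it (node names are hypothetical).
G.add_edges_from([("b", "a"), ("c", "a"), ("d", "a"), ("a", "e"), ("a", "f")])

in_deg = G.in_degree("a")    # arcs arriving at the node
out_deg = G.out_degree("a")  # arcs originating at the node
total = G.degree("a")        # in-degree plus out-degree for a directed graph

print(in_deg, out_deg, total)
```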
  • 7. Example Graph Types
      • Complete Graph
      • Bipartite Graph
          – Vertices can be divided into two disjoint sets
          – Ex: students & schools
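Both graph types can be built in a few lines of NetworkX. In this sketch the student and school names are invented, and the `bipartite=0/1` node attribute is only a common labeling convention, not something the slide specifies.

```python
import networkx as nx

# Complete graph: every pair of the 5 nodes is connected (5 * 4 / 2 = 10 edges).
K5 = nx.complete_graph(5)

# Bipartite graph: edges only run between students and schools.
students = ["ana", "ben", "cho"]
schools = ["NYU", "Columbia"]

B = nx.Graph()
B.add_nodes_from(students, bipartite=0)
B.add_nodes_from(schools, bipartite=1)
B.add_edges_from([
    ("ana", "NYU"),
    ("ben", "NYU"),
    ("cho", "Columbia"),
    ("ana", "Columbia"),
])

print(K5.number_of_edges())
print(nx.is_bipartite(B))   # students and schools form two disjoint sets
print(nx.is_bipartite(K5))  # a complete graph on 5 nodes contains triangles
```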
  • 8. Social Network Attributes
      • Scale Free
          – Degree distribution follows a power law
          – Barabasi et al ('99): mapped the topology of a portion of the web
      • Small World
          – Most nodes are not neighbors, but can be reached by a small number of hops
          – Watts & Strogatz ('98)
          – Properties: cliques, sub-networks with high clustering coefficient, most pairs of nodes connected by at least one short path
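NetworkX's graph generators make it possible to see both properties on synthetic graphs. This is a sketch with arbitrarily chosen sizes and seeds (500 nodes, rewiring probability 0.1); none of these parameters come from the talk.

```python
import networkx as nx

n = 500

# Scale free: each new node attaches to 2 existing nodes, preferring nodes
# that already have many links (Barabasi-Albert model).
ba = nx.barabasi_albert_graph(n, 2, seed=42)
degrees = [d for _, d in ba.degree()]
mean_degree = sum(degrees) / n
max_degree = max(degrees)  # a few hubs end up far above the mean

# Small world: start from a ring lattice (each node linked to its 6 nearest
# neighbors), then rewire each edge with probability 0.1 (Watts-Strogatz).
lattice = nx.watts_strogatz_graph(n, 6, 0.0, seed=42)
small_world = nx.connected_watts_strogatz_graph(n, 6, 0.1, seed=42)

lattice_path = nx.average_shortest_path_length(lattice)
small_world_path = nx.average_shortest_path_length(small_world)
small_world_clustering = nx.average_clustering(small_world)

print("mean vs max degree:", mean_degree, max_degree)
print("avg path, lattice vs small world:", lattice_path, small_world_path)
print("small-world clustering:", small_world_clustering)
```

A handful of rewired edges act as shortcuts, so the average path length drops sharply while local clustering stays fairly high.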
  • 9. Zachary's Karate Club graph: social network of friendships between 34 members of a karate club at a US university in the 1970s. Standard test network for clustering algorithms: during the observation period the club broke up into two separate clubs over a conflict.
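NetworkX bundles this graph as `nx.karate_club_graph()`, and each node carries a `club` attribute recording which side the member joined after the split ("Mr. Hi" for the instructor's group, "Officer" for the other). A small sketch for inspecting the two factions:

```python
import networkx as nx

G = nx.karate_club_graph()

# Group member ids by the faction stored in each node's "club" attribute.
factions = {}
for node, club in G.nodes(data="club"):
    factions.setdefault(club, []).append(node)

print(G.number_of_nodes(), "members,", G.number_of_edges(), "friendships")
for club, members in factions.items():
    print(club, len(members))
```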
  • 10. Graph Measures
      • Centrality
          – Betweenness
          – Closeness
          – Eigenvector
          – Degree
      • Clustering Coefficient (clique)
      • Modularity
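Modularity is the one measure on this slide that scores a partition rather than a single node. The sketch below uses NetworkX's greedy modularity heuristic on the karate club graph; that algorithm is a swapped-in choice for illustration (Gephi's built-in modularity statistic uses a different method), and `weight=None` is passed so that edges are treated as unweighted.

```python
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()

# Find a partition that greedily maximizes modularity, then score it.
communities = community.greedy_modularity_communities(G, weight=None)
score = community.modularity(G, communities, weight=None)

for i, members in enumerate(communities):
    print(i, sorted(members))
print("modularity:", round(score, 3))
```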
  • 11. Graph Layout
      • OpenOrd
          – Better distinguishes clusters
      • Yifan Hu
      • Force Atlas
      • Fruchterman Reingold
          – Graph as a system of mass particles (nodes: particles, edges: springs)
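To make the "particles and springs" idea concrete, here is a toy force-directed layout in plain Python. It follows the general Fruchterman-Reingold scheme (all node pairs repel, edges attract, movement is capped by a cooling temperature) but it is a simplified sketch, not Gephi's implementation; the function name, constants, and sample graph are all assumptions.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=50, seed=0):
    """Toy force-directed layout: node pairs repel, edges pull like springs."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = math.sqrt(1.0 / len(nodes))  # ideal edge length for a unit square
    temperature = 0.1                # caps how far a node may move per step

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsion between every pair of nodes: magnitude k^2 / distance.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attraction along each edge: magnitude distance^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node, limited by the temperature, then cool down.
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            step = min(length, temperature)
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temperature *= 0.95

    return {v: tuple(p) for v, p in pos.items()}


# Two triangles joined by a single bridge edge (hypothetical sample graph).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
layout = fruchterman_reingold(nodes, edges)
for node, (x, y) in layout.items():
    print(node, round(x, 3), round(y, 3))
```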
  • 12. Networkx
  • 13. Graph Generators
  • 14. Generate Twitter Graph
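Only the title of this slide survives in the transcript, so the following is a guess at the general shape of such a step rather than the talk's code: it turns hypothetical (author, mentioned user) pairs into a weighted directed graph.

```python
import networkx as nx

# Hypothetical records: one (author, mentioned_user) pair per mention.
mentions = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
]

G = nx.DiGraph()
for author, mentioned in mentions:
    if G.has_edge(author, mentioned):
        # Repeated mentions strengthen the existing edge.
        G[author][mentioned]["weight"] += 1
    else:
        G.add_edge(author, mentioned, weight=1)

print(G.number_of_nodes(), G.number_of_edges())
print(G.in_degree("bob"), G.in_degree("bob", weight="weight"))
```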
  • 15. graphml file nodes edges
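GraphML is an XML format that Gephi can open, and NetworkX can write it with `nx.write_graphml`. A minimal sketch, assuming a tiny made-up graph, a hypothetical `followers` node attribute, and a temporary output path:

```python
import os
import tempfile

import networkx as nx

G = nx.DiGraph()
G.add_node("alice", followers=120)   # node attributes become GraphML <data> keys
G.add_node("bob", followers=4500)
G.add_edge("alice", "bob", weight=2)

# Write the graph to a GraphML file that can then be opened in Gephi.
path = os.path.join(tempfile.mkdtemp(), "twitter.graphml")
nx.write_graphml(G, path)

# Read it back to confirm that nodes, edges, and attributes survived.
H = nx.read_graphml(path)
print(H.nodes(data=True))
print(H.edges(data=True))
```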
  • 16. Twitter Users with Python in their Bios
      • 2 days of Twitter data (Oct 24th and 25th)
      • Total: 4246 users (62k tweets)
      • @mikanyan1 tweeted 795 times
  • 17. Pythonistas on Twitter
  • 18. Pythonistas on Twitter: Spanish Speakers; English / European; Chinese; Python (the snake); Japanese; Musicians, Artists
  • 19. Twitter User Community: Data Science
      • Grepped from Twitter bios over 1 week: "data science|data scientist|machine learning|data strateg"
      • 1053 Users
      • 14k Tweets
      • Most tweeting users:
          – @data_nerd (659)
          – @Chantel_Esworth (562)
          – @Da5_12 (253)
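The bio filter on this slide can be reproduced with Python's `re` module. The pattern is the one quoted on the slide; the sample bios and the case-insensitive flag are assumptions added for illustration.

```python
import re

# Pattern from the slide; re.IGNORECASE is an assumption, not from the talk.
PATTERN = re.compile(
    r"data science|data scientist|machine learning|data strateg",
    re.IGNORECASE,
)

# Hypothetical user bios.
bios = {
    "user_a": "Data scientist and coffee drinker",
    "user_b": "Python developer, snake enthusiast",
    "user_c": "Working on machine learning for music",
    "user_d": "Data strategy lead",
}

# Keep only the users whose bio matches any of the alternatives.
matching = sorted(user for user, bio in bios.items() if PATTERN.search(bio))
print(matching)
```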
  • 20. Dataists on Twitter
  • 21. Thank You. Gilad Lotan. Twitter: @gilgul, GitHub: giladlotan