- 1. Interactive visualization and exploration of network data with gephi, and some conceptual context. Bernhard Rieder, Universiteit van Amsterdam, Mediastudies Department.
- 2. Two kinds of mathematics. Can there be data analysis without math? No. Does this imply epistemological commitments? Yes. But there are choices, e.g. between:
  - Confirmatory data analysis => deductive
  - Exploratory data analysis (Tukey 1962) => inductive
- 3. Two kinds of mathematics

  | | Statistics | Graph theory |
  |---|---|---|
  | Observed | objects and properties | objects and relations |
  | Inferred | social forces | structure |
  | Data representation | the table | the matrix |
  | Visual representation | quantity charts | network diagrams |
  | Grouping | "class" (similar properties) | "clique" (dense relations) |
- 4. Graph theory: Leonhard Euler, "Seven Bridges of Königsberg", 1735. Introducing the "point and line" model.
- 5. Graph theory. It develops over the 20th century, in particular in the second half, and integrates branches of mathematics (topology, geometry, statistics, etc.). Graph theory is "the mathematics of structure" (Harary 1965), "a mathematical model for any system involving a binary relation" (Harary 1969); it makes relational structure calculable. "Perhaps even more than to the contact between mankind and nature, graph theory owes to the contact of human beings between each other." (König 1936)
- 6. Basic ideas: Moreno 1934. Graph theory developed in exchange with sociometry, small-group research and (later) social exchange theory. Starting point: "the sociometric test" (an experimental definition of "relation").
- 7. Basic ideas
- 8. Forsyth and Katz, 1946, "adjacency matrix"
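A minimal sketch of the adjacency-matrix idea in plain Python, assuming an undirected, unweighted graph given as an edge list. The graph is the four-node example from slide 19 (weights ignored here); the helper names `adjacency_matrix` and `matmul` are hypothetical illustrations, not part of gephi. Row sums give each node's degree, and squaring the matrix counts walks of length two between each pair of nodes.

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 adjacency matrix (list of lists) for an undirected graph."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror every entry
    return matrix


def matmul(a, b):
    """Plain matrix product; (A x A)[i][j] counts walks of length 2 from i to j."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]
A = adjacency_matrix(nodes, edges)
for name, row in zip(nodes, A):
    print(name, row, "degree =", sum(row))

A2 = matmul(A, A)
print("walks of length 2 from A to C:", A2[0][2])  # A-B-C
```

The diagonal of the squared matrix reproduces the degrees, since every edge gives a walk out and back.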
- 9. Harary, Graph Theory, 1969
- 10. Basic ideas: The network singularity. Why do network analysis and visualization? Which arguments are put forward?
  - New media: technical and conceptual structures modeled as networks
  - The network imaginary: networks as analytical device and trending topic
  - Calculative capacities: powerful techniques and tools
  - Visualization: the network diagram, "visual analytics"
  - Logistics: data, software, and hardware are available and cheap
  - Methodology I: dissatisfaction with statistics => SNA
  - Methodology II: a "new science of networks" (Watts 2005) emerged
  - Society: diversification, problems with demographics / statistics / theory
- 11. Basic ideas: Adamic and Glance, "Divided They Blog", 2005.
- 12. Graph theory. Graph theory consists of or provides:
  - A basic conceptual and formal model (point and line)
  - A descriptive and analytical language to talk about specific graphs
  - Extensive calculability of structure
  - Various "native" (and non-native) forms of visualization
- 13. Formalization. "As we have seen, the basic terms of digraph theory are point and line. Thus, if an appropriate coordination is made so that each entity of an empirical system is identified with a point and each relationship is identified with a line, then for all true statements about structural properties of the obtained digraph there are corresponding true statements about structural properties of the empirical system." (Harary et al. 1965) There is always an epistemological commitment! => What can "carry" the reductionism and formalization?
- 14. Much of this data can be analyzed as graphs. Social media formalize interaction at the interface.
- 15. Basic ideas: What kind of phenomena/data?
  - Interactive networks (Watts 2004): the link encodes tangible interaction
    - social networks
    - citation networks
    - hypertext networks
  - Symbolic networks (Watts 2004): the link is conceptual
    - co-presence (Tracker Tracker, IMDB, etc.)
    - co-word
    - any kind of "structure" that can be formalized as point and line
  - => do all kinds of analysis (SNA, transportation, text mining, etc.)
  - => analyze structural properties in various ways
- 16. Basic ideas: File formats. To be able to begin, we need data in a graph file format. There are a number of different file formats used to specify graphs, and different formats have different capacities (e.g. .gexf lets you specify time intervals). The GUESS (.gdf) format: http://courses.polsys.net/gephi/
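To make the file-format step concrete, here is a minimal sketch that serializes a small graph to GDF text. It assumes the commonly used GUESS layout of a `nodedef>` header followed by node rows, then an `edgedef>` header followed by edge rows; the `weight DOUBLE` column and the helper name `to_gdf` are assumptions for illustration, so check the course page for the exact columns you need.

```python
def to_gdf(nodes, edges):
    """Serialize a graph to a GDF (GUESS) string.

    nodes: list of (name, label) pairs
    edges: list of (source, target, weight) triples
    Assumes names and labels contain no commas (GDF rows are comma-separated).
    """
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    lines += [f"{name},{label}" for name, label in nodes]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE")
    lines += [f"{src},{dst},{weight}" for src, dst, weight in edges]
    return "\n".join(lines) + "\n"


# Hypothetical example: the four-node graph from the "What is a graph?" slide.
nodes = [("a", "A"), ("b", "B"), ("c", "C"), ("d", "D")]
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("b", "d", 1.0), ("c", "d", 2.0)]
gdf = to_gdf(nodes, edges)
print(gdf)
```

Saving the string to a file with a `.gdf` extension should give something gephi can open; values containing commas would need quoting, which this sketch does not handle.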
- 17. Basic ideas: What is a graph? An abstract representation of nodes connected by links. Two ways of analyzing graphs:
  - numerical analysis (graph statistics, structural measures, etc.)
  - visualization (network diagram, matrix, arc diagram, etc.)
- 18. Basic ideas: Wikipedia, "Glossary of graph theory". Tools are easy, concepts are hard. http://courses.polsys.net/gephi/
- 19. Basic ideas: What is a graph? Vertices and edges! Nodes and lines!
  - Two main types: directed (e.g. Twitter) and undirected (e.g. Facebook)
  - Properties of nodes: degree, centrality, etc.
  - Properties of edges: weight, direction, etc.
  - Properties of the graph: averages, diameter, communities, etc.

  Example graph with nodes A, B, C, D and edges a-b, b-c, b-d, c-d (c-d has weight 2):
  - Nodes, degree: A: 1, B: 3, C: 2, D: 2
  - Nodes, weighted degree: A: 1, B: 3, C: 3, D: 3
  - Edges, weight: a-b: 1, b-c: 1, b-d: 1, c-d: 2
  - Graph, diameter: 2
  - Graph, density: 0.667 (4 edges out of 6 possible)
  - Graph, average shortest path: 1.333 (8 / 6)

  Numbers are great for comparison!
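The numbers on this slide can be recomputed in a few lines of standard-library Python. This is a sketch, not gephi's implementation: it assumes an undirected, connected graph, and it measures shortest paths in hops (ignoring weights), which is what reproduces the slide's diameter and average shortest path.

```python
from collections import deque
from itertools import combinations

# Example graph from the slide; the C-D edge carries weight 2.
edges = {("A", "B"): 1, ("B", "C"): 1, ("B", "D"): 1, ("C", "D"): 2}
nodes = sorted({n for pair in edges for n in pair})

neighbors = {n: set() for n in nodes}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

degree = {n: len(neighbors[n]) for n in nodes}
weighted_degree = {n: sum(w for pair, w in edges.items() if n in pair) for n in nodes}


def hop_distances(source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


# Every unordered pair counted once; raises KeyError if the graph is disconnected.
pair_lengths = [hop_distances(u)[v] for u, v in combinations(nodes, 2)]
diameter = max(pair_lengths)
density = len(edges) / (len(nodes) * (len(nodes) - 1) / 2)
avg_shortest_path = sum(pair_lengths) / len(pair_lengths)

print(degree)           # {'A': 1, 'B': 3, 'C': 2, 'D': 2}
print(weighted_degree)  # {'A': 1, 'B': 3, 'C': 3, 'D': 3}
print(diameter, round(density, 3), round(avg_shortest_path, 3))  # 2 0.667 1.333
```

A weighted notion of path length would need Dijkstra's algorithm instead of breadth-first search.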
- 20. Basic ideas
- 21. Basic ideas: http://courses.polsys.net/gephi/
- 22. Basic ideas: Interactive visual analytics
  - Bringing structure to the surface (gephi panel: "layout")
    - different spatializations (force, geometry, etc.)
  - Projecting variables into the diagram (gephi panel: "ranking")
    - Size (nodes, edges, labels, etc.)
    - Color (nodes, edges, labels, etc.)
  - Deriving measures (gephi panel: "statistics")
    - Properties of nodes, edges, structure => new variables
  - Analysis: e.g. correlation between spatial layout and variables?
- 23. Layout algorithms transform n-dimensional adjacency matrices into two-dimensional diagrams
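As an illustration of how a layout algorithm turns relations into 2D positions, here is a minimal force-directed sketch in the spirit of Fruchterman and Reingold: linked nodes attract, all pairs repel, and a cooling "temperature" limits how far nodes move per step. This is a generic simplification, not the code behind gephi's layout panel; the function name, the seed, the cooling factor and the iteration count are assumptions chosen for illustration.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, width=1.0, seed=42):
    """Simplified Fruchterman-Reingold-style layout; returns {node: (x, y)}.

    No frame, no gravity, no Barnes-Hut approximation: O(n^2) per iteration.
    """
    rng = random.Random(seed)  # fixed seed: same input gives the same layout
    pos = {n: [rng.uniform(0, width), rng.uniform(0, width)] for n in nodes}
    k = math.sqrt(width * width / len(nodes))  # ideal edge length
    temperature = width / 10  # maximum displacement per step

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: f_r(d) = k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction along edges: f_a(d) = d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node along its net force, capped by the temperature
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
        temperature *= 0.98  # cooling schedule

    return {n: (x, y) for n, (x, y) in pos.items()}


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]
layout = force_layout(nodes, edges)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

Changing the force functions or the cooling schedule changes the picture, which is one concrete reason why different layout algorithms show different aspects of the same graph.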
- 24. Every algorithm/technique reveals the structure of the graph differently, shows different aspects
- 25. Basic ideas: Interactive visual analytics
  - Bringing structure to the surface (gephi panel: "layout")
    - different spatializations (force, geometry, etc.)
  - Projecting variables into the diagram (gephi panel: "ranking")
    - Size (nodes, edges, labels, etc.)
    - Color (nodes, edges, labels, etc.)
  - Analysis: e.g. "correlation" between spatial layout and variables?
- 26. Basic ideas
- 27. Nine measures of centrality (Freeman 1979)
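Freeman's nine measures build on three concepts: degree, betweenness and closeness. The sketch below computes only the node-level versions of these three on the four-node example graph from slide 19 (unweighted, undirected, assumed connected). For betweenness it uses Brandes' accumulation algorithm (2001), a later and faster technique than anything in Freeman's paper; the normalizations chosen here are one convention among several.

```python
from collections import deque

# Four-node example graph (unweighted, undirected), as adjacency lists.
adjacency = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "D"],
    "D": ["B", "C"],
}


def bfs(source):
    """Return hop distances, shortest-path counts, predecessor lists and visit order."""
    dist = {source: 0}
    sigma = {v: 0 for v in adjacency}
    sigma[source] = 1
    preds = {v: [] for v in adjacency}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


n = len(adjacency)
degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
# Closeness: (n - 1) / sum of distances to all other nodes.
closeness = {v: (n - 1) / sum(bfs(v)[0].values()) for v in adjacency}

# Betweenness: Brandes' dependency accumulation, run from every source.
betweenness = {v: 0.0 for v in adjacency}
for s in adjacency:
    _, sigma, preds, order = bfs(s)
    delta = {v: 0.0 for v in adjacency}
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
        if w != s:
            betweenness[w] += delta[w]
# Undirected graph: each pair was counted from both ends, so halve.
betweenness = {v: b / 2 for v, b in betweenness.items()}

print(degree)
print(closeness)
print(betweenness)
```

Only B sits between other nodes (on the A-C and A-D paths), so it is the only node with non-zero betweenness.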
- 28. Basic ideas: Interactive visual analytics
  - Bringing structure to the surface (gephi panel: "layout")
    - different spatializations (force, geometry, etc.)
  - Projecting variables into the diagram (gephi panel: "ranking")
    - Size (nodes, edges, labels, etc.)
    - Color (nodes, edges, labels, etc.)
  - Deriving measures (gephi panel: "statistics")
    - Properties of nodes, edges, structure => new variables
  - Analysis: e.g. "correlation" between spatial layout and variables?
- 29. Basic ideas
- 30. Basic ideas
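- 31. PageRank (PR) for different values of α, compared with degree:

  | Label | PR α=0.85 | PR α=0.7 | PR α=0.55 | PR α=0.4 | In-Degree | Out-Degree | Degree |
  |---|---|---|---|---|---|---|---|
  | n34 | 0.0944 | 0.0743 | 0.0584 | 0.0460 | 4 | 1 | 5 |
  | n1 | 0.0867 | 0.0617 | 0.0450 | 0.0345 | 1 | 2 | 3 |
  | n17 | 0.0668 | 0.0521 | 0.0423 | 0.0355 | 2 | 1 | 3 |
  | n39 | 0.0663 | 0.0541 | 0.0453 | 0.0388 | 5 | 1 | 6 |
  | n22 | 0.0619 | 0.0506 | 0.0441 | 0.0393 | 5 | 1 | 6 |
  | n27 | 0.0591 | 0.0451 | 0.0371 | 0.0318 | 1 | 0 | 1 |
  | n38 | 0.0522 | 0.0561 | 0.0542 | 0.0486 | 6 | 0 | 6 |
  | n11 | 0.0492 | 0.0372 | 0.0306 | 0.0274 | 3 | 1 | 4 |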
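A minimal power-iteration sketch of PageRank, assuming "PR" in the table is PageRank and α its damping factor: with probability α a random surfer follows an out-link, otherwise it jumps to a uniformly random node. Nodes without out-links ("dangling" nodes) are assumed here to spread their score evenly; gephi's own implementation may handle such details differently. The four-node graph is a hypothetical example, not the network behind the table.

```python
def pagerank(out_links, alpha=0.85, iterations=200, tol=1e-12):
    """Power-iteration PageRank for a directed graph.

    out_links: {node: [targets]}; every target must also be a key.
    alpha: damping factor (probability of following a link).
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            for t in targets:
                new[t] += alpha * rank[v] / len(targets)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical directed graph: c receives links from a, b and d.
out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for alpha in (0.85, 0.7, 0.55, 0.4):
    ranks = pagerank(out_links, alpha)
    print(alpha, {v: round(r, 4) for v, r in ranks.items()})
```

Lowering α gives more weight to the uniform jump, which pulls the scores closer together; this matches the general tendency of the columns in the table to shrink toward each other as α decreases.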
- 32. FB group "Islam is dangerous": friendship network, color: betweenness centrality.
  - 2,339 members
  - Average degree of 39.69
  - 81.7% have at least one friend in the group
  - 55.4% have five or more
  - 37.2% have 20 or more
  - The founder and admin has 609 friends
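The summary figures above (average degree, share of members with at least 1, 5 or 20 friends in the group) are simple functions of each member's in-group degree. A minimal sketch, assuming degrees have already been computed; the function name and the four toy members are hypothetical and are not the group's data.

```python
def degree_summary(degrees, thresholds=(1, 5, 20)):
    """Average degree and the share of nodes whose degree reaches each threshold.

    degrees: {node: number of friends inside the group}
    """
    n = len(degrees)
    average = sum(degrees.values()) / n  # equals 2 * edges / nodes for an undirected graph
    shares = {t: sum(1 for d in degrees.values() if d >= t) / n for t in thresholds}
    return average, shares


toy_degrees = {"u1": 0, "u2": 1, "u3": 6, "u4": 25}  # hypothetical members
average, shares = degree_summary(toy_degrees)
print(average, shares)  # 8.0 {1: 0.75, 5: 0.5, 20: 0.25}
```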
- 33. Twitter 1% sample, co-hashtag analysis: 227,029 unique hashtags, 1,627 displayed (freq >= 50). Size: frequency. Color: modularity.
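Coloring by modularity means coloring by communities found while optimizing a modularity score. gephi finds the partition with a heuristic search (the Louvain method); the sketch below does not search at all. It only scores a given partition with the standard Newman-Girvan modularity, assuming an undirected, unweighted graph. The two-triangle graph and the group labels are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ], where L_c is the number
    of edges inside c, d_c the total degree of c's nodes, and m the number of edges.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single edge (3-4).
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
two_groups = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
one_group = {node: "all" for node in range(1, 7)}
print(round(modularity(edges, two_groups), 4))  # 0.3571
print(modularity(edges, one_group))             # 0.0
```

Putting every node in one community always scores 0, so positive values indicate groups that are denser internally than a random rewiring with the same degrees would produce.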
- 34. Twitter 1% sample, co-hashtag analysis: 227,029 unique hashtags, 1,627 displayed (freq >= 50). Size: frequency. Color: user diversity.
- 35. Twitter 1% sample, co-hashtag analysis: 227,029 unique hashtags, 1,627 displayed (freq >= 50). Size: frequency. Color: degree.
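A co-hashtag network like the one on these slides can be derived from per-tweet hashtag lists: hashtags become nodes weighted by frequency, and two hashtags are linked whenever they appear in the same tweet. The sketch assumes hashtags are already extracted, lowercases them (an assumption; the original processing is not described), counts each hashtag once per tweet, and applies a frequency threshold like the slides' freq >= 50. The function name and toy tweets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohashtag_network(tweets, min_freq=50):
    """Build a co-hashtag graph from per-tweet hashtag lists.

    Node weight: number of tweets containing the hashtag (frequency).
    Edge weight: number of tweets in which both hashtags appear.
    Only hashtags with frequency >= min_freq are kept.
    """
    freq = Counter()
    cooc = Counter()
    for tags in tweets:
        unique = sorted({t.lower() for t in tags})  # count each tag once per tweet
        freq.update(unique)
        cooc.update(combinations(unique, 2))  # sorted, so each pair has one key
    kept = {t: f for t, f in freq.items() if f >= min_freq}
    edges = {pair: w for pair, w in cooc.items() if pair[0] in kept and pair[1] in kept}
    return kept, edges


tweets = [["gephi", "sna"], ["gephi", "Viz"], ["sna", "gephi", "viz"], ["python"]]
kept, edges = cohashtag_network(tweets, min_freq=2)
print(kept)
print(edges)
```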
- 36. Network statistics [figure: betweenness centrality, degree]. Relational elements of graphs can be represented as tables (nodes have properties) and analyzed through statistics. Network statistics bridge the gap between individual units and the structural forms they are embedded in. This is currently an extremely prolific field of research.
- 37. Twitter 1% sample, co-hashtag analysis: degree vs. wordFrequency.
- 38. Twitter 1% sample, co-hashtag analysis: degree vs. userDiversity.
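Once node measures sit in a table, comparisons such as degree vs. wordFrequency become ordinary statistics. A minimal sketch of Spearman's rank correlation (Pearson correlation of the ranks, with ties given their average rank), chosen here on the assumption that degree and frequency are heavily skewed, so ranks are more robust than raw values. The five-node data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


degree = [3, 10, 25, 40, 41]          # hypothetical node degrees
frequency = [50, 80, 200, 150, 900]   # hypothetical hashtag frequencies
print(round(spearman(degree, frequency), 3))  # 0.9
```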
- 39. Basic ideas: PlugIn "Spatial Ranking"
- 40. Co-like analysis of my personal FB network. Nodes: users; links: "liking the same thing". Example 3: our imagination.
- 41. Basic ideas: PlugIn "Multimodal Projection"
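A co-like network is a projection of a two-mode graph (users and the pages they like) onto one mode: two users are linked if they like the same thing. The sketch below shows a generic projection in which the edge weight counts shared items; it is not the plugin's code, and the user and page names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project(bipartite_edges):
    """Project a two-mode graph of (user, item) pairs onto the user mode.

    Two users are linked if they share at least one item; the edge weight
    is the number of shared items.
    """
    fans = defaultdict(set)  # item -> users who liked it (duplicates collapse)
    for user, item in bipartite_edges:
        fans[item].add(user)
    weights = defaultdict(int)
    for users in fans.values():
        for pair in combinations(sorted(users), 2):
            weights[pair] += 1
    return dict(weights)


likes = [
    ("ana", "page1"), ("ben", "page1"),
    ("ana", "page2"), ("ben", "page2"), ("cem", "page2"),
    ("cem", "page3"),
]
print(project(likes))
```

Popular items create many edges at once (an item liked by k users adds k(k-1)/2 pairs), which is why projected networks tend to be much denser than the two-mode data they come from.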
- 42. Basic ideas
- 43. Basic ideas: PlugIn "GeoLayout"
- 44. Thank you! Contact: rieder@uva.nl, https://www.digitalmethods.net, http://thepoliticsofsystems.net. "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (Tukey 1962)