SP1: Exploratory Network Analysis with Gephi

ICWSM 2011 Tutorial

Sébastien Heymann and Julian Bilcke

Gephi is interactive visualization and exploration software for all kinds of networks and relational data: online social networks, emails, communication and financial networks, but also semantic networks, inter-organizational networks and more. Designed to make data navigation and manipulation easy, it aims to fulfill the complete chain from data importing to aesthetics refinements and interaction. Users interact with the visualization and manipulate structures, shapes and colors to reveal hidden properties. The goal is to help data analysts make hypotheses and intuitively discover patterns or errors in large data collections.

In this tutorial we will provide a hands-on demonstration of the essential functionalities of Gephi, based on a real case scenario: the exploration of student networks from the "Facebook100" dataset (Social Structure of Facebook Networks, Amanda L. Traud et al., 2011). The participants will be guided step by step through the complete chain of representation, manipulation, layout, analysis and aesthetics refinements. Particular focus will be put on filters and metrics for the creation of their first visualizations. Afterwards, they will be encouraged to compare the hypotheses suggested by their own exploration with the results actually published in the academic paper. Finally, they will walk away with the practical knowledge enabling them to use Gephi for their own projects. The tutorial is intended for professionals, researchers and graduates who wish to learn how playing during a network exploration can speed up their studies.

Sébastien Heymann is a Ph.D. candidate in Computer Science at Université Pierre et Marie Curie, France. His research in the ComplexNetworks team focuses on the dynamics of real-world networks. He has led the Gephi project since 2008 and is the administrator of the Gephi Consortium.

Julian Bilcke is a Software Engineer at ISC-PIF (Complex Systems Institute of Paris, France). He is a founder of the Gephi project and has been one of its developers since 2008.




    Presentation Transcript

    • ICWSM’11 Tutorial: Exploratory Network Analysis with Gephi
      Instructors: Sébastien Heymann, Julian Bilcke (seb@gephi.org, julian.bilcke@gephi.org)
      July 17, 2011 | 1 PM - 4 PM
    • Exploratory Network Analysis with Gephi
      This tutorial is an introduction to Gephi, the open source graph network visualization and manipulation software.
      Gephi aims to fulfill the complete chain from data importing to aesthetics refinements and interaction.
      Users interact with the visualization and manipulate structures, shapes and colors to reveal hidden properties.
      The goal is to help data analysts make hypotheses and intuitively discover patterns or errors in large data collections.
      At the end, the participants will walk away with the practical knowledge enabling them to use Gephi for their own projects.
    • Exploratory Network Analysis with Gephi
      It starts with a brief introduction to the network exploration process and a hands-on demonstration of the essential functionalities of Gephi.
      Participants are guided step by step through the complete chain of representation, manipulation, layout, analysis and aesthetics refinements.
      Next, teams work on real datasets.
      Finally, they present their preliminary results. The tutorial concludes with a general question and answer session.
    • Requirements
      • Bring your own laptop with Java and Gephi installed.
      • Gephi should be updated (menu Help > Check for Updates).
      • Bring a mouse with a wheel.
      • Bring a dataset of your own if you want, and verify that it loads well in Gephi. [1]
      [1] http://gephi.org/users/supported-graph-formats/
    • Workshop Schedule - Part I
      Exploratory Network Analysis
      • Exploratory Data Analysis
      • Exploratory Network Analysis
      • Looking for Orderness in Data
      • Examples
      • Guideline
      Introduction to Gephi
      • Approach and Community
      • Networked Data
      • Quick Start Demo
      * 30 min break *
    • Workshop Schedule - Part II
      Hands-On!
      • Team Work on a Dataset
      • Presentation of Preliminary Results
      Q&A
    • Exploratory Data Analysis
      Started with John Tukey (1962)
      “The greatest value of a picture is when it forces us to notice what we never expected to see”
      Diagram labels: confirmatory, results, exploratory, intuition, serendipity, surprise
    • Exploratory Data Analysis
      Non-linear processing chain of Ben Fry, in Computational Information Design (2004)
    • Dummy Example
      Observation: visual saliences on specific file sizes
      External knowledge: these sizes correspond to films
      New hypothesis on data: films are highly exchanged, so the study might dig in this direction
      P2P file size distribution (Latapy et al., 2008)
    • Exploratory Network Analysis
      1 see the network
        1st graph viz tool: Pajek (1996), Vladimir Batagelj, Andrej Mrvar
      2 interact in real time
        group, filter, compute metrics...
        Gephi prototype (2008)
      3 build a visual language
        size by rank, color by partition, label, curved edges, thickness...
    • Looking for a “Simple Small Truth”?
      Drew Conway, What Data Visualization Should Do:
      1. Make complex things simple
      2. Extract small information from large data
      3. Present truth, do not deceive
      http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/
    • Looking for Orderness in Data
      Vary 3 cursors simultaneously to extract meaningful patterns:
      • at different levels: MICRO level to MACRO level
      • on multiple dimensions: 1 dimension to N dimensions
      • at time scale: T+0 to T+N
    • “Zoom” cursor on Quantitative Data
      MICRO level to MACRO level
      Global
      • connectivity
      • density
      • centralization
      Local
      • communities
      • bridges between communities
      • local centers vs periphery
      Individual
      • centrality
      • distances
      • neighborhood
      • location
      • local authority vs hub
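Two of the quantities named on this slide can be sketched in a few lines: density at the global level and degree at the individual level. This is a minimal illustration on a hypothetical four-node directed network; the node names and the helper functions `density` and `degrees` are assumptions made for the example and are not part of Gephi or of the tutorial datasets.

```python
# Hypothetical directed toy network (not taken from the tutorial datasets).
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("dan", "ann")]


def density(nodes, edges):
    """Global level: share of the possible directed edges that exist."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def degrees(nodes, edges):
    """Individual level: in-degree and out-degree of every node."""
    in_deg = {node: 0 for node in nodes}
    out_deg = {node: 0 for node in nodes}
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return in_deg, out_deg


# Collect the node set from the edge list.
nodes = sorted({node for edge in edges for node in edge})
graph_density = density(nodes, edges)
in_deg, out_deg = degrees(nodes, edges)

print("density:", round(graph_density, 3))
print("in-degree:", in_deg)
print("out-degree:", out_deg)
```

In this toy network `ann` receives the most links, which is the kind of individual-level signal a degree ranking would surface.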
    • “Crossing” cursor on Qualitative Data
      1 dimension to N dimensions
      Social
      • who with whom
      • communities
      • brokerage
      • influence and power
      • homophily
      Semantic
      • topics
      • thematic clusters
      Geographic
      • spatial phenomena
    • “Timeline” cursor on Temporal Data
      T+0 to T+N
      • Evolution of social ties
      • Evolution of communities
      • Evolution of topics
    • Mapping an Innovation Center
      Collaborations on projects at Images et Réseaux
      Themes and content, Actors, Territory
      Franck Ghitalla & Ecole de Design de Nantes
    • Mapping Scientific Cooperations
    • Network Map: a Series of Choices
      corpus, data, graphical operations, algorithms, communication, thresholds, goals
    • Guideline
      # nodes: 1 - 100
        lists + edges as a bonus, focus on qualitative data
      # nodes: 100 - 1,000 (How do attributes explain the structure?)
        • easy to read, “obvious” patterns
        • focus on entities (in context)
        • metrics are tools to describe the graph (centrality, bridging...)
        • links help to build and interpret categories of entities
        challenge: mix attribute crossing and connectivity
      # nodes: 1,000 - 50,000 (How does the structure explain attributes?)
        • hard to read, problem of “hidden signals”: track patterns with various layouts and filtering
        • focus on structures
        • metrics are tools to build the graph (cosine similarity...)
        • categories help to understand the structure
        challenge: pattern recognition
      # nodes: > 50,000
        require high computational power
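The guideline mentions metrics used "to build the graph", with cosine similarity as an example. The sketch below shows one way this could work: entities described by attribute vectors become nodes, and an edge is kept when the cosine similarity of two vectors reaches a threshold. The vectors, the threshold value of 0.5 and the function names are hypothetical choices for illustration only.

```python
import math
from itertools import combinations

# Hypothetical attribute vectors, e.g. term counts per document.
profiles = {
    "doc1": [2, 0, 1],
    "doc2": [4, 0, 2],
    "doc3": [0, 3, 0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_edges(profiles, threshold):
    """Return undirected weighted edges for pairs at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        weight = cosine(profiles[a], profiles[b])
        if weight >= threshold:
            edges.append((a, b, weight))
    return edges


# The threshold is an arbitrary choice; changing it changes the resulting map.
edges = similarity_edges(profiles, threshold=0.5)
for source, target, weight in edges:
    print(source, target, round(weight, 3))
```

Here only `doc1` and `doc2` end up connected, because their vectors point in the same direction while `doc3` shares no attribute with either.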
    • Gephi now!
    • Gephi in a Nutshell
      « Like Photoshop™ for graphs. »
      Helps data analysts reveal patterns and trends, highlight outliers and tell stories with their data.
      • Network visualization platform
      • Open source, supported by a community
      • Built for performance and usability
      • Extensible by plug-ins
      • Windows, MacOS X, Linux
    • Gephi Community
      Nonprofit organization, Communities, Contributors
      Mathieu Bastian, Mathieu Jacomy, Eduardo Ramos Ibañez, Sébastien Heymann, Guillaume Ceccarelli, André Panisson, Antonio Patriarca, Cezary Bartosiak, Martin Škurla, Patrick McSweeney, Yi Du, Hélder Suzuki, Daniel Bernardes, Ernesto Aneiro, Keheliya Gallaba, Luiz Ribeiro, Urban Škudnik, Vojtech Bardiovsky, Yudi Xue
    • Community Mission
      • Provide a “sustainable” software
      • Maintain the technical ecosystem
      • Build a business ecosystem
      • Face cutting-edge technological challenges with a long-term vision
      • Distribute the software in Open Source
    • Community Values
      • Open innovation: ideas and features come from the entire community.
      • Decisions are taken with transparency.
      • We consider this technology as a public good, and will keep it in open source.
    • Diversity of Usages
      business, leisure :-), communication, academic, art
    • Diversity of Network Encoding
      Textual:
        V = { a, b, c, d, e }
        E = { (a,b), (a,d), (b,c), (e,a), (c,e) }
      XML:
        <graph>
          <nodes>
            <node id="a" />
            <node id="b" />
            <node id="c" />
            <node id="d" />
            <node id="e" />
          </nodes>
          <edges>
            <edge source="a" target="b" />
            <edge source="a" target="d" />
            <edge source="b" target="c" />
            <edge source="e" target="a" />
            <edge source="c" target="e" />
          </edges>
        </graph>
      Tabular:
            a  b  c  d  e
         a  -  1  -  1  -
         b  -  -  1  -  -
         c  -  -  -  -  1
         d  -  -  -  -  -
         e  1  -  -  -  -
      Graphical
      and many others...
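The encodings on this slide all describe the same graph, so one can be derived from another. As a minimal sketch, the following takes the textual encoding (the sets V and E from the slide) and produces the tabular and XML forms using only the Python standard library. The function names `to_matrix` and `to_xml` are made up for this example, and the XML mirrors the simplified structure shown on the slide rather than a complete file in any particular format.

```python
import xml.etree.ElementTree as ET

# Textual encoding from the slide: node set V and directed edge set E.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "d"), ("b", "c"), ("e", "a"), ("c", "e")]


def to_matrix(nodes, edges):
    """Tabular encoding: row = source, column = target, 1 where an edge exists."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


def to_xml(nodes, edges):
    """XML encoding with the same simplified element names as the slide."""
    graph = ET.Element("graph")
    nodes_el = ET.SubElement(graph, "nodes")
    for node in nodes:
        ET.SubElement(nodes_el, "node", id=node)
    edges_el = ET.SubElement(graph, "edges")
    for source, target in edges:
        ET.SubElement(edges_el, "edge", source=source, target=target)
    return ET.tostring(graph, encoding="unicode")


matrix = to_matrix(nodes, edges)
xml_text = to_xml(nodes, edges)

for node, row in zip(nodes, matrix):
    print(node, " ".join("1" if cell else "-" for cell in row))
print(xml_text)
```

The printed rows reproduce the table on the slide, which is a quick way to check that the three encodings agree.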
    • Software I/O
      Input
      • databases: MySQL, PostgreSQL, SQL Server
      • user input
      • Neo4j
      • file: CSV, Pajek NET, Guess GDF, GEXF, GraphML, Graphviz DOT, UCInet DL, Netdraw VNA, Tulip TLP, Excel Spreadsheet
      Output
      • file: CSV, Pajek NET, Guess GDF, GEXF, GraphML, Excel Spreadsheet, SVG, PDF, PNG
      graph streaming
    • Choosing a File Format
      Table of features supported by Gephi
      Columns: Edge List/Matrix Structure, XML Structure, Edge Weight, Attributes, Visualization Attributes, Attribute Default Value, Hierarchical Graphs, Dynamics
      Rows: CSV, DL Ucinet, DOT Graphviz, GDF, GEXF, GML, GraphML, NET Pajek, TLP Tulip, VNA Netdraw, Spreadsheet*
      * spreadsheets can be loaded in the Data Laboratory
    • Do you need...
      Chart of formats by file type (XML, Tabular, Text) and by number of features (few to many):
      GEXF, Spreadsheet, GraphML, Guess GDF, GML, UCINet DL, Netdraw VNA, Graphviz DOT, Pajek NET, CSV, Tulip TLP
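For the tabular end of this range, an edge table is often the simplest thing to produce from a script. The sketch below writes a weighted edge list as CSV text. The column headers `Source`, `Target` and `Weight` are an assumption about what a spreadsheet import expects and should be checked against the Supported Graph Formats page; the edge data and the function name `edges_to_csv` are hypothetical.

```python
import csv
import io

# Hypothetical weighted, directed edges: (source, target, weight).
edges = [
    ("a", "b", 1),
    ("a", "d", 2),
    ("b", "c", 1),
    ("e", "a", 3),
    ("c", "e", 1),
]


def edges_to_csv(edges):
    """Serialize an edge list as CSV text with one header row.

    The header names are an assumption, not confirmed by the slides.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()


csv_text = edges_to_csv(edges)
print(csv_text)

# To save it for loading into a tool, write the text to a file, for example:
# with open("edges.csv", "w", newline="") as handle:
#     handle.write(csv_text)
```

A tabular file like this carries few features (no visual attributes, no hierarchy), which is the trade-off the chart above is about.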
    • Using Gephi
      DEMO
    • Team work
      1 Create a team of 2~3 people.
      2 Choose a dataset.
      3 Explore it for 1 hour.
      4 Two teams present their preliminary findings.
    • Dataset #1: GitHub Software Repository
      “GitHub is an application used by nearly a million people to store over two million code repositories, making GitHub the largest code host in the world.”
      Started in 2008, it provides the features of an online social network and a software repository to lower the barriers to collaboration and make it easier to contribute code.
      https://github.com
    • Dataset #1: GitHub Software Repository
      Data extracted by Franck Cuny* at Linkfluence SAS
      1st release in March 2010 -> this poster
      2nd release in June 2011 -> your data
      Network of user profiles
      • Nodes: people with at least one repository who are followed by at least two other people
      • Edges: A follows B
      Network of repositories
      • Nodes: repositories
      • Edges: A shares a developer with B
      Very few research publications on this OSN!
      * franck.cuny@linkfluence.net
    • Dataset #1: GitHub Software Repository
      Data extracted by a crawl using the GitHub API
      Seed: 10 well-known contributors in the Perl community
      Networks by country: Japan, France, United States
      Networks by language: Perl, PHP, Python, Ruby
      Node attributes:
      • user country
      • number of followers
      • main programming language
      Edges:
      • directed
      • weight = number of projects A has forked from B
    • Dataset #1: GitHub Software Repository
      Your mission (should you decide to accept it): find research hypotheses based on your exploration
      Example question: are the Perl communities based on geography?
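One rough way to approach the example question outside the visual exploration is to measure how much of the total edge weight stays within a single country. The sketch below does this for a handful of made-up users; the user names, countries and weights are hypothetical and do not come from the GitHub dataset, and `same_country_share` is a helper invented for this illustration. It only assumes what the dataset description states: a country attribute on nodes and directed, weighted edges.

```python
# Hypothetical node attribute: country of each user.
country = {"u1": "France", "u2": "France", "u3": "Japan", "u4": "Japan"}

# Hypothetical directed edges (A, B, weight), with the weight read as the
# number of projects A has forked from B.
edges = [
    ("u1", "u2", 3),
    ("u2", "u1", 1),
    ("u3", "u4", 2),
    ("u1", "u3", 2),
]


def same_country_share(edges, country):
    """Fraction of total edge weight linking two users from the same country."""
    total = sum(weight for _, _, weight in edges)
    inside = sum(
        weight
        for source, target, weight in edges
        if country[source] == country[target]
    )
    return inside / total if total else 0.0


share = same_country_share(edges, country)
print("share of weight within a country:", share)
```

A high share would be consistent with geographically based communities, but on its own it is only a hint: it should be compared with what the colored layout shows and with how users are distributed across countries.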
    • Dataset #2: The Irish Blogosphere
      “Identifying Representative Textual Sources in Blog Networks”. K. Wade, D. Greene, C. Lee, D. Archambault, P. Cunningham (2011)
      http://mlg.ucd.ie/blogs
      Blogroll Network
      • Nodes: blogs with more than two blogroll links
      • Edges: blogroll link (in-link)
      Post-link Network
      • Nodes: blogs with more than two blogroll links
      • Edges: hyperlink inside a post from one blog to another (post-link)
    • Dataset #2: The Irish Blogosphere
      Data extracted by a crawl at distance 2 from the seed for the in-links and by Google Blog Search for the post-links.
      Seed: 21 popular blogs, winners of the “2010 Irish Blog Awards”
      Node attributes:
      • post count = total number of posts by blog
      • category = from the Irish blog index at www.irishblogdirectory.com, where available
      • infomap_comm = community to which a node belongs (infomap algo)
      • gce_comms = overlapping communities (GCE algo)
      • moses_comms = overlapping communities (MOSES algo)
      Edges:
      • directed
      • weight = number of hyperlinks in the Post-link network
      crawl at distance 2 from the seed
    • Dataset #2: The Irish Blogosphere
      Your mission: explore and try to confirm the official results
    • Hands-On!
      Start:
      • Load a graph
      • Apply a layout
      • Color the nodes by a qualitative variable in Partition Panel
      • Size the nodes by a quantitative variable in Ranking Panel
      • Start to explore... compute metrics, filter the network
      End:
      • Export maps to PDF in Preview Tab
      • Save
    • Presentations
      GitHub Repository
      Irish Blogosphere
    • Gephi Documentation
      Web Site: http://gephi.org
      Support: http://forum.gephi.org
      Wiki: http://wiki.gephi.org
      Source code: https://launchpad.net/gephi
      Online Tutorials
      http://gephi.org/users/quick-start/
      http://gephi.org/users/tutorial-visualization/
      http://gephi.org/users/tutorial-layouts/
      http://wiki.gephi.org/index.php/Import_CSV_Data
      http://wiki.gephi.org/index.php/Import_Dynamic_Data
      Tutorial in Spanish
      https://code.google.com/p/camon/wiki/Taller_Gephi
      Supported Graph Formats
      http://gephi.org/users/supported-graph-formats/
    • Thank You! Caspar David Friedrich - Wanderer Above the Sea of Fog
    • Credits
      [slide 11] images from Drew Conway http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/
      [slide 22 top left] Benoît Vidal at MFG Labs
      [slide 22 bottom center] Franck Ghitalla at UTC
      [slide 22 right] Studies in MA Digital Fashion at LCF by Peter Jeun Ho Tsang http://jeunhotsang.com/blog/2010/12/07/prototype/
      [slide 27] sketches from Ben Fry, Computational Information Design
      Special Thanks to Franck Ghitalla and Mathieu Jacomy for their insightful discussions.