A comparative study of social network analysis tools
Upcoming SlideShare
Loading in...5
×
 

A comparative study of social network analysis tools

on

  • 18,182 views

 

Statistics

Views

Total Views
18,182
Views on SlideShare
16,657
Embed Views
1,525

Actions

Likes
31
Downloads
466
Comments
2

13 Embeds 1,525

http://www.scoop.it 858
http://ranumao.net 390
http://labh-curien.univ-st-etienne.fr 194
https://twitter.com 33
http://normantrujillo.wordpress.com 21
http://cercledesconnaissances.blogspot.com 15
http://translate.googleusercontent.com 4
http://www.linkedin.com 3
http://www.ranumao.net 3
http://kred.com 1
https://si0.twimg.com 1
https://twimg0-a.akamaihd.net 1
http://cercledesconnaissances.blogspot.fr 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • One should make the difference between descriptive SNA and predictive SNA. It is one thing to understand a social network, it is another thing to predict what will happen next in that social network. I understand you chose to focus on free software but commercial software as our's (Idiro Technologies) is what is required to predict consumer behaviour in large (1 million nodes plus) networks such as telecoms and online gaming.
    Are you sure you want to
    Your message goes here
    Processing…
  • There are companies who do not have social collaboration tools or any SNA tools so is it possible to do SNA without SNA tools.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • 24 minutesGood morning,mynameis DC. I come fromLaHC , University of St-Etienne. The aim of mythesis I amhere to presentyou an article titled « A comparative study of social network analysistools » writtenwith Christine Largeron, Előd Egyed-Zsigmond and Mathias Géry.
  • (for the Gartner …)Many experts insist about social network new behavioursappearing in the enterprise and necessity to takethemintoaccount.Gartner Institute predicts #CITATION#.It alsoconsidersthat SNA isgetting mature.So progresscanbe made in business use of SNA for workflow study intended to permit adaptation of the management to the real communication flow in a company;And Identify key actors in networks, ie. for viral marketing aware campains
  • A previoussurvey by Huisman & Van Duijn has been published.Havingpreviousbench, werealizedthatitshouldbeupdated, as networks and needs have changed.Main Changes in networks concerns the sizes (networks are far largerthanbefore) and the type of information (as networks tend to getricher, withcontextual data).
  • There are 4 main domains of functionalities :the Representations of the network as datafilesVisualizationCharacterization by quantitativeindicatorsCommunitydetectioncapabilities
  • There are different sorts of indicators, which help discribe the network or just one actor.First, among global indicatorsyoucanfindsomegeneral values such as the number of nodes of the graph, the number of edges, the diameter (longestshortest path, how far at the maximum are two nodes in the graph). You can compose these values and deduce the density of the graph (whichdepends on the maximum number of links youcan put in the graph ie. the complete graph). An highdensity (high links ratio) defines a dense graph. A graph with a lowdensity value issaid to besparse.The second type of indicatorsis local indicators. You cancalculateitwithonly a small part of the graph information. An exampleis the number of neighboors (called the degree of a node).You canalso use distances between 2 nodes (or evenedges). The shortestpathis one of them.
  • Nowlet’sadress the benchmark itself.
  • The methodology for the selection of the tools takes into account the following points:First, in our vision blocking criteria for selection are that:The tools must provide the basic metricsused in the Social Network Analysisdomain ;and the networks size can reach tens of thousands of nodes, as well as smallerones.The toolsstudied in depthwerechoosenbecausetheyrepresentlegacy, wellestablished software, or new developments base on recent standards such as modularization and ergonomics.The datasetsused for experimentation are the Zachary’skarate-club dataset and the DBLP database.
  • Seven areas of functionalities were evaluated:Input/output formatsCustom attribute handlingBipartite graphs specific functionsLongitudinal analysis (ietimestamped events)IndicatorsVisualizationClustering capabilities
  • Hereis the list of studied software.Several of them are vizualisationoriented: Pajek, Gephi, GraphViz, GUESS, Tulip.Others claim to permit graph manipulation: igraph, NetworkX, and JUNG.Bothapproaches have led for manytools in incorporating SNA metrics.
  • The retained softwareis split in twocategories: Stand-alone softwares and libraries.The Stand-alone software are Pajek and Gephi.The libraries are igraph and NetworkX.Some of the software are emblematic of an approach (Gephi has somecommonalitieswithTulip for example).Nowlet’sdetailtheseones.
  • This table summarizes how well each tool performs in the seven areas.Pajek performs quite well at the seven domains.If you need more than a simple visualization of a small network, you should avoid NetworkX.Igraph is very good at providing indicator, visualization and community detection.
  • Hereis an otherrepresentation of the results of the benchmark.Weseeclearlywiththis radar viewthatNetworkXisvery good at custom attributeshandling and providesmanyindicators.No software performsfairlywellattemporality and bipartite graphs management.
  • Inorder to concludethispresentation, wecansaythere aredifferentscientificdomainsconcerned by SN, and thisexplainwhythere are somuch approches and toolsavailable.The advancesshould come in short or midtermfrom smart modularisation of user interface, algorithms and data, with the goal of allowing the reuse of methodsbetween software. Contextualanalysisneedtools for longitudinal analysis and attributesstudy.Hierarchical graph visualizationisnow an important subjecttoo for smart manipulation of large graphs.
  • Thankyou.

A comparative study of social network analysis tools A comparative study of social network analysis tools Presentation Transcript

  • A comparative study of social network analysistools
    David Combe, Christine Largeron,
    Előd Egyed-Zsigmondand Mathias Géry
    International Workshop on Web Intelligence and Virtual Enterprises 2 (2010)
  • Outline
    Context: social networks and analysissoftwareExpected functionalities of network analysis softwareBenchmarkConclusion
  • Context
    Definition (Wikipedia)
    A social network isa social structure made up of individuals called "nodes," which are tied by one or more specific types of interdependency, such as friendship, commoninterest, etc.
    Sociologic analysis
    Sociological works (Moreno 1934, Milgram 1967, Cartwright and Harary, 1977)
    Web 2.0 : Renewed interest from the Web based social networks websites development.
    Context
  • Context: Social network in business
    For the Gartner Institute:
    “By 2014, social networking services will replace e-mail as the primary vehicle for interpersonal communications for 20 percent of business users.” (Gartner 2008)
    Social network analysisisgetting mature.
    Some applications in business:
    Workflowstudyto adapt management to the real flow in a company;
    Identifykeyactors, ie. for viral marketing.
    These applications needadapted software.
    Context
  • Context: social networks and analysis software
    Network analysis software
    A previousstatisticalanalysisorientedsurvey (Huisman& Van Duijn, 2003)
    Networks and needs are changing
    Size
    Complex graphs
    Necessity to make a new benchmark
    Context
  • ContextExpected functionalities of network analysis softwareBenchmarkConclusion
  • Expected functionalities of network analysis software
    Representation
    Visualization
    Characterization by indicators
    Communitydetection
    Expected functionalities of network analysis software
  • 1. Network representation as graph(Cartwright and Harary, 1977)
    Link orientation
    Undirected links (edges, ex: co-authorship)
    Directed (arcs, ex: e-mails sent, Enron dataset)
    Weighton edges
    Withtypednodes
    (ex. bipartite network)
    Expected functionalities of network analysis software
    2
    1
    3
    3
  • 1. Network representation as graph
    Expected functionalities of network analysis software
    2
    1
    *Vertices 5
    *Edges
    1 2
    1 4
    2 3
    2 4
    3 4
    3 5
    4 5
    4
    3
    Connections
    5
    Adjacency matrix
    (.net file format)
    Edgelist
    Adjacencylist
  • Aim: give a visualrepresentation of the graph, withdifferentapproaches:
    Fish eye
    Centered on an actor
    Force drivenvisualizationlayouts
    FruchtermanReingold (1984)
    Iterative algorithm
    Randomlayout
    F-R convergence
    2. Visualization
    Expected functionalities of network analysis software
  • 3. Characterization by indicators
    Global indicatorsat network level by:
    Number of nodes
    Number of edges
    Diameter

    Local indicatorsatnodelevel:
    Number of neighboors degree

    Distance
    Length of the shortestpath
    Expected functionalities of network analysis software
    2
    1
    Density
    2
    4
    3
    5
    5
    4
  • 3. Characterization by indicators : how to decide if an actoris « central »?
    Many ways to determine central actors.
    Ex: Betweenness centrality
    Which node is the most likely to be an intermediary for a random communication?
    higher betweennesscentrality
    Selection depends on what they are needed for.
    Expected functionalities of network analysis software
  • 4. Communitydetection
    Community:
    A set of actorshavingstrong connexions.
    Communitydetectionalgorithms
    Newman–Girvan (Newman and Girvan, 2002)
    Walktrap (Latapy & Pons, 2005)
    Expected functionalities of network analysis software
  • ContextExpected functionalities of network analysis softwareBenchmarkConclusion
  • Benchmark methodology
    Required points:
    Asocial network analysis point of view
    Scalability
    Free for educational purposes
    A balance between well established software and newer ones, based on recent development standards (ergonomics, modularity and data portability).
    Datasets: Zachary’skarate-club, DBLP
    Benchmark
  • Software comparisoncriteria
    Input/output formats
    Custom attribute handling
    Bipartite graphs specific functions
    Longitudinal analysis
    Visualization
    Indicators
    Communitydetection
    Benchmark
  • Studied software
    Gephiis an “interactive visualization and exploration platform”.
    • GUESS is dedicated to visualization purposes, with several layouts.
    Tulip can handle over 1 million vertices and 4 millions edges. It has visualization, clustering and extension by plug-ins capabilities.
    GraphVizis mainly for graph visualization.
    UCInetis not free. It uses Pajek and Netdraw for visualization. It is specialized in statistical and matricial analysis. It calculates indicators (such as triad census, Freeman betweenness) and performs hierarchical clustering.
    Pajekis a Windows program for analysis and visualization of large networks. It is freely available, for noncommercial use.
    igraphis a free software package for creating and manipulating graphs. It also implements algorithms for some recent network analysis methods.
    NetworkXis a package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
    JUNG, for Java Universal Network/Graph Framework, is mainly developed for creating interactive graphs in Java GUIs, JUNG has been extended with some SNA metrics.
    Benchmark
  • Selected software
    Stand-alonesoftware
    Pajekhttp://pajek.imfm.si/doku.php
    Gephihttp://gephi.org/
    Libraries
    igraphhttp://igraph.sourceforge.net/
    NetworkXhttp://networkx.lanl.gov/
    Benchmark
  • Pajek(Vladimir Batagelj and AndrejMrvar)
    Developmentstarted in 1996
    Data miningoriented
    Many graph operatorsavailable
    Fast
    Exports 3D visualization
    Macro
    Supports matrices, adjacencylists and arcs listsoriented input files
    Benchmark
  • Gephi(Bastian M., Heymann S., Jacomy M.)
    Benchmark
    Development started in 2008
    Interactive GUI
    Uses Java
    Recent scriptability improvements
    « Photoshop for graphs » with customizable visualization
    Supports the main file formats for networks
    Improvable by plugins
    Community detection still experimental
  • NetworkX(Brandes U., Erlebach T.)
    Python
    Bipartite graphs ready
    Attribute-friendly
    1,000,000 nodeswide networks canbehandled.
    Lacks in communitydetectionalgorithms
    Relies on other software for visualization
    Benchmark
    >>> importnetworkxasnx
    >>> G=nx.Graph()
    >>> G.add_node("spam")
    >>> G.add_edge(1,2)
    >>> print(G.nodes())
    [1, 2, 'spam']
    >>> print(G.edges())
    [(1, 2)]
    >>> G.degree(1)
    1
  • Igraph(Csárdi G., Nepusz T.)
    For R (a statisticalenvironment) and Python. The lowlevel routines are written in C.
    GUI available for R.
    Communitydetectionready.
    Not custom attributes-friendly
    Benchmark
    > g <- graph.ring(10)
    > degree(g)
    [1] 2 2 2 2 2 2 2 2 2 2
    > g2 <- erdos.renyi.game(1000, 10/1000)
    > degree.distribution(g2)
    [1] 0.000 0.000 0.002 0.009 0.020 0.039 0.064 0.107 0.111 0.115 0.118…
    [21] 0.003 0.001
  • Benchmark
    How to choose the right tool?
    ++ Mature functionality
    - - Not available or weak
  • Featurecomparison
    Benchmark
    Temporality
    Input / output
    Visualization
    Clustering
    igraph
    Pajek
    Indicators
    Bipartite
    NetworkX
    Gephi
    Attributehandling
  • ContextExpected functionalities of network analysis softwareBenchmarkConclusion
  • Conclusion
    Manydomains, manyapproaches, many software (sociology, computer science, mathematics and physics).
    • Functionalities to develop in the future (e.g. for decisionsupport):
    Temporalityawareness
    Links and nodesattributesanalysis
    Hierarchicalgraphs
    Conclusion
  • Thankyou for your attention.Any questions ?
  • Bibliography
    Gartner http://www.gartner.com/it/page.jsp?id=1293114
    Gartner Hype Cycle for Social Software, 2008
    Fortunato, S. (2009). Community detection in graphs. Physics Reports, 103. Retrieved from http://arxiv.org/abs/0906.0612.Pons, P., & Latapy, M. (2005). Computing communities in large networks using random walks. Computer and Information Sciences-ISCIS 2005. Retrieved from http://www.springerlink.com/index/P312811313637372.pdf.
    Newman, M., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical review E. Retrieved from http://link.aps.org/doi/10.1103/PhysRevE.69.026113.
    Kamada, T., & Kawai, S. (1989). An algorithm for drawing general undirected graphs. Information processing letters, 31(12), 7--15. Retrieved from http://linkinghub.elsevier.com/retrieve/pii/0020019089901026.
  • Bibliography (2)
    Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine* 1. Computer networks and ISDN systems. Retrieved from http://linkinghub.elsevier.com/retrieve/pii/S016975529800110X.
    Fruchterman, T. M., & Reingold, E. M. (1991). Graph Drawing by Force-directed Placement. Huisman, M., & Van Duijn, M. (2003). Software for social network analysis. In Models and methods in social network analysis (p. 270–316).
    Freeman, L. (1979). Centrality in Social Networks Conceptual Clarification. Social Networks.