Social Network Analysis with Sylva
 Social Network Analysis with Sylva

 Juan Luis Suárez & Anabel Quan-Haase
           Western University
Overview of Workshop
• General overview of the social network
  approach
• Key terminology
• Uniqueness of collecting and analyzing
  social network data
• Entering data into Sylva
• Importing/exporting data into Sylva
• Example I:
• Example II:
• Understanding limitations and problems
• Future Work and Gephi.org
What is SNA?
Social network analysis is focused on uncovering
the patterning of people’s interaction.…Network
analysts believe that how an individual lives
depends in large part on how that individual is
tied into the larger web of social connections.
Many believe, moreover, that the success or
failure of societies and organizations often
depends on the patterning of their internal
structure (Freeman, 1998, November 11).
What is Unique about SNA?
Social science research and theory tends to
focus on social actors’:
        •attributes
        •attitudes
        •opinions
        •behavior
                     Focus is on individual level of analysis,
                     less on network-structural level.
a whole is not simply the sum of its parts
Key Terminology
•   1. Social structure
•   2. Social network
•   3. Nodes
•   4. Linkages/relations
•   5. Additional terms of relevance:
    –   Nodes & edges
    –   Directed graphs vs. undirected graphs
    –   Ego
    –   Alter
    –   Homophily
1. Social Structure
• Sociological inquiry consists of understanding
  the constraining influence of social structure
  on social action
• BUT; how do we study social structure?

      Attributes                    Networks
2. Social Network

                                               Social Actors
          Ties




Figure 2: Social Structure as Social Network
3. Nodes
• The actors considered in a social network are
  exclusively social (alternatively referred to as
  agents, nodes, or social entities).
• These include individuals, organizations,
  institutions, nations, or groups (Wasserman &
  Faust, 1994).
Blurred Nodes
• Social actors can therefore be distinguished
  from non-social actors – e.g., neurons
  comprising a neural network.
• On occasion, the distinction between a social
  and a non-social actor is not absolute. For
  example, computer networks represent a
  hybrid type of network.
Node Attributes
• Every single node can have one or more
  attributes.
• These attributes describe the nodes and allow
  researchers to conduct complex queries of the
  database.
• Node attributes can include the time of
  publication of a book, its length, the number
  of authors, etc.
One-mode vs. Two-mode
• Most social network analysis methods allow only one type
  of social actor (for instance, individuals or corporations) in
  their analysis; these are referred to as one-mode networks
  (Wasserman & Faust, 1994).
• However, methods exist which allow two different types of
  social actors in their analysis; these are referred to as two-
  mode networks. For instance, a study may simultaneously
  analyze corporations and their directors.
• Two-mode networks may also include social actors from
  distinct networks, for example, a network comprised of
  adults and a network comprised of children.
• Two-mode networks allow for comparison between
  different types and sets of social actors.
4. Relationships
• Ties are links that connect social actors, and are
  the main focus of social network analysis. Ties
  are seen as “channels for transfer or “flow” of
  resources (either material or nonmaterial)”
  (Wasserman & Faust, 1994, p. 4).
Simple Relationships
• Naturally occurring ties among social actors are inherently
  complex and consist of numerous different interaction
  activities.
• However, unlike ethnographers network analysts do not
  focus on the complexity of interactions among individuals
  (Burt, 1983).
• Instead, social network analysts focus more on the pattern
  of relations amongst individuals and to do so simplify the
  inherent complexity of social relationships by categorizing
  interactions into different broad types. The types can be
  manifold. For example, a pair of social actors may have
  friendship, working, cooperation, or citation ties.
5. Additional Terms
•   Directed graphs vs. undirected graphs
•   Ego
•   Alter
•   Homophily
Types of Network Analysis
• Ego-centered/Socio-centered Social Networks
• Community-centered social networks
Ego-centered/Socio-centered Social Networks
Actor-Level Centrality
• Actor level degree centrality: Degree
  centrality measures the extent to which an
  actor is linked to all of the other actors in the
  network. Three different measures can be
  distinguished: nodal degree, indegree, and
  outdegree.
• Actor level closeness centrality: Closeness
  measures the distance that an actor has to all
  of the other actors in the network.
• Actor level betweenness centrality:
  Betweenness measures the extent to which an
  actor lies between two other actors and thus
  facilitates/controls the flow of information.
Face-to-face (1/week)    CS




                                          9




Community-Centered Social Networks
Network Level Centralization
• Cohesion Distance: measures the degree of separation
  between actors in a network. It indicates how many
  other people are between two actors - that is, actors
  between an actor and the actor this person needs to
  talk to.
• Network Centralization: measures the number of
  actors that are connected to each actor in the network.
  The more connections among actors, the greater the
  network centrality.
• Density: measures the degree of connection that exists
  in a network. The more actors talk to each other, the
  higher the density.
Measures of Centrality and Assumptions
Measure                   Level     Data Type           Symmetry/Asymmetry

Nodal Degree Centrality   Actor     Dichotomized (>5)   Symmetric (Maximum)

                                                        Asymmetric
Indegree Centrality       Actor     Valued


Outdegree Centrality      Actor     Valued              Asymmetric


Closeness Centrality      Actor     Dichotomized (>5)   Symmetric (Maximum)


Betweenness Centrality    Actor     Dichotomized (>5)   Symmetric (Maximum)


Network Cohesion          Network   Valued              Asymmetric


Network Centrality        Network   Dichotomized (>5)   Asymmetric


Network Density           Network   Dichotomized (>5)   Symmetric (Maximum)
Uniqueness of Collecting and Analyzing
        Social Network Data
•   Relational data
•   Boundary specification and sampling
•   Interdependence of data points
•   Query search
•   Complexity of data collection
    – Manually-harvested
    – Data set
    – Behavioral
    – Self-report
Internet Resources of
            Social Network Analysis
• Center for the Study of Group Processes
  http://lime.weeg.uiowa.edu/~grpproc/
• INSNA International Network of Social Network Analysis
  http://www.heinz.cmu.edu/project/INSNA/
• Barry Wellman’s Homepage
  http://www.chass.utoronto.ca/~wellman/index.html
• CulturePlex
• http://cultureplex.ca/
• Gephi.org
• NodeXL
  http://nodexl.codeplex.com/


                                                           25
Limitations of
        Social Network Analysis
• Boundary specification
• Data source
• Definition of social actors
• No distinct method



                                  27
What is Sylva?
•   A database system management system
•   Graph databases
•   NoSQL database
•   Built on top of Neo4J
Whose Needs Does Sylva Serve?
• Sylva requires no programming skills
• On-the-go modification of the schema
• Storing data in a graph form
• Work from the nodes or from the edges
• Collaborative platform
• Easy-to-use interface thanks to forms,
  autocomplete, …
• Multiple visualizations
• Search and Query Engines
The Interface
The Dashboard
Creating a Database (Graph)
Schema vs Data
My First Schema
Creating a Schema on Sylva
              (manually)
• New Type of Node (person)
• (2nd) New Type of Node (work)
• Relation
  – Incoming or outgoing
  – Allowed relationships
• (3rd) New Type of Node (institution)
Properties of Objects
• Data objects have properties
• A property is an attribute that defines certain
  operations than can be performed on the
  object
• We need properties to enter our data
Properties of “Person”
Properties of “Person”
Entering Data (manually)
My First Graph
The Node Level:
Selecting and Expanding
Collaboration in Sylva
Case of Collaboration
Searching
• Returns a list
Importing and Exporting
• Importing a Schema
• Exporting Data to Gephi
Cuba’s Prominence: Modeling The
  Latin American Afro in Topic Maps
• Objectives:
  – locating the various nodes of bibliographic
    production associated with the generation of an
    image of the Latin-American Afro
  – evaluating the causes that make certain nodes,
    i.e., Cuba and various Cuban intellectuals, emerge
    as key nodes in the network of production of Afro-
    Latin American images
Cuba’s Prominence
• Methodology:
  – a combination of traditional close-reading of texts
    (extraction of nodes and relations) with
  – graph analysis of the emerging network with Page
    Rank algorithm
Measurements (Gephi)
•   Closeness centrality: expresses how well connected an individual is to the whole
    network. A high value in this measurement indicates better connectivity and thus
    expresses the importance of the individual with respect to other elements in the
    network.
•   Betweenness centrality: indicates how important the individual is as a connection
    and transference point within the network. A high value indicates that it is a topic
    that is passed through in the communications (relationships) between the other
    topics on the map.
•   Modularity: is a coefficient that enables us to group together those nodes which
    share connections and zones on the network, so that it divides the map into zones
    with high relationships between them.
•   Influence between nodes: is an analysis which we shall carry out in the second
    part of the article. It is based on the Page Ranking algorithm. This is basic
    algorithm on which the Google search engine was originally based for calculating
    the importance of the pages that it comes up with after a search, and which it
    used to order the results. Its basic idea is that a given node within a network
    becomes important based on the importance of the nodes that relate with it or
    that point to it.
Betweennes Centrality
Modularity
Some numerical results
Sustaining a Global Community
• Henrich et al. [1] have proven that the existence of norms that
  sustain fairness in exchanges among strangers are connected with
  the diffusion of institutions such as market integration and the
  participation in world religions.
• Their research confirms the hypothesis that modern world religion
  may have contributed to the sustainability of large- scale societies
  and large-scale interactions and we propose that art is another
  institution that contributes to the arising and sustainability of large-
  scale societies.
• We use the case of the formation of an artistic network of
  paintings, schools, themes, genres, and artists whose development
  goes along with the expansion and colonization of the Hispanic
  Monarchy across America to show that this artistic network has a
  presence in all political territories encompassing most ethnicities
  and religions of indigenous origin.
Methodology
• The data set comprising the paintings from the Baroque period are
  organized and stored in a PostgreSQL web based database.
• The data includes more than 100,000 total topics (11,443 of them
  are artworks). A distinctive feature of the information is that it is
  organized around both text fields and ad-hoc descriptors that follow
  the model of a formal ontology.
• For our study we have decided to model the data in one of the
  possible networks, a network created from common descriptors as
  weighted edges and artworks as nodes.
• Some pruning methods had to be applied in order to overcome
  some of the shortcomings resulting from the millions of edges and
  the too many relational joins. We also split the dataset in 12
  sections, each covering a 25 year-period, from 1550 to 1850 [4].
Methodology
     •         Similarity Measure:
           –               S(Art1,Art2)=#{common descriptors of
                    Art1 and Art2}
                                     Descriptor 2           Descriptor 6
     Descriptor 1


                                      S=2
Descriptor 3

                                                             Descriptor 7

                                       Descriptor 4

         Descriptor 5
Research Questions
• Our research addresses the issue of the
  sustainability of communities through the
  existence of a flow of shared information.
• This question is of the utmost importance to
  understand the formation and dynamics of
  cultural groups and cultural areas.
• As important as the latter is the study of the
  spatial and temporal dimensions of any given
  political and cultural community as this will shed
  light on the cultural processes resulting from
  previous and currents waves of globalization
Baroque Paintings in the Hispanic
         World: A Network.
• The graph shows, for the first two periods of
  our study, the growth of the saints-related
  paintings (red cluster) as compared to the
  decrease of the cluster with virgins (blue).
  Portraits’ size (brown cluster) remains more
  or less the same, but they get more
  connected to saints’.
• FOTO
Clustering & Visualizations: Raw Graphs
1550-1575       1575-1600       1600-1625       1625-1650       1650-1675   v       1675-1700   v
                                                                                                    v
                                                                                                        v




1700-1725   v   1725-1750   v   1750-1775   v   1775-1800   v   1800-1825   v       1825-1850   v




                                                                                v




                                                                http://zoom.it/vJVw#full
Further Work with Sylva
• Visualization of Schema
• Two Visualizations of Data:
  – Node-centered
  – Community centered
• Query System:
  – Pattern-matching
  – Traversals
• Need for multi-disciplinary teams
• Complexity of analysis
Thank you!
“With enough effort and perseverance:
        Anything is possible”

Sylva workshop.gt that camp.2012

  • 1.
    Social Network Analysiswith Sylva Social Network Analysis with Sylva Juan Luis Suárez & Anabel Quan-Haase Western University
  • 2.
    Overview of Workshop •General overview of the social network approach • Key terminology • Uniqueness of collecting and analyzing social network data • Entering data into Sylva • Importing/exporting data into Sylva • Example I: • Example II: • Understanding limitations and problems • Future Work and Gephi.org
  • 3.
    What is SNA? Socialnetwork analysis is focused on uncovering the patterning of people’s interaction.…Network analysts believe that how an individual lives depends in large part on how that individual is tied into the larger web of social connections. Many believe, moreover, that the success or failure of societies and organizations often depends on the patterning of their internal structure (Freeman, 1998, November 11).
  • 5.
    What is Uniqueabout SNA? Social science research and theory tends to focus on social actors’: •attributes •attitudes •opinions •behavior Focus is on individual level of analysis, less on network-structural level.
  • 6.
    a whole isnot simply the sum of its parts
  • 7.
    Key Terminology • 1. Social structure • 2. Social network • 3. Nodes • 4. Linkages/relations • 5. Additional terms of relevance: – Nodes & edges – Directed graphs vs. undirected graphs – Ego – Alter – Homophily
  • 8.
    1. Social Structure •Sociological inquiry consists of understanding the constraining influence of social structure on social action • BUT; how do we study social structure? Attributes Networks
  • 9.
    2. Social Network Social Actors Ties Figure 2: Social Structure as Social Network
  • 10.
    3. Nodes • Theactors considered in a social network are exclusively social (alternatively referred to as agents, nodes, or social entities). • These include individuals, organizations, institutions, nations, or groups (Wasserman & Faust, 1994).
  • 11.
    Blurred Nodes • Socialactors can therefore be distinguished from non-social actors – e.g., neurons comprising a neural network. • On occasion, the distinction between a social and a non-social actor is not absolute. For example, computer networks represent a hybrid type of network.
  • 12.
    Node Attributes • Everysingle node can have one or more attributes. • These attributes describe the nodes and allow researchers to conduct complex queries of the database. • Node attributes can include the time of publication of a book, its length, the number of authors, etc.
  • 13.
    One-mode vs. Two-mode •Most social network analysis methods allow only one type of social actor (for instance, individuals or corporations) in their analysis; these are referred to as one-mode networks (Wasserman & Faust, 1994). • However, methods exist which allow two different types of social actors in their analysis; these are referred to as two- mode networks. For instance, a study may simultaneously analyze corporations and their directors. • Two-mode networks may also include social actors from distinct networks, for example, a network comprised of adults and a network comprised of children. • Two-mode networks allow for comparison between different types and sets of social actors.
  • 14.
    4. Relationships • Tiesare links that connect social actors, and are the main focus of social network analysis. Ties are seen as “channels for transfer or “flow” of resources (either material or nonmaterial)” (Wasserman & Faust, 1994, p. 4).
  • 15.
    Simple Relationships • Naturallyoccurring ties among social actors are inherently complex and consist of numerous different interaction activities. • However, unlike ethnographers network analysts do not focus on the complexity of interactions among individuals (Burt, 1983). • Instead, social network analysts focus more on the pattern of relations amongst individuals and to do so simplify the inherent complexity of social relationships by categorizing interactions into different broad types. The types can be manifold. For example, a pair of social actors may have friendship, working, cooperation, or citation ties.
  • 16.
    5. Additional Terms • Directed graphs vs. undirected graphs • Ego • Alter • Homophily
  • 17.
    Types of NetworkAnalysis • Ego-centered/Socio-centered Social Networks • Community-centered social networks
  • 18.
  • 19.
    Actor-Level Centrality • Actorlevel degree centrality: Degree centrality measures the extent to which an actor is linked to all of the other actors in the network. Three different measures can be distinguished: nodal degree, indegree, and outdegree. • Actor level closeness centrality: Closeness measures the distance that an actor has to all of the other actors in the network.
  • 20.
    • Actor levelbetweenness centrality: Betweenness measures the extent to which an actor lies between two other actors and thus facilitates/controls the flow of information.
  • 21.
    Face-to-face (1/week) CS 9 Community-Centered Social Networks
  • 22.
    Network Level Centralization •Cohesion Distance: measures the degree of separation between actors in a network. It indicates how many other people are between two actors - that is, actors between an actor and the actor this person needs to talk to. • Network Centralization: measures the number of actors that are connected to each actor in the network. The more connections among actors, the greater the network centrality. • Density: measures the degree of connection that exists in a network. The more actors talk to each other, the higher the density.
  • 23.
    Measures of Centralityand Assumptions Measure Level Data Type Symmetry/Asymmetry Nodal Degree Centrality Actor Dichotomized (>5) Symmetric (Maximum) Asymmetric Indegree Centrality Actor Valued Outdegree Centrality Actor Valued Asymmetric Closeness Centrality Actor Dichotomized (>5) Symmetric (Maximum) Betweenness Centrality Actor Dichotomized (>5) Symmetric (Maximum) Network Cohesion Network Valued Asymmetric Network Centrality Network Dichotomized (>5) Asymmetric Network Density Network Dichotomized (>5) Symmetric (Maximum)
  • 24.
    Uniqueness of Collectingand Analyzing Social Network Data • Relational data • Boundary specification and sampling • Interdependence of data points • Query search • Complexity of data collection – Manually-harvested – Data set – Behavioral – Self-report
  • 25.
    Internet Resources of Social Network Analysis • Center for the Study of Group Processes http://lime.weeg.uiowa.edu/~grpproc/ • INSNA International Network of Social Network Analysis http://www.heinz.cmu.edu/project/INSNA/ • Barry Wellman’s Homepage http://www.chass.utoronto.ca/~wellman/index.html • CulturePlex • http://cultureplex.ca/ • Gephi.org • NodeXL http://nodexl.codeplex.com/ 25
  • 27.
    Limitations of Social Network Analysis • Boundary specification • Data source • Definition of social actors • No distinct method 27
  • 28.
    What is Sylva? • A database system management system • Graph databases • NoSQL database • Built on top of Neo4J
  • 29.
    Whose Needs DoesSylva Serve? • Sylva requires no programming skills • On-the-go modification of the schema • Storing data in a graph form • Work from the nodes or from the edges • Collaborative platform • Easy-to-use interface thanks to forms, autocomplete, … • Multiple visualizations • Search and Query Engines
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
    Creating a Schemaon Sylva (manually) • New Type of Node (person) • (2nd) New Type of Node (work) • Relation – Incoming or outgoing – Allowed relationships • (3rd) New Type of Node (institution)
  • 37.
    Properties of Objects •Data objects have properties • A property is an attribute that defines certain operations than can be performed on the object • We need properties to enter our data
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
    Importing and Exporting •Importing a Schema • Exporting Data to Gephi
  • 48.
    Cuba’s Prominence: ModelingThe Latin American Afro in Topic Maps • Objectives: – locating the various nodes of bibliographic production associated with the generation of an image of the Latin-American Afro – evaluating the causes that make certain nodes, i.e., Cuba and various Cuban intellectuals, emerge as key nodes in the network of production of Afro- Latin American images
  • 49.
    Cuba’s Prominence • Methodology: – a combination of traditional close-reading of texts (extraction of nodes and relations) with – graph analysis of the emerging network with Page Rank algorithm
  • 51.
    Measurements (Gephi) • Closeness centrality: expresses how well connected an individual is to the whole network. A high value in this measurement indicates better connectivity and thus expresses the importance of the individual with respect to other elements in the network. • Betweenness centrality: indicates how important the individual is as a connection and transference point within the network. A high value indicates that it is a topic that is passed through in the communications (relationships) between the other topics on the map. • Modularity: is a coefficient that enables us to group together those nodes which share connections and zones on the network, so that it divides the map into zones with high relationships between them. • Influence between nodes: is an analysis which we shall carry out in the second part of the article. It is based on the Page Ranking algorithm. This is basic algorithm on which the Google search engine was originally based for calculating the importance of the pages that it comes up with after a search, and which it used to order the results. Its basic idea is that a given node within a network becomes important based on the importance of the nodes that relate with it or that point to it.
  • 52.
  • 53.
  • 54.
  • 55.
    Sustaining a GlobalCommunity • Henrich et al. [1] have proven that the existence of norms that sustain fairness in exchanges among strangers are connected with the diffusion of institutions such as market integration and the participation in world religions. • Their research confirms the hypothesis that modern world religion may have contributed to the sustainability of large- scale societies and large-scale interactions and we propose that art is another institution that contributes to the arising and sustainability of large- scale societies. • We use the case of the formation of an artistic network of paintings, schools, themes, genres, and artists whose development goes along with the expansion and colonization of the Hispanic Monarchy across America to show that this artistic network has a presence in all political territories encompassing most ethnicities and religions of indigenous origin.
  • 56.
    Methodology • The dataset comprising the paintings from the Baroque period are organized and stored in a PostgreSQL web based database. • The data includes more than 100,000 total topics (11,443 of them are artworks). A distinctive feature of the information is that it is organized around both text fields and ad-hoc descriptors that follow the model of a formal ontology. • For our study we have decided to model the data in one of the possible networks, a network created from common descriptors as weighted edges and artworks as nodes. • Some pruning methods had to be applied in order to overcome some of the shortcomings resulting from the millions of edges and the too many relational joins. We also split the dataset in 12 sections, each covering a 25 year-period, from 1550 to 1850 [4].
  • 57.
    Methodology • Similarity Measure: – S(Art1,Art2)=#{common descriptors of Art1 and Art2} Descriptor 2 Descriptor 6 Descriptor 1 S=2 Descriptor 3 Descriptor 7 Descriptor 4 Descriptor 5
  • 58.
    Research Questions • Ourresearch addresses the issue of the sustainability of communities through the existence of a flow of shared information. • This question is of the utmost importance to understand the formation and dynamics of cultural groups and cultural areas. • As important as the latter is the study of the spatial and temporal dimensions of any given political and cultural community as this will shed light on the cultural processes resulting from previous and currents waves of globalization
  • 59.
    Baroque Paintings inthe Hispanic World: A Network. • The graph shows, for the first two periods of our study, the growth of the saints-related paintings (red cluster) as compared to the decrease of the cluster with virgins (blue). Portraits’ size (brown cluster) remains more or less the same, but they get more connected to saints’. • FOTO
  • 60.
    Clustering & Visualizations:Raw Graphs 1550-1575 1575-1600 1600-1625 1625-1650 1650-1675 v 1675-1700 v v v 1700-1725 v 1725-1750 v 1750-1775 v 1775-1800 v 1800-1825 v 1825-1850 v v http://zoom.it/vJVw#full
  • 61.
    Further Work withSylva • Visualization of Schema • Two Visualizations of Data: – Node-centered – Community centered • Query System: – Pattern-matching – Traversals • Need for multi-disciplinary teams • Complexity of analysis
  • 62.
    Thank you! “With enougheffort and perseverance: Anything is possible”

Editor's Notes