Sylva workshop.gt that camp.2012 Presentation Transcript
Social Network Analysis with Sylva
Juan Luis Suárez & Anabel Quan-Haase
Western University
Overview of Workshop
• General overview of the social network approach
• Key terminology
• Uniqueness of collecting and analyzing social network data
• Entering data into Sylva
• Importing/exporting data into Sylva
• Example I:
• Example II:
• Understanding limitations and problems
• Future Work and Gephi.org
What is SNA?
Social network analysis is focused on uncovering the patterning of people’s interaction. … Network analysts believe that how an individual lives depends in large part on how that individual is tied into the larger web of social connections. Many believe, moreover, that the success or failure of societies and organizations often depends on the patterning of their internal structure (Freeman, 1998, November 11).
What is Unique about SNA?
Social science research and theory tend to focus on social actors’:
• attributes
• attitudes
• opinions
• behavior
The focus is on the individual level of analysis, less on the network-structural level.
A whole is not simply the sum of its parts.
Key Terminology
• 1. Social structure
• 2. Social network
• 3. Nodes
• 4. Linkages/relations
• 5. Additional terms of relevance:
  – Nodes & edges
  – Directed graphs vs. undirected graphs
  – Ego
  – Alter
  – Homophily
1. Social Structure
• Sociological inquiry consists of understanding the constraining influence of social structure on social action
• But how do we study social structure?
  – Attributes
  – Networks
2. Social Network
[Figure 2: Social Structure as Social Network (labels: Social Actors, Ties)]
3. Nodes
• The actors considered in a social network are exclusively social (alternatively referred to as agents, nodes, or social entities).
• These include individuals, organizations, institutions, nations, or groups (Wasserman & Faust, 1994).
Blurred Nodes
• Social actors can therefore be distinguished from non-social actors, e.g., neurons comprising a neural network.
• On occasion, the distinction between a social and a non-social actor is not absolute. For example, computer networks represent a hybrid type of network.
Node Attributes
• Every single node can have one or more attributes.
• These attributes describe the nodes and allow researchers to conduct complex queries of the database.
• Node attributes can include the time of publication of a book, its length, the number of authors, etc.
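To make the idea of querying by attribute concrete, here is a minimal sketch in plain Python. It is not Sylva's query engine; the book records, the attribute names (`year`, `pages`, `authors`) and the `query` helper are all hypothetical and only illustrate how node attributes can drive a filter.

```python
# Hypothetical book nodes; titles and attribute values are made up for illustration.
books = [
    {"title": "Book A", "year": 1930, "pages": 210, "authors": 1},
    {"title": "Book B", "year": 1952, "pages": 340, "authors": 2},
    {"title": "Book C", "year": 1961, "pages": 120, "authors": 1},
]


def query(nodes, **conditions):
    """Return the nodes whose attributes satisfy every predicate in `conditions`.

    Each keyword is an attribute name and each value is a one-argument
    function returning True or False for that attribute's value.
    """
    return [
        node for node in nodes
        if all(key in node and test(node[key]) for key, test in conditions.items())
    ]


# Single-author books published after 1950.
single_author_after_1950 = query(
    books,
    year=lambda y: y > 1950,
    authors=lambda a: a == 1,
)
print([b["title"] for b in single_author_after_1950])
```

Combining several predicates in one call is what makes the query "complex": each added attribute narrows the set of matching nodes.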
One-mode vs. Two-mode
• Most social network analysis methods allow only one type of social actor (for instance, individuals or corporations) in their analysis; these are referred to as one-mode networks (Wasserman & Faust, 1994).
• However, methods exist which allow two different types of social actors in their analysis; these are referred to as two-mode networks. For instance, a study may simultaneously analyze corporations and their directors.
• Two-mode networks may also include social actors from distinct networks, for example, a network comprised of adults and a network comprised of children.
• Two-mode networks allow for comparison between different types and sets of social actors.
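The corporations-and-directors example can be sketched as data. The snippet below stores two-mode ties as (director, corporation) pairs and then derives a one-mode network of corporations that share directors. This projection step is one common way to work with two-mode data and is an addition here, not something the slides prescribe; all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode ties: (director, corporation).
board_seats = [
    ("Ana", "Acme"), ("Ana", "Globex"),
    ("Ben", "Acme"), ("Ben", "Initech"),
    ("Cleo", "Globex"),
]


def project_onto_second_mode(ties):
    """Link two second-mode actors once for every first-mode actor they share."""
    memberships = defaultdict(set)
    for first, second in ties:
        memberships[first].add(second)

    weights = defaultdict(int)
    for seconds in memberships.values():
        # Every pair of corporations served by the same director gets one tie.
        for a, b in combinations(sorted(seconds), 2):
            weights[(a, b)] += 1
    return dict(weights)


interlocks = project_onto_second_mode(board_seats)
print(interlocks)
```

The resulting weights count shared directors, so the one-mode network keeps some of the information carried by the original two-mode ties.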
4. Relationships
• Ties are links that connect social actors, and are the main focus of social network analysis. Ties are seen as “channels for transfer or “flow” of resources (either material or nonmaterial)” (Wasserman & Faust, 1994, p. 4).
Simple Relationships
• Naturally occurring ties among social actors are inherently complex and consist of numerous different interaction activities.
• However, unlike ethnographers, network analysts do not focus on the complexity of interactions among individuals (Burt, 1983).
• Instead, social network analysts focus on the pattern of relations among individuals and, to do so, simplify the inherent complexity of social relationships by categorizing interactions into broad types. The types can be manifold. For example, a pair of social actors may have friendship, working, cooperation, or citation ties.
Types of Network Analysis
• Ego-centered/Socio-centered Social Networks
• Community-centered social networks
Ego-centered/Socio-centered Social Networks
Actor-Level Centrality
• Actor-level degree centrality: Degree centrality measures the extent to which an actor is linked to all of the other actors in the network. Three different measures can be distinguished: nodal degree, indegree, and outdegree.
• Actor-level closeness centrality: Closeness measures the distance that an actor has to all of the other actors in the network.
• Actor-level betweenness centrality: Betweenness measures the extent to which an actor lies between two other actors and thus facilitates/controls the flow of information.
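The three actor-level measures can be sketched in plain Python. This is a minimal illustration under stated assumptions: the five-actor network is hypothetical, ties are treated as undirected and unweighted (so indegree and outdegree are not distinguished), closeness assumes a connected network, and betweenness uses a Brandes-style accumulation without normalization. Tools such as Gephi may normalize differently.

```python
from collections import deque

# Hypothetical undirected network stored as an adjacency mapping.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}


def degree_centrality(g):
    """Nodal degree divided by the number of other actors."""
    n = len(g)
    return {v: len(neighbors) / (n - 1) for v, neighbors in g.items()}


def bfs(g, source):
    """Breadth-first search from `source`.

    Returns distances, shortest-path counts, shortest-path predecessors,
    and the order in which actors were visited.
    """
    dist = {source: 0}
    sigma = {v: 0 for v in g}
    sigma[source] = 1
    preds = {v: [] for v in g}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def closeness_centrality(g):
    """(n - 1) divided by the summed distance to all other actors (connected network assumed)."""
    n = len(g)
    return {v: (n - 1) / sum(bfs(g, v)[0].values()) for v in g}


def betweenness_centrality(g):
    """Number of shortest paths between other actors that pass through each actor."""
    score = {v: 0.0 for v in g}
    for s in g:
        _, sigma, preds, order = bfs(g, s)
        delta = {v: 0.0 for v in g}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


print(degree_centrality(graph))
print(closeness_centrality(graph))
print(betweenness_centrality(graph))
```

In this toy network, B has the most ties and also sits on the most shortest paths, while D has few ties but still bridges E to everyone else, which is the distinction between degree and betweenness.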
Community-Centered Social Networks
[Figure: Face-to-face (1/week), CS]
Network Level Centralization
• Cohesion Distance: measures the degree of separation between actors in a network. It indicates how many other people lie between two actors, that is, the intermediaries between an actor and the actor this person needs to talk to.
• Network Centralization: measures the number of actors that are connected to each actor in the network. The more connections among actors, the greater the network centrality.
• Density: measures the degree of connection that exists in a network. The more actors talk to each other, the higher the density.
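Two of these network-level measures are easy to compute by hand. The sketch below assumes a small hypothetical undirected, unweighted, connected network; it takes density as the share of possible ties that are present and takes distance as the average shortest-path length over all pairs. These are common textbook formulas, assumed here because the slides do not spell them out.

```python
from collections import deque

# Hypothetical undirected network stored as an adjacency mapping.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}


def density(g):
    """Observed ties divided by the number of possible ties among n actors."""
    n = len(g)
    ties = sum(len(neighbors) for neighbors in g.values()) / 2
    return ties / (n * (n - 1) / 2)


def distances_from(g, source):
    """Shortest-path length from `source` to every reachable actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_distance(g):
    """Mean shortest-path length over all ordered pairs of distinct, connected actors."""
    total = 0
    pairs = 0
    for v in g:
        for w, d in distances_from(g, v).items():
            if w != v:
                total += d
                pairs += 1
    return total / pairs


print(density(graph))
print(average_distance(graph))
```

A distance of 1 means two actors talk directly; a distance of 2 means one intermediary lies between them, which matches the reading of cohesion distance given above.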
Measures of Centrality and Assumptions
Measure                  Level    Data Type          Symmetry/Asymmetry
Nodal Degree Centrality  Actor    Dichotomized (>5)  Symmetric (Maximum)
Indegree Centrality      Actor    Valued             Asymmetric
Outdegree Centrality     Actor    Valued             Asymmetric
Closeness Centrality     Actor    Dichotomized (>5)  Symmetric (Maximum)
Betweenness Centrality   Actor    Dichotomized (>5)  Symmetric (Maximum)
Network Cohesion         Network  Valued             Asymmetric
Network Centrality       Network  Dichotomized (>5)  Asymmetric
Network Density          Network  Dichotomized (>5)  Symmetric (Maximum)
Uniqueness of Collecting and Analyzing Social Network Data
• Relational data
• Boundary specification and sampling
• Interdependence of data points
• Query search
• Complexity of data collection
  – Manually-harvested
  – Data set
  – Behavioral
  – Self-report
Internet Resources of Social Network Analysis
• Center for the Study of Group Processes http://lime.weeg.uiowa.edu/~grpproc/
• INSNA International Network for Social Network Analysis http://www.heinz.cmu.edu/project/INSNA/
• Barry Wellman’s Homepage http://www.chass.utoronto.ca/~wellman/index.html
• CulturePlex http://cultureplex.ca/
• Gephi.org
• NodeXL http://nodexl.codeplex.com/
Limitations of Social Network Analysis
• Boundary specification
• Data source
• Definition of social actors
• No distinct method
What is Sylva?
• A database management system
• Graph databases
• NoSQL database
• Built on top of Neo4J
Whose Needs Does Sylva Serve?
• Sylva requires no programming skills
• On-the-go modification of the schema
• Storing data in a graph form
• Work from the nodes or from the edges
• Collaborative platform
• Easy-to-use interface thanks to forms, autocomplete, …
• Multiple visualizations
• Search and Query Engines
Creating a Database (Graph)
Schema vs Data
My First Schema
Creating a Schema on Sylva (manually)
• New Type of Node (person)
• (2nd) New Type of Node (work)
• Relation
  – Incoming or outgoing
  – Allowed relationships
• (3rd) New Type of Node (institution)
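The difference between a schema and the data entered under it can be sketched in a few lines. This is not Sylva's API or storage format: the node types come from the slide, but the relationship labels (`wrote`, `works_at`), the identifiers and the helper functions are hypothetical, chosen only to show how "allowed relationships" constrain what can be entered.

```python
# Hypothetical schema: node types plus allowed (source type, label, target type) triples.
NODE_TYPES = {"person", "work", "institution"}
ALLOWED_RELATIONSHIPS = {
    ("person", "wrote", "work"),
    ("person", "works_at", "institution"),
}

# The data entered under that schema.
nodes = {}          # node id -> node type
relationships = []  # (source id, label, target id)


def add_node(node_id, node_type):
    """Store a node, rejecting types the schema does not define."""
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node_type}")
    nodes[node_id] = node_type


def add_relationship(source_id, label, target_id):
    """Store an outgoing relationship from source to target if the schema allows it."""
    triple = (nodes[source_id], label, nodes[target_id])
    if triple not in ALLOWED_RELATIONSHIPS:
        raise ValueError(f"relationship not allowed by schema: {triple}")
    relationships.append((source_id, label, target_id))


add_node("person_1", "person")
add_node("work_1", "work")
add_node("institution_1", "institution")
add_relationship("person_1", "wrote", "work_1")
add_relationship("person_1", "works_at", "institution_1")
print(relationships)
```

Direction matters in this sketch: `wrote` is outgoing from a person and incoming to a work, so entering it from a work to a person would be rejected.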
Properties of Objects
• Data objects have properties
• A property is an attribute that defines certain operations that can be performed on the object
• We need properties to enter our data
Properties of “Person”
Entering Data (manually)
My First Graph
The Node Level: Selecting and Expanding
Collaboration in Sylva
Case of Collaboration
Searching
• Returns a list
Importing and Exporting
• Importing a Schema
• Exporting Data to Gephi
Cuba’s Prominence: Modeling The Latin American Afro in Topic Maps
• Objectives:
  – locating the various nodes of bibliographic production associated with the generation of an image of the Latin-American Afro
  – evaluating the causes that make certain nodes, i.e., Cuba and various Cuban intellectuals, emerge as key nodes in the network of production of Afro-Latin American images
Cuba’s Prominence
• Methodology:
  – a combination of traditional close reading of texts (extraction of nodes and relations) with
  – graph analysis of the emerging network with the PageRank algorithm
Measurements (Gephi)
• Closeness centrality: expresses how well connected an individual is to the whole network. A high value in this measurement indicates better connectivity and thus expresses the importance of the individual with respect to other elements in the network.
• Betweenness centrality: indicates how important the individual is as a connection and transfer point within the network. A high value indicates that it is a topic that is passed through in the communications (relationships) between the other topics on the map.
• Modularity: a coefficient that enables us to group together those nodes which share connections and zones on the network, so that it divides the map into zones whose nodes are highly related to one another.
• Influence between nodes: an analysis which we shall carry out in the second part of the article. It is based on the PageRank algorithm, the basic algorithm on which the Google search engine was originally based for calculating the importance of the pages returned by a search, and which it used to order the results. Its basic idea is that a given node within a network becomes important based on the importance of the nodes that relate to it or point to it.
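The basic idea behind PageRank, that a node is important when important nodes point to it, can be sketched with a short power iteration. This is a simplified illustration, not Gephi's implementation: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from nodes with no outgoing links, and the four-node example network are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores by repeated redistribution of rank.

    `links` maps each node to the list of nodes it points to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in links.items():
            for t in targets:
                # Each node passes an equal share of its rank along every outgoing link.
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


# Hypothetical network: three nodes point to C, and C points back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(links)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

C ranks highest because three nodes point to it, and A outranks B and D because its single incoming link comes from the highly ranked C.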
Some numerical results
Sustaining a Global Community
• Henrich et al. have proven that the existence of norms that sustain fairness in exchanges among strangers is connected with the diffusion of institutions such as market integration and participation in world religions.
• Their research confirms the hypothesis that modern world religions may have contributed to the sustainability of large-scale societies and large-scale interactions. We propose that art is another institution that contributes to the emergence and sustainability of large-scale societies.
• We use the case of the formation of an artistic network of paintings, schools, themes, genres, and artists, whose development goes along with the expansion and colonization of the Hispanic Monarchy across America, to show that this artistic network has a presence in all political territories, encompassing most ethnicities and religions of indigenous origin.
Methodology
• The data set comprising the paintings from the Baroque period is organized and stored in a web-based PostgreSQL database.
• The data include more than 100,000 total topics (11,443 of them are artworks). A distinctive feature of the information is that it is organized around both text fields and ad-hoc descriptors that follow the model of a formal ontology.
• For our study we have decided to model the data in one of the possible networks: a network with artworks as nodes and common descriptors as weighted edges.
• Some pruning methods had to be applied in order to overcome the shortcomings resulting from the millions of edges and the excessive number of relational joins. We also split the dataset into 12 sections, each covering a 25-year period, from 1550 to 1850.
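The modeling step above can be sketched on a toy scale. The snippet treats artworks as nodes, weights an edge by the number of descriptors two artworks share, and groups artworks into 25-year periods starting in 1550. Everything concrete is an assumption: the four artworks and their descriptors are invented, the minimum-weight threshold stands in for the unspecified pruning methods, and each period is taken to include its start year and exclude its end year.

```python
from itertools import combinations

# Hypothetical artworks: id -> (year, set of descriptors).
artworks = {
    "w1": (1560, {"saint", "oil", "altarpiece"}),
    "w2": (1570, {"saint", "oil"}),
    "w3": (1572, {"virgin", "oil"}),
    "w4": (1610, {"portrait", "oil"}),
}


def period_start(year, first=1550, width=25):
    """First year of the 25-year period containing `year`."""
    return first + ((year - first) // width) * width


def split_by_period(works):
    """Group artworks by the start year of their period."""
    periods = {}
    for work_id, (year, descriptors) in works.items():
        periods.setdefault(period_start(year), {})[work_id] = (year, descriptors)
    return periods


def weighted_edges(works, min_weight=1):
    """Connect artworks that share descriptors; weight = number shared.

    Edges lighter than `min_weight` are dropped, a simple stand-in for pruning.
    """
    edges = {}
    for a, b in combinations(sorted(works), 2):
        shared = works[a][1] & works[b][1]
        if len(shared) >= min_weight:
            edges[(a, b)] = len(shared)
    return edges


periods = split_by_period(artworks)
for start in sorted(periods):
    print(start, weighted_edges(periods[start], min_weight=2))
```

Comparing every pair of artworks grows quadratically with the number of artworks, which suggests why a real data set of this size produces millions of edges and needs pruning.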
Research Questions
• Our research addresses the issue of the sustainability of communities through the existence of a flow of shared information.
• This question is of the utmost importance to understand the formation and dynamics of cultural groups and cultural areas.
• Equally important is the study of the spatial and temporal dimensions of any given political and cultural community, as this will shed light on the cultural processes resulting from previous and current waves of globalization.
Baroque Paintings in the Hispanic World: A Network.
• The graph shows, for the first two periods of our study, the growth of the saints-related paintings (red cluster) as compared to the decrease of the cluster with virgins (blue). The size of the portraits cluster (brown) remains more or less the same, but it becomes more connected to the saints’ cluster.
• [Photo]
Clustering & Visualizations: Raw Graphs
[Figure: one raw graph per period: 1550-1575, 1575-1600, 1600-1625, 1625-1650, 1650-1675, 1675-1700, 1700-1725, 1725-1750, 1750-1775, 1775-1800, 1800-1825, 1825-1850]
http://zoom.it/vJVw#full
Further Work with Sylva
• Visualization of Schema
• Two Visualizations of Data:
  – Node-centered
  – Community-centered
• Query System:
  – Pattern-matching
  – Traversals
• Need for multi-disciplinary teams
• Complexity of analysis
Thank you!
“With enough effort and perseverance: Anything is possible”