Making the invisible visible through SNA
 


The science of networks is an increasingly important and intriguing area of study that reveals many patterns and relationships that are often hidden. This presentation is about the use of SNA to study the network of the digital library community.

  • Refer to social networks here. Began with Jacob Levy Moreno in the 1930s in an attempt to quantify social relationships. Based on matrix algebra. Advanced statistics…
    • Nodes can be people, departments, or organizations.
    • Networks consist of links that form a structure.
    • Links between nodes have different purposes, e.g., task or general advice, expertise, strategic information, navigating the organization (procedures, know-who, etc.).
    • Links can be one- or two-directional.
    • Links can be both formal and informal.
    • Links can have different strengths.
  • Kohler (1925) is about how the mind works ; stressed the organized patterns that structure thoughts and

Presentation Transcript

  • Making the invisible visible through Social Network Analysis Shalini Urs International School of Information Management University of Mysore, Mysore, India [email_address] CAMP 2010 - March 18, Shah Alam, Malaysia
  • Agenda
    • Social Network Analysis
    • Circle of Influence, Turning Points and Social Change
    • Web 2.0 and Social and professional networks
    • Science of Networks and Science Networks
    • Network Properties
    • Network dynamics of scholarship: a social network analysis of digital library community
    • Conclusion
  • Networks are the talk of the town ... since we're living in a world of relations. Networks seem to be everywhere: from games about connectivity (Six Degrees of Kevin Bacon) to understanding the dynamics of food networks, the World Wide Web, and terrorist networks, to the structure of collaboration networks.
  • The Alibaba dataset: SNA analysis. Alibaba Scenario 1: discover the plot. SNA (cluster analysis) results: Alibaba network discovery. Robert Savell, SBP '08, April 1, 2008, & Process Query Systems Group
  • Flickr Network
  • Social Networks
    • A social network is a set of people or groups with some pattern of contacts or interactions between them. - (Scott 2003)
    • A network can be represented as a graph with nodes (also called vertices or actors) denoting people or organizations.
    • These nodes are joined by lines known as edges, which denote connections between them.
  • Social network as a graph A set of actors connected by ties
    • Ties/Links
      • Emails, Publications, Chatting, or simply connected on Facebook or LinkedIn
      • Alliance, customer, investment, etc.
    Tie
    • Actors/Nodes
      • People such as Authors
      • Teams, organizations, etc.
    Actor
  • Why study social networks ?
    • The characteristics of networks and network phenomena reveal a great deal about hidden factors at a different level of granularity.
    • It is a useful tool for evaluating the extent and intensity of social relationships among the individuals and organizations making up the network.
  • Early history of SNA
    • Turn of the 20th century: Georg Simmel was the first scholar to think directly in social network terms. His essays pointed to the effect of network size on interaction and to the likelihood of interaction in loosely knit networks rather than groups.
    • In the 1930s, three main traditions in social networks appeared:
      • J.L. Moreno pioneered the systematic recording and analysis of social interaction in small groups, especially classrooms and work groups (sociometry).
      • A Harvard group led by W. Lloyd Warner and Elton Mayo explored interpersonal relations at work.
      • A.R. Radcliffe-Brown called for the systematic study of networks.
  • Small World Phenomenon
    • The social psychologist Stanley Milgram is believed to be the originator of the “Small World Phenomenon”, also known as “Six Degrees of Separation”.
    • The expression “it's a small world” emerged as a cliché.
    • Six degrees of separation refers to the idea that, on average, everyone is about six "steps" away from any other person on Earth.
  • Some early empirical studies
    • Lee (1969) conducted a study called The Search for an Abortionist.
    • Abortion was illegal, and doctors who performed abortions could neither advertise nor operate in clinics. To find them, women asked their friends and acquaintances.
    • Lee found that, on average, the abortionist was 4 links away from the patient in social space (Woman -- person -- person -- person -- Doctor).
  • Early empirical studies …
    • Mark Granovetter wrote the book Getting a Job in 1974, based on a study in which he asked people how they got the jobs they held.
    • Most got them through accidental contacts with others rather than a purposeful search via official means (e.g., newspaper advertisements).
    • Of those who learned of the opportunities through contacts with others, few heard of them from family members and close friends. Most heard through acquaintances.
    • This phenomenon was explained through a theory about how information diffuses through social networks. This was a seminal work in social network analysis.
  • Erdos Number
    • Paul Erdős was a famous mathematician.
    • Erdős wrote around 1,500 mathematical articles, mostly co-written.
    • He had 511 direct collaborators. Erdős himself has Erdős number 0; any author who has directly collaborated with Erdős gets number 1, a collaborator of such an author gets number 2, and so on.
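
The numbering rule above amounts to a breadth-first search from Erdős over the co-authorship graph. The following is a minimal sketch, assuming a made-up toy network (the author names `A` to `E` and the `coauthors` dictionary are hypothetical, not data from the presentation):

```python
from collections import deque

# Hypothetical co-authorship data: each author maps to the set of
# people they have written a paper with (not real collaboration data).
coauthors = {
    "Erdős": {"A", "B"},
    "A": {"Erdős", "C"},
    "B": {"Erdős"},
    "C": {"A", "D"},
    "D": {"C"},
    "E": set(),  # never co-authored with anyone in this toy network
}

def collaboration_numbers(graph, source):
    """Breadth-first search: number of co-authorship steps from source."""
    numbers = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for other in graph[author]:
            if other not in numbers:
                numbers[other] = numbers[author] + 1
                queue.append(other)
    # Authors with no chain of collaborations to the source are absent.
    return numbers

erdos_numbers = collaboration_numbers(coauthors, "Erdős")
```

In this sketch an author who cannot be reached from Erdős (here `E`) simply has no entry, which corresponds to an undefined Erdős number.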
  • What does SNA do ?
    • It is the mapping and measuring of relationships and flows between people, groups, organizations, computers, URLs, and other connected information/knowledge entities.
    • The nodes in the network are entities such as people and groups while the links show relationships or flows between the nodes.
    • SNA provides both a visual and a mathematical analysis of human relationships.
  • SNA– the discipline
    • Emergence of a new field: SNA
    • SNA has been around for decades, though as a discipline of study or a science in itself it is a recent one.
    • SNA is an approach and a tool to uncover and understand the hidden side of the connections that drive certain phenomena involving a network of human players.
    • It has gained currency and popularity as an effective tool to discover the invisible paths or lines that show the ties or links between people, organizations, and phenomena.
  • SNA – the academic lineage
    • Psychology – the Gestalt tradition. Three scientists working in the Gestalt tradition fled to the US: Kurt Lewin, Jacob Moreno, and Fritz Heider.
    • Anthropology – kinship and social relations are among the main focuses of anthropologists, and their deep understanding has contributed to a deeper grasp of the fundamental nature of social networks.
    • Mathematics – understanding the mathematical properties of the network (graph theory).
  • Psychology and shaping of SNA
    • Moreno developed sociometry. He started asking people who their friends were and explored the ways in which their relations with others served as both limitations and opportunities for action and for their psychological behavior. He believed that large-scale social phenomena, such as the economy and the state, were sustained and reproduced over time by the small-scale configurations formed by people's patterns of friendship, dislike, and other relations.
    • Heider worked in the area of social perception and attitudes. He developed what is known as balance theory.
    • Lewin argued that the structural properties of this social space could be investigated mathematically using vector theory and topology.
  • Anthropology and SNA
    • One of the biggest emphases in social anthropology has been on social relations.
    • According to Radcliffe-Brown and Nadel, social structure was based on concrete relations among individuals. They wrote at a theoretical level about the web of relations comprising society
    • It had long been understood that in pre-industrial societies kinship relations were extraordinarily complex and important. But other relations, such as friendships, were equally important in industrialized societies.
  • Mathematics, physics and others
    • Euler (1736) settled the Königsberg bridge problem by translating it into a mathematical notation involving points and lines, and then deriving some proofs.
    • This idea was rediscovered many times in different areas of mathematics and the applied sciences.
    • In statistics, it was developed into the notion of Markov chains.
    • In physics, graphs were used to understand molecules adjacent to each other in Euclidean space.
    • In operations research, graphs were used to map out the location of goods and channels of transmission.
  • Graph Theory and SNA
    • Graph theory is a well-developed area at the intersection of combinatorics and topology.
    • The fundamental concept is the graph, which is best thought of as a mathematical object rather than a diagram, even though graphs have a very natural graphical representation.
    • A graph – usually denoted G(V,E) or G = (V,E) – consists of a set of vertices V together with a set of edges E.
    • Vertices are also known as nodes, points and (in social networks) as actors, agents or players.
    • Edges are also known as lines and as ties or links.
    • An edge e = (u,v) is defined by the unordered pair of vertices that serve as its end points.
  • Graph Theory and SNA
    • The example graph on the slide has vertex set V = {a, b, c, d, e, f} and edge set E = {(a,b), (b,c), (c,d), (c,e), (d,e), (e,f)}.
    • Two vertices u and v are adjacent if there exists an edge (u,v) that connects them.
    • An edge e = (u,u) that links a vertex to itself is known as a self-loop or reflexive tie.
    • The number of vertices in a graph is usually denoted n, while the number of edges is usually denoted m.
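
The definitions above can be made concrete with a few lines of code. This is a small sketch using the vertex and edge sets from the slide; the adjacency-set representation and the helper name `is_adjacent` are illustrative choices, not something the presentation prescribes:

```python
# Vertex set V and edge set E of the example graph from the slide.
V = {"a", "b", "c", "d", "e", "f"}
E = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e"), ("e", "f")]

# Adjacency sets: an undirected edge (u, v) makes u and v mutually adjacent.
adjacency = {v: set() for v in V}
for u, v in E:
    adjacency[u].add(v)
    adjacency[v].add(u)

def is_adjacent(u, v):
    """True if an edge (u, v) connects the two vertices."""
    return v in adjacency[u]

n = len(V)  # number of vertices
m = len(E)  # number of edges
self_loops = [(u, v) for u, v in E if u == v]  # reflexive ties, if any
```

For this graph, `c` and `e` are adjacent while `a` and `c` are not, and the edge set contains no self-loops.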
  • SNA Applications
    • Sociology – crime detection , formation of gangs etc
    • Epidemiology – spread of diseases
    • Economics – social capital , framework for understanding international trade etc.
    • Management – how organizational silos form and how they are connected; change management; how an organization is mapped out, etc.
    • Information Sciences – Scientific Collaborations, trend analysis …
  • SNA …
    • Used to study phenomena as diverse as –
    • Correlating performance and creativity
    • Predicting who will be the next US president
    • Seat of political power within an organization or society
    • Spread of viruses , of terrorism and innovation
    • Social Capital and organizational performance
  • Turning Points and Social Change
    • By measuring the patterns of interaction and communication among members of a network, one can uncover the origins of ideas and social change.
    • There is increasing interest in linking the distribution of cultural ideas and practices to social communities.
    • Studies have been carried out to assess the relationship between network dynamics and community “readiness” to engage in social change processes.
  • Case studies
    • “Communities That Care” program – preventive health (Feinberg et al., 2005).
    • The network approach and interventions to prevent HIV among injection drug users (Neaigus, 1998).
    • Understanding the structure of a large heroin distribution network (Natarajan, 2006).
    • Low degree metabolites explain essential reactions and enhance modularity in biological networks (Samal et al., 2006).
    • Thick Networks and their impact on Japanese Environmental Protest (Broadbent, 2006).
    • Local processes of national corruption: Elite linkages and their effects on poor people in India (Pellissery, 2007).
  • Case studies
    • The Centers for Disease Control and Prevention used social networks to identify persons with undiagnosed HIV infection in seven U.S. cities, October 2003 to September 2004.
    • Koehly and colleagues (2003) analyzed communication and family functioning patterns of Hereditary Nonpolyposis Colorectal Cancer families.
    • Valdis Krebs (2002) created a terrorist network map of the 19 hijackers of 9/11 and their associates.
    • The IBM Institute for Knowledge-Based Organizations used social network analysis of executives in the exploration and production division of a large petroleum organization to identify highly peripheral people who represent untapped expertise for knowledge creation and sharing.
  • Valdis Krebs(2002)-Terrorist network map
  • Circle of Influence
    • Martin argues that while predicting the specific content of ideas is often not possible, we can link the shape of an idea space to the structure of a network.
    • As Gladwell tellingly narrates in his book The Tipping Point, the “connectors”, the “mavens”, and the “salesmen” of a society are potent drivers of change in societies.
  • Web 2.0 and Social and professional networks
    • In the first phase, the web connected computers and networks and allowed one-sided, unidirectional information flow.
    • Web 2.0 went one step further: it is a coming together of the Internet and the social networks that have linked humans throughout civilization.
    • It connects people, information flow is bidirectional, and it enables one-to-one, one-to-many, and many-to-many interactions and information flows.
  • Information flows in information spaces
    • Examine the broader processes at work and look beyond the web 2.0 sites.
    • People move through online spaces to form connections with others, build virtual communities, and engage in some form of self-expression.
    • Facebook's more than 350 million users spend 8 billion minutes on Facebook and post 45 million status updates.
    • Even as these new social networking sites have changed our ways of communicating, the principles of human social interaction have remained unchanged. Those principles can now be observed and aggregated at unprecedented levels of scale and granularity through the data being generated by these online worlds.
  • Understanding information flows
    • Like time-lapse video or photographs through a microscope, these images of social networks offer glimpses of everyday life from an unconventional vantage point—images depicting phenomena such as the flow of information through an organization or the disintegration of a social group into rival factions.
    • Science advances whenever we can take something that was once invisible and make it visible; and this is now taking place with regard to social networks and social processes.
  • Scholarly communities and networks
    • Numerous academic-focused social networking sites such as www.academia.edu and reference management systems like www.connotea.org have emerged.
    • There are more content-focused social network sites, such as www.mendeley.com, which allows users to upload and share their research papers, and www.myexperiment.org , which offers the opportunity to share workflows and methodologies.
  • Scholarship and Publishing
    • Some of these sites are new commercial ventures, and some have emerged from an academic background and some have the backing of the big scientific publishers.
    • Most noticeable is the presence of the scientific publishers amongst the sites designed for managing scholarly references online: www.citeulike.org is sponsored by Springer, www.2collab.com is owned by Elsevier, and www.connotea.org is part of the Nature Publishing Group.
    • Nature Publishing Group also has the Nature Network, a social network site focused on scientific discussions in groups and forums.
  • Data deluge and end of scientific theory
    • At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics.
    • Calls for an entirely different approach requiring us to lose the tether of data as something that can be visualized in its totality.
    • It forces us to view data mathematically first and establish a context for it later
    • This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear.
    • Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.
    • (Chris Anderson, 2008)
  • Science of Networks and Network of Science
    • A new science of networks has emerged in the recent past
    • It is an intellectually intriguing academic pursuit with interesting possibilities
    • Disciplines such as sociology, applied mathematics, physics, and computer science have contributed tremendously to the science of networks
    • SNA tools have also enhanced their value
  • The Network of Science
    • The collection and connections of scientists and scientific organizations.
    • To understand the dynamics of scientific research it is important to examine the formal and informal communication networks of researchers.
    • As Kuhn notes in The Structure of Scientific Revolutions, a paradigm transforms a group into a profession or, at least, a discipline; the paradigm guides the whole group's research, and it is this criterion that most clearly proclaims a field a science.
    • Today, the use of SNA for the study of academic networks has brought in a fresh dimension to the exploration of collaborations, research trends and paradigms, and behavior of the academic community and its impact.
    • Scholarship - acts of mind or spirit that have been made public in some manner, have been subjected to peer review by members of one's intellectual or professional community, and can be cited, refuted, built upon, and shared among members of that community.
    • Scholarship properly communicated and critiqued serves as the building block for knowledge growth in a field. Collaboration, across time and space, is the fundamental method of scholarship.
    Academic Scholarship
  • Network Properties
    • SNA generates several kinds of network properties such as –
    • centrality
    • components
    • cohesion
    • density
    • geodesic
    • ego networks
    • ‘small world phenomenon’
  • Centrality
    • Relates to the position of a particular node within the network.
    • Local centrality is concerned with the number of direct relationships a particular node has with all the other nodes.
    • A high number indicates nodes with a high level of local centrality.
    • Centrality includes three other measures: degree centrality, betweenness centrality, and closeness centrality.
    • Degree centrality is the number of other actors who are directly connected to the actor.
    • Betweenness centrality is the extent to which an actor lies on the shortest paths between other actors that are not directly connected to each other, acting as an intermediary, liaison, or bridge.
    • Closeness centrality tells how close an actor is, on average, to all other actors.
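
The three measures can be sketched in plain Python. The example below assumes the small six-vertex graph from the graph theory slide, an undirected and connected network, and standard textbook definitions (closeness as (n - 1) divided by the sum of geodesic distances, betweenness computed with Brandes' algorithm); the presentation itself does not say which formulas or tools produced its figures:

```python
from collections import deque

# Example graph (the six-vertex graph from the graph theory slide).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def distances_from(source):
    """Geodesic distance from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def degree_centrality():
    # Number of actors directly connected to each actor.
    return {v: len(adj[v]) for v in adj}

def closeness_centrality():
    # (n - 1) / sum of distances to all others; assumes a connected graph.
    return {v: (len(adj) - 1) / sum(distances_from(v).values()) for v in adj}

def betweenness_centrality():
    # Brandes' algorithm for unweighted graphs: for each source, count
    # shortest paths (sigma) and accumulate pair dependencies (delta).
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

degree = degree_centrality()
closeness = closeness_centrality()
betweenness = betweenness_centrality()
```

In this toy graph, vertex `c` comes out as the most central actor on all three measures, which is the kind of pattern the study reports for its most central authors.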
  • Components
    • Components of a graph are sub-graphs that are connected internally but disconnected from one another.
    • Authors who don’t collaborate with others are known as “isolates.”
    • More interesting components are those which divide the network into separate parts, and where each part has several actors who are connected to one another.
    • Cohesion is the degree to which actors are connected directly to each other by cohesive bonds.
    • An actor is “reachable” by another if there exists a set of connections by which we can trace a path from the source to the target actor, regardless of how many others fall between them.
    • Density is a measure of the level of connectivity within the network. It reflects the actual number of links as a proportion of total possible links.
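
Components, isolates, and density can be computed directly from an adjacency structure. The sketch below assumes a hypothetical six-author network (`network` and the author labels are made up) and an undirected graph without self-loops, so the number of possible links is n(n - 1)/2:

```python
# Hypothetical co-authorship network: author -> set of co-authors.
network = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A"},
    "D": {"E"},
    "E": {"D"},
    "F": set(),  # an isolate: no collaborations
}

def connected_components(adj):
    """Group nodes so that every node can reach every other in its group."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in component:
                continue
            component.add(u)
            stack.extend(adj[u] - component)
        seen |= component
        components.append(component)
    return components

def density(adj):
    """Actual links as a proportion of all possible links."""
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) // 2
    possible = n * (n - 1) // 2
    return m / possible if possible else 0.0

components = connected_components(network)
isolates = [next(iter(c)) for c in components if len(c) == 1]
giant = max(components, key=len)  # the largest ("giant") component
network_density = density(network)
```

Here the network splits into three components (one of them an isolate) and has 3 of 15 possible links, a density of 0.2.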
  • Geodesic
    • In co-authorship networks, two authors know each other through collaboration.
    • For both directed and undirected data, the geodesic distance is the number of relations in the shortest possible walk from one actor to another.
    • The geodesic distance is widely used in network analysis.
    • There may be many connections between two authors in a network.
    • The geodesic path (or paths) is often the "optimal" or "shortest" connection between two authors.
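
A geodesic can be found with breadth-first search, and averaging the geodesic distance over all connected pairs gives the kind of "average shortest path" figure reported later in the study. This is a minimal sketch on a made-up five-author network (`links` and the labels `P` to `T` are hypothetical), assuming undirected, unweighted ties:

```python
from collections import deque
from itertools import combinations

# Hypothetical network: P-Q-R-S form a cycle, T is attached to R.
links = {
    "P": {"Q", "S"},
    "Q": {"P", "R"},
    "R": {"Q", "S", "T"},
    "S": {"P", "R"},
    "T": {"R"},
}

def geodesic_paths(adj, source, target):
    """All shortest paths from source to target (empty list if unreachable)."""
    dist, preds = {source: 0}, {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                preds[w] = [u]
                queue.append(w)
            elif dist[w] == dist[u] + 1:
                preds[w].append(u)  # another equally short route into w
    if target not in dist:
        return []

    def build(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in build(p)]

    return build(target)

def geodesic_distance(adj, source, target):
    """Number of ties on a shortest path, or None if unreachable."""
    paths = geodesic_paths(adj, source, target)
    return len(paths[0]) - 1 if paths else None

def average_geodesic_distance(adj):
    """Mean geodesic distance over all pairs that can reach each other."""
    lengths = []
    for u, v in combinations(adj, 2):
        d = geodesic_distance(adj, u, v)
        if d is not None:
            lengths.append(d)
    return sum(lengths) / len(lengths)

paths_p_r = geodesic_paths(links, "P", "R")
average = average_geodesic_distance(links)
```

In this example `P` and `R` are joined by two geodesic paths of length 2, illustrating that the shortest connection between two authors need not be unique.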
  • Ego Networks
    • ‘Ego’ is an individual focal node.
    • Looking at the "ego" and the "ego network", we can get a sense of the structural constraints and opportunities that an author faces, and the role that an actor plays in a social structure.
    • Egocentric methods focus on the individual rather than on the network as a whole. Such information is useful for understanding how an individual can influence the network.
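
An ego network can be extracted by keeping the focal node, its direct contacts (the "alters"), and the ties among them. The sketch below uses a hypothetical network and a common convention (a one-step neighbourhood, with ego density measured as the share of possible alter-to-alter ties that exist); the presentation does not specify which ego measures it used:

```python
# Hypothetical network: X is the ego; D is outside X's ego network.
contacts = {
    "X": {"A", "B", "C"},
    "A": {"X", "B"},
    "B": {"X", "A"},
    "C": {"X", "D"},
    "D": {"C"},
}

def ego_network(adj, ego):
    """Sub-network of the ego, its direct contacts, and ties among them."""
    members = {ego} | adj[ego]
    return {v: adj[v] & members for v in members}

def ego_density(adj, ego):
    """Share of possible alter-to-alter ties that are actually present."""
    alters = adj[ego]
    k = len(alters)
    if k < 2:
        return 0.0
    ties = sum(len(adj[a] & alters) for a in alters) // 2
    return ties / (k * (k - 1) / 2)

ego_net = ego_network(contacts, "X")
x_density = ego_density(contacts, "X")
```

A low ego density, as here (one tie out of three possible), suggests the ego bridges contacts who are otherwise unconnected.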
  • SNA Tools
    • There are many Social network analysis tools available, which help in analysis as well as visualization.
    • Agna (http://agna.gq.nu)
    • Ucinet (http://www.analytictech.com/)
    • Krackplot (http://www.contrib.andrew.cmu.edu/~krack/)
    • Anthropac (http://www.analytictech.com/)
    • Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/)
    • NetDraw
    • Visione
    • Netminer
  • Science of networks – challenges
    • Understanding disparate social network systems and identifying common static topological properties and dynamic properties during the formation and evolution of these social networks, and how contextual information can help in analyzing the pertaining social networks
    • The issues that have important implications are community discovery, anomaly detection, and trend prediction.
    • Identification of static and dynamic properties can enhance applications in multiple domains such as information retrieval, recommendation systems, and security and others.
  • Dynamic Network Analysis ( DNA)
    • DNA is an emergent scientific field that brings together traditional social network analysis, link analysis (LA) and multi-agent systems (MAS) within network science and network theory.
    • The statistical analysis of DNA data and the utilization of simulation to address issues of network dynamics are the two major aspects of DNA.
    • What differentiates dynamic networks is that they are larger, dynamic, multi-mode, multi-plex networks, and may contain varying levels of uncertainty.
    • DNA, like quantum mechanics, is a theory in which relations are probabilistic, acts of measurement change the network, movement in one part of a network propagates throughout the entire system, and so on.
    • DNA nodes can learn, unlike quantum mechanical atoms.
  • DNA – Challenges
    • The main issues to contend with are that DNA requires extensive computational resources and that many simulation models are built for a single purpose and cannot be reused, which quickly makes them obsolete.
    • Interpreting this level of information and moving it into the practical realm at scale is not yet a functional reality.
  • Author Collaboration and SNA
    • “Invisible colleges”, a term coined by Price in 1961 to describe informal networks of scientific specialists, are noted to influence the growth of specialties.
    • Crane's work linked the rapid development of new ideas to the social structure of the small “invisible colleges”.
  • Co-authorship network - Definition
    • Collaboration is a critical aspect of the scientific community.
    • A co-authorship network is one in which scientists or authors are linked when they have written a paper together.
    • In this kind of network, authors are the nodes and their co-authorship relationships with other authors are the edges or lines.
    • Barabási et al.
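
The definition above translates directly into a procedure: every pair of authors on the same paper gets an edge. The following is a minimal sketch with a made-up list of papers (`papers` and the author labels are hypothetical); weighting each edge by the number of joint papers is an added illustrative choice, not something the slides state:

```python
from collections import Counter
from itertools import combinations

# Hypothetical bibliographic records: one list of authors per paper.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author D"],  # single-author paper: adds a node but no edge
]

def coauthorship_edges(records):
    """Count joint papers for every pair of authors who share a paper."""
    weights = Counter()
    for authors in records:
        # sorted(set(...)) gives each unordered pair one canonical form.
        for u, v in combinations(sorted(set(authors)), 2):
            weights[(u, v)] += 1
    return weights

nodes = {author for authors in papers for author in authors}
edges = coauthorship_edges(papers)
isolates = nodes - {author for pair in edges for author in pair}
```

Authors who only ever publish alone, like `Author D` here, appear as nodes without edges.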
  • Our study
    • Network Characteristics of digital library community
    • Various kinds of networks, such as collaboration networks, editorial board networks, affiliation networks, and co-citation networks, are studied to gain insight into the dynamics of the DL community.
  • Our data sets
    • SNA research is data centric, analytics oriented and visualization based.
    • The datasets for our research are from the following: prestigious conferences and journals in the domain of DL
    • Online database of DL literature
    • Groups such as Social Networking groups.
    • The conferences: JCDL (Joint Conference on Digital Libraries); ECDL (European Conference on Digital Libraries); ICADL (International Conference on Asian Digital Libraries); ADL (IEEE Advances in Digital Libraries); DL (ACM Conference on Digital Libraries).
    • Journals that are listed in ISI (Thomson Scientific) Web of Knowledge
  • Co-authorship network of authors in the field of digital library
    • Sharma, Monica and Urs, Shalini R: Small World, Author collaboration: How Small Connected is Digital Library World?, ICADL 2007
    Sl. No.  Author                   Acronym  WoS (RC)  DBLP (RC)  DBLP (CC)
    1        Edward A. Fox            EF       25        195        261
    2        Ian H. Witten            IW       15        157        97
    3        Hsinchun Chen            HC       13        220        209
    4        David Bainbridge         DB       9         50         52
    5        Norbert Fuhr             NF       9         119        121
    6        Kurt Maly                KM       9         100        134
    7        Mohammad Zubair          MZ       9         55         81
    8        Nabil R. Adam            NA       8         63         62
    9        Marcos André Gonçalves   MA       8         55         70
    10       Ee-Peng Lim              EL       8         160        123
    11       Michael L. Nelson        ML       8         61         80
  • Coauthorship network between 11 top authors with coauthors
  • Coauthorship network between 11 top authors
  • Components of Digital Library community
  • Degree centrality of top 11 authors
  • Betweenness centrality of top 11 authors
  • Some Findings
    • Our results show that Edward A. Fox and Hsinchun Chen play a very important role in the network, as they hold the most powerful and central locations in it.
    • The average shortest path between authors is 3.5, clearly indicating that the digital library community forms a “small world”.
  • Co-authorship network of D-Lib and JODL research community
  • DL Network Dynamics
    • This study also showed that Edward A. Fox plays a very important role, as in our earlier study.
    • We also found that the network is divided into many components; the giant component comprises the majority of the nodes (authors).
    • The average geodesic distance between any pair is 6.1, which suggests that the six degrees of separation theory holds true for the digital library world.
  • DL network dynamics …
    • Betweenness centrality is regarded as a measure of the extent to which a node has control over information flowing between others.
    • The node or author with highest “Betweenness” acts as a gatekeeper controlling the flow of resources between the nodes, which it connects.
    • Edward A. Fox dominates the network in terms of betweenness centrality as well.
    • Edward A. Fox, Terence R. Smith, Carl Lagoze and Hsinchun Chen are among the top 15 authors in DL.
  • Editorial Board Network of DL Journals – the circle of influence
    • We tried to unravel the structure of DL community by studying the diversified convergence of DL domain, based on the editor network.
    • Premised on the fact that the boundaries and directions of a field, especially an evolving one such as digital libraries, are shaped by journals and their editors, the editor network of the top fifty-six journals in the field of digital libraries was studied.
    • These fifty-six journals were identified based on publication count as per the Thomson Reuters Web of Science.
  • Network of Editors and its components
  • DL – Circle of Influence
    • Results show that computer science is the common thread, and that library and information science is the field that dominates both among the top-ranking journals and among the editors.
    • They also show that the network is highly connected, with the giant component comprising 84.1 percent of editors.
    • Geographical distribution of these editors confirms the dominance of USA.
  • DL Network :CiteSeer Community
    • CiteSeer has emerged as a web-based scientific literature digital library and search engine that focuses primarily on the literature in the fields of computer and information science.
    • Hence it is expected to present a fairly comprehensive and huge collection of literature on Digital Libraries.
    • CiteSeer was chosen as the dataset for this study.
    • The top six authors were identified based on record count in CiteSeer, and their co-authorship network was created from data taken from the DBLP database.
  • CiteSeer Community
  • CiteSeer - results
    • The DL community of CiteSeer is a fragmented world forming a number of components.
    • CiteSeer is a heterogeneous community of authors whose expertise is diverse and inter-disciplinary.
    • The difference between the present results and our earlier studies arises because the earlier datasets were co-authorship networks drawn from peer-reviewed conferences and journals dedicated to a specific domain, which formed a dense, homogeneous, and small community.
  • Conclusions
    • SNA holds immense potential to unravel the mysteries of connections and patterns of entities and the interactions between entities, whether authors or terrorists.
    • Like the spread of diseases, the spread of ideas also seems to follow some kind of circle of influence through a social network of ego networks and connectors.
    • Our analysis of the digital library academic community, based on different types of datasets, reveals that the fabled “six degrees of separation” holds true in this case as well.
    • Our study also provides evidence that conferences and specialized or domain-specific journals represent the “invisible college” of a discipline, and that some key individuals form the egos and ego networks, perhaps constituting the circle of influence.
  • Thank you